[PATCHES] More logging for dbwrap_ctdb
Christof Schmitt
cs at samba.org
Tue Jan 14 15:45:13 MST 2014
On Tue, Jan 14, 2014 at 06:56:44PM +0100, Volker Lendecke wrote:
> On Mon, Jan 13, 2014 at 04:36:20PM -0700, Christof Schmitt wrote:
> > > We could lookup the parameters in db_open_ctdb and store the settings in
> > > db_ctdb_ctx. Would that be enough?
>
> Yes, that would definitely do it. A few lp_parm calls per db
> connect is certainly fine.
Ok, i added that as a new patch in the attached patch series. I also
added another patch to always print milliseconds for consistency.
>
> > Or would introducing formal parameter like we already have with "ctdb
> > locktime warn threshold" be the better approach?
>
> Not sure. These parameters are really, really special use
> only. I'm even uncertain about the locktime warn threshold
> thing. Should this show up in official docs? I don't know.
> Every parameter takes space in the services array.
I am not sure either, just wanted to raise the question.
Christof
-------------- next part --------------
>From 2cd437f9c4114bedd08db9bce3a41635f56aa916 Mon Sep 17 00:00:00 2001
From: Volker Lendecke <vl at samba.org>
Date: Fri, 1 Feb 2013 12:49:52 +0100
Subject: [PATCH 1/6] ctdb_conn: Log long fetch_lock calls
With this patch, the number of fetch_lock attempts before dbwrap_ctdb
logs that it took x attempts to get a record is configurable with
net conf setparm global ctdb:migrate_attempts 10
This patch also adds
net conf setparm global ctdb:migrate_duration 5000
to trigger the same log message if it took longer than x milliseconds
to retrieve a record.
Reviewed-by: Christof Schmitt <cs at samba.org>
---
source3/lib/dbwrap/dbwrap_ctdb.c | 26 ++++++++++++++++++++++----
1 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/source3/lib/dbwrap/dbwrap_ctdb.c b/source3/lib/dbwrap/dbwrap_ctdb.c
index 5a473f9..fa2246b 100644
--- a/source3/lib/dbwrap/dbwrap_ctdb.c
+++ b/source3/lib/dbwrap/dbwrap_ctdb.c
@@ -1017,7 +1017,9 @@ static struct db_record *fetch_locked_internal(struct db_ctdb_ctx *ctx,
struct db_ctdb_rec *crec;
NTSTATUS status;
TDB_DATA ctdb_data;
- int migrate_attempts = 0;
+ int migrate_attempts;
+ struct timeval migrate_start;
+ int duration_msecs;
int lockret;
if (!(result = talloc(mem_ctx, struct db_record))) {
@@ -1044,6 +1046,9 @@ static struct db_record *fetch_locked_internal(struct db_ctdb_ctx *ctx,
return NULL;
}
+ migrate_attempts = 0;
+ GetTimeOfDay(&migrate_start);
+
/*
* Do a blocking lock on the record
*/
@@ -1110,13 +1115,26 @@ again:
goto again;
}
- if (migrate_attempts > 10) {
+ {
+ double duration;
+ duration = timeval_elapsed(&migrate_start);
+
+ /*
+ * Convert the duration to milliseconds to avoid a
+ * floating-point division of
+ * lp_parm_int("migrate_duration") by 1000.
+ */
+ duration_msecs = duration * 1000;
+ }
+
+ if ((migrate_attempts > lp_parm_int(-1, "ctdb", "migrate_attempts", 10)) ||
+ (duration_msecs > lp_parm_int(-1, "ctdb", "migrate_duration", 5000))) {
DEBUG(0, ("db_ctdb_fetch_locked for %s key %s needed %d "
- "attempts\n", tdb_name(ctx->wtdb->tdb),
+ "attempts, %d milliseconds\n", tdb_name(ctx->wtdb->tdb),
hex_encode_talloc(talloc_tos(),
(unsigned char *)key.dptr,
key.dsize),
- migrate_attempts));
+ migrate_attempts, duration_msecs));
}
GetTimeOfDay(&crec->lock_time);
--
1.7.1
>From f807ea4a41f824e19564609bda002b144e357fee Mon Sep 17 00:00:00 2001
From: Volker Lendecke <vl at samba.org>
Date: Mon, 6 May 2013 10:56:12 +0200
Subject: [PATCH 2/6] dbwrap_ctdb: Instrument chainunlock timing
We need an indication whether we run into the fcntl thundering
herd. fcntl unlock should be blindingly fast in the normal case. If it
takes longer than 5 milliseconds, warn. The timeout can be adapted by
setting
ctdb:unlock_warn_threshold = <number-of-milliseconds>
Reviewed-by: Christof Schmitt <cs at samba.org>
---
source3/lib/dbwrap/dbwrap_ctdb.c | 28 ++++++++++++++++++++++++++--
1 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/source3/lib/dbwrap/dbwrap_ctdb.c b/source3/lib/dbwrap/dbwrap_ctdb.c
index fa2246b..b0baa14 100644
--- a/source3/lib/dbwrap/dbwrap_ctdb.c
+++ b/source3/lib/dbwrap/dbwrap_ctdb.c
@@ -944,6 +944,9 @@ static int db_ctdb_record_destr(struct db_record* data)
struct db_ctdb_rec *crec = talloc_get_type_abort(
data->private_data, struct db_ctdb_rec);
int threshold;
+ int ret;
+ struct timeval before;
+ double timediff;
DEBUG(10, (DEBUGLEVEL > 10
? "Unlocking db %u key %s\n"
@@ -952,11 +955,32 @@ static int db_ctdb_record_destr(struct db_record* data)
hex_encode_talloc(data, (unsigned char *)data->key.dptr,
data->key.dsize)));
- tdb_chainunlock(crec->ctdb_ctx->wtdb->tdb, data->key);
+ before = timeval_current();
+
+ ret = tdb_chainunlock(crec->ctdb_ctx->wtdb->tdb, data->key);
+
+ timediff = timeval_elapsed(&before);
+ timediff *= 1000; /* get us milliseconds */
+
+ if (timediff > lp_parm_int(-1, "ctdb", "unlock_warn_threshold", 5)) {
+ char *key;
+ key = hex_encode_talloc(talloc_tos(),
+ (unsigned char *)data->key.dptr,
+ data->key.dsize);
+ DEBUG(0, ("tdb_chainunlock on db %s, key %s took %f milliseconds\n",
+ tdb_name(crec->ctdb_ctx->wtdb->tdb), key,
+ timediff));
+ TALLOC_FREE(key);
+ }
+
+ if (ret != 0) {
+ DEBUG(0, ("tdb_chainunlock failed\n"));
+ return -1;
+ }
threshold = lp_ctdb_locktime_warn_threshold();
if (threshold != 0) {
- double timediff = timeval_elapsed(&crec->lock_time);
+ timediff = timeval_elapsed(&crec->lock_time);
if ((timediff * 1000) > threshold) {
const char *key;
--
1.7.1
>From 30e16f7e7c17355e8db2e6e14db2a1247bd6459e Mon Sep 17 00:00:00 2001
From: Christian Ambach <ambi at samba.org>
Date: Mon, 13 May 2013 17:02:17 +0200
Subject: [PATCH 3/6] s3:dbwrap report time for chainlock and CTDB migrate
Signed-off-by: Christian Ambach <ambi at samba.org>
Reviewed-by: Christof Schmitt <cs at samba.org>
---
source3/lib/dbwrap/dbwrap_ctdb.c | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/source3/lib/dbwrap/dbwrap_ctdb.c b/source3/lib/dbwrap/dbwrap_ctdb.c
index b0baa14..c1feb9e 100644
--- a/source3/lib/dbwrap/dbwrap_ctdb.c
+++ b/source3/lib/dbwrap/dbwrap_ctdb.c
@@ -1043,6 +1043,10 @@ static struct db_record *fetch_locked_internal(struct db_ctdb_ctx *ctx,
TDB_DATA ctdb_data;
int migrate_attempts;
struct timeval migrate_start;
+ struct timeval chainlock_start;
+ struct timeval ctdb_start;
+ double chainlock_time = 0;
+ double ctdb_time = 0;
int duration_msecs;
int lockret;
@@ -1087,9 +1091,12 @@ again:
TALLOC_FREE(keystr);
}
+ GetTimeOfDay(&chainlock_start);
lockret = tryonly
? tdb_chainlock_nonblock(ctx->wtdb->tdb, key)
: tdb_chainlock(ctx->wtdb->tdb, key);
+ chainlock_time += timeval_elapsed(&chainlock_start);
+
if (lockret != 0) {
DEBUG(3, ("tdb_chainlock failed\n"));
TALLOC_FREE(result);
@@ -1127,8 +1134,11 @@ again:
ctdb_data.dptr ?
((struct ctdb_ltdb_header *)ctdb_data.dptr)->flags : 0));
+ GetTimeOfDay(&ctdb_start);
status = ctdbd_migrate(messaging_ctdbd_connection(), ctx->db_id,
key);
+ ctdb_time += timeval_elapsed(&ctdb_start);
+
if (!NT_STATUS_IS_OK(status)) {
DEBUG(5, ("ctdb_migrate failed: %s\n",
nt_errstr(status)));
@@ -1154,11 +1164,14 @@ again:
if ((migrate_attempts > lp_parm_int(-1, "ctdb", "migrate_attempts", 10)) ||
(duration_msecs > lp_parm_int(-1, "ctdb", "migrate_duration", 5000))) {
DEBUG(0, ("db_ctdb_fetch_locked for %s key %s needed %d "
- "attempts, %d milliseconds\n", tdb_name(ctx->wtdb->tdb),
+ "attempts, %d milliseconds, chainlock: %d ms, "
+ "CTDB %d ms\n", tdb_name(ctx->wtdb->tdb),
hex_encode_talloc(talloc_tos(),
(unsigned char *)key.dptr,
key.dsize),
- migrate_attempts, duration_msecs));
+ migrate_attempts, duration_msecs,
+ (int) chainlock_time * 1000,
+ (int) ctdb_time * 1000));
}
GetTimeOfDay(&crec->lock_time);
--
1.7.1
>From 924e52e2e8c411ee91562ac7d2993357c22737ab Mon Sep 17 00:00:00 2001
From: Christian Ambach <ambi at samba.org>
Date: Mon, 13 May 2013 18:20:43 +0200
Subject: [PATCH 4/6] s3:dbwrap include the hashchain in the logs
Signed-off-by: Christian Ambach <ambi at samba.org>
Reviewed-by: Christof Schmitt <cs at samba.org>
---
source3/lib/dbwrap/dbwrap_ctdb.c | 15 ++++++++++++---
1 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/source3/lib/dbwrap/dbwrap_ctdb.c b/source3/lib/dbwrap/dbwrap_ctdb.c
index c1feb9e..8f05157 100644
--- a/source3/lib/dbwrap/dbwrap_ctdb.c
+++ b/source3/lib/dbwrap/dbwrap_ctdb.c
@@ -1163,12 +1163,21 @@ again:
if ((migrate_attempts > lp_parm_int(-1, "ctdb", "migrate_attempts", 10)) ||
(duration_msecs > lp_parm_int(-1, "ctdb", "migrate_duration", 5000))) {
- DEBUG(0, ("db_ctdb_fetch_locked for %s key %s needed %d "
- "attempts, %d milliseconds, chainlock: %d ms, "
- "CTDB %d ms\n", tdb_name(ctx->wtdb->tdb),
+ int chain = 0;
+
+ if (tdb_get_flags(ctx->wtdb->tdb) & TDB_INCOMPATIBLE_HASH) {
+ chain = tdb_jenkins_hash(&key) %
+ tdb_hash_size(ctx->wtdb->tdb);
+ }
+
+ DEBUG(0, ("db_ctdb_fetch_locked for %s key %s, chain %d "
+ "needed %d attempts, %d milliseconds, "
+ "chainlock: %d ms, CTDB %d ms\n",
+ tdb_name(ctx->wtdb->tdb),
hex_encode_talloc(talloc_tos(),
(unsigned char *)key.dptr,
key.dsize),
+ chain,
migrate_attempts, duration_msecs,
(int) chainlock_time * 1000,
(int) ctdb_time * 1000));
--
1.7.1
>From 2c928bea972f38d7adbd4749aa8a5476448bd947 Mon Sep 17 00:00:00 2001
From: Christof Schmitt <cs at samba.org>
Date: Tue, 14 Jan 2014 14:17:32 -0700
Subject: [PATCH 5/6] s3:dbwrap: Store warning thresholds in db_ctdb_ctx
Avoid the parameter lookup for the warning thresholds in the hot code
path by reading them in db_open_ctdb and storing them in the
db_ctdb_ctx.
Signed-off-by: Christof Schmitt <cs at samba.org>
---
source3/lib/dbwrap/dbwrap_ctdb.c | 22 ++++++++++++++++++----
1 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/source3/lib/dbwrap/dbwrap_ctdb.c b/source3/lib/dbwrap/dbwrap_ctdb.c
index 8f05157..03d499e 100644
--- a/source3/lib/dbwrap/dbwrap_ctdb.c
+++ b/source3/lib/dbwrap/dbwrap_ctdb.c
@@ -73,6 +73,12 @@ struct db_ctdb_ctx {
uint32_t db_id;
struct db_ctdb_transaction_handle *transaction;
struct g_lock_ctx *lock_ctx;
+
+ /* thresholds for warning messages */
+ int warn_unlock_msecs;
+ int warn_migrate_msecs;
+ int warn_migrate_attempts;
+ int warn_locktime_msecs;
};
struct db_ctdb_rec {
@@ -962,7 +968,7 @@ static int db_ctdb_record_destr(struct db_record* data)
timediff = timeval_elapsed(&before);
timediff *= 1000; /* get us milliseconds */
- if (timediff > lp_parm_int(-1, "ctdb", "unlock_warn_threshold", 5)) {
+ if (timediff > crec->ctdb_ctx->warn_unlock_msecs) {
char *key;
key = hex_encode_talloc(talloc_tos(),
(unsigned char *)data->key.dptr,
@@ -978,7 +984,7 @@ static int db_ctdb_record_destr(struct db_record* data)
return -1;
}
- threshold = lp_ctdb_locktime_warn_threshold();
+ threshold = crec->ctdb_ctx->warn_locktime_msecs;
if (threshold != 0) {
timediff = timeval_elapsed(&crec->lock_time);
if ((timediff * 1000) > threshold) {
@@ -1161,8 +1167,8 @@ again:
duration_msecs = duration * 1000;
}
- if ((migrate_attempts > lp_parm_int(-1, "ctdb", "migrate_attempts", 10)) ||
- (duration_msecs > lp_parm_int(-1, "ctdb", "migrate_duration", 5000))) {
+ if ((migrate_attempts > ctx->warn_migrate_attempts) ||
+ (duration_msecs > ctx->warn_migrate_msecs)) {
int chain = 0;
if (tdb_get_flags(ctx->wtdb->tdb) & TDB_INCOMPATIBLE_HASH) {
@@ -1663,6 +1669,14 @@ struct db_context *db_open_ctdb(TALLOC_CTX *mem_ctx,
}
}
+ db_ctdb->warn_unlock_msecs = lp_parm_int(-1, "ctdb",
+ "unlock_warn_threshold", 5);
+ db_ctdb->warn_migrate_attempts = lp_parm_int(-1, "ctdb",
+ "migrate_attempts", 10);
+ db_ctdb->warn_migrate_msecs = lp_parm_int(-1, "ctdb",
+ "migrate_duration", 5000);
+ db_ctdb->warn_locktime_msecs = lp_ctdb_locktime_warn_threshold();
+
result->private_data = (void *)db_ctdb;
result->fetch_locked = db_ctdb_fetch_locked;
result->try_fetch_locked = db_ctdb_try_fetch_locked;
--
1.7.1
>From 81ba8c4a910bfd4d14931146caa0bee6c12caeea Mon Sep 17 00:00:00 2001
From: Christof Schmitt <cs at samba.org>
Date: Tue, 14 Jan 2014 14:23:04 -0700
Subject: [PATCH 6/6] s3:dbwrap: Use milliseconds for "Held tdb lock" message
This is consistent with the parameter using milliseconds and the other
warnings in the same file also using milliseconds.
Signed-off-by: Christof Schmitt <cs at samba.org>
---
source3/lib/dbwrap/dbwrap_ctdb.c | 11 ++++++-----
1 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/source3/lib/dbwrap/dbwrap_ctdb.c b/source3/lib/dbwrap/dbwrap_ctdb.c
index 03d499e..a33d784 100644
--- a/source3/lib/dbwrap/dbwrap_ctdb.c
+++ b/source3/lib/dbwrap/dbwrap_ctdb.c
@@ -986,16 +986,17 @@ static int db_ctdb_record_destr(struct db_record* data)
threshold = crec->ctdb_ctx->warn_locktime_msecs;
if (threshold != 0) {
- timediff = timeval_elapsed(&crec->lock_time);
- if ((timediff * 1000) > threshold) {
+ timediff = timeval_elapsed(&crec->lock_time) * 1000;
+ if (timediff > threshold) {
const char *key;
key = hex_encode_talloc(data,
(unsigned char *)data->key.dptr,
data->key.dsize);
- DEBUG(0, ("Held tdb lock on db %s, key %s %f seconds\n",
- tdb_name(crec->ctdb_ctx->wtdb->tdb), key,
- timediff));
+ DEBUG(0, ("Held tdb lock on db %s, key %s "
+ "%f milliseconds\n",
+ tdb_name(crec->ctdb_ctx->wtdb->tdb),
+ key, timediff));
}
}
--
1.7.1
More information about the samba-technical
mailing list