[PATCH] ctdb-locking: Back-off from logging every 10 seconds

Michael Adam obnox at samba.org
Thu Mar 5 01:14:35 MST 2015


Looks good to me.
Doing a private build/test, then pushing.

Michael

On 2015-03-05 at 16:27 +1100, Amitay Isaacs wrote:
> Hi,
> 
> This patch prevents flooding of debug logs by locking code when a lock
> helper is unable to obtain a lock for a long time.  Instead of logging
> every 10 seconds, increase the interval to 100 seconds and 1000 seconds
> when the elapsed time reaches 100 seconds and 1000 seconds respectively.
> 
> Please review and push if ok.
> 
> Amitay.

> From a8d10da180dea7e4c202176fc447f370662bb6f5 Mon Sep 17 00:00:00 2001
> From: Amitay Isaacs <amitay at gmail.com>
> Date: Wed, 4 Mar 2015 15:36:05 +1100
> Subject: [PATCH] ctdb-locking: Back-off from logging every 10 seconds
> 
> If ctdb_lock_helper cannot get a lock within 10 seconds, ctdb daemon
> logs a message and invokes an external debug script.  This is repeated
> every 10 seconds.
> 
> In case of contention or on a loaded system, there can be multiple
> ctdb_lock_helper processes waiting to get lock on record(s).  For each
> lock request taking longer, ctdb daemon will flood the log every
> 10 seconds.  Instead of logging aggressively every 10 seconds, relax
> logging to every 100s and 1000s if the elapsed time has exceeded 100s
> and 1000s respectively.
> 
> Signed-off-by: Amitay Isaacs <amitay at gmail.com>
> ---
>  ctdb/server/ctdb_lock.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/ctdb/server/ctdb_lock.c b/ctdb/server/ctdb_lock.c
> index 7959d40..c5a2b98 100644
> --- a/ctdb/server/ctdb_lock.c
> +++ b/ctdb/server/ctdb_lock.c
> @@ -486,6 +486,8 @@ static void ctdb_lock_timeout_handler(struct tevent_context *ev,
>  	struct lock_context *lock_ctx;
>  	struct ctdb_context *ctdb;
>  	pid_t pid;
> +	double elapsed_time;
> +	int new_timer;
>  
>  	lock_ctx = talloc_get_type_abort(private_data, struct lock_context);
>  	ctdb = lock_ctx->ctdb;
> @@ -495,16 +497,17 @@ static void ctdb_lock_timeout_handler(struct tevent_context *ev,
>  		lock_ctx->ttimer = NULL;
>  		return;
>  	}
> +
> +	elapsed_time = timeval_elapsed(&lock_ctx->start_time);
>  	if (lock_ctx->ctdb_db) {
>  		DEBUG(DEBUG_WARNING,
>  		      ("Unable to get %s lock on database %s for %.0lf seconds\n",
>  		       (lock_ctx->type == LOCK_RECORD ? "RECORD" : "DB"),
> -		       lock_ctx->ctdb_db->db_name,
> -		       timeval_elapsed(&lock_ctx->start_time)));
> +		       lock_ctx->ctdb_db->db_name, elapsed_time));
>  	} else {
>  		DEBUG(DEBUG_WARNING,
>  		      ("Unable to get ALLDB locks for %.0lf seconds\n",
> -		       timeval_elapsed(&lock_ctx->start_time)));
> +		       elapsed_time));
>  	}
>  
>  	/* Fire a child process to find the blocking process. */
> @@ -529,11 +532,20 @@ static void ctdb_lock_timeout_handler(struct tevent_context *ev,
>  		       " Unable to setup lock debugging - no memory?\n"));
>  	}
>  
> +	/* Back-off logging if lock is not obtained for a long time */
> +	if (elapsed_time < 100.0) {
> +		new_timer = 10;
> +	} else if (elapsed_time < 1000.0) {
> +		new_timer = 100;
> +	} else {
> +		new_timer = 1000;
> +	}
> +
>  	/* reset the timeout timer */
>  	// talloc_free(lock_ctx->ttimer);
>  	lock_ctx->ttimer = tevent_add_timer(ctdb->ev,
>  					    lock_ctx,
> -					    timeval_current_ofs(10, 0),
> +					    timeval_current_ofs(new_timer, 0),
>  					    ctdb_lock_timeout_handler,
>  					    (void *)lock_ctx);
>  }
> -- 
> 1.9.3
> 
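
For readers following the thread, the interval selection in the patch can be
sketched on its own. This is a minimal standalone illustration, not code from
ctdb: the names next_log_interval() and count_log_messages() are hypothetical,
and the simulation assumes the first timeout fires at exactly 10 seconds and
every later one fires exactly on schedule.

```c
#include <assert.h>

/* Pick the delay (in seconds) until the next warning, mirroring the
 * thresholds in the patch: 10s while elapsed < 100s, 100s while
 * elapsed < 1000s, 1000s after that.  Hypothetical helper name. */
int next_log_interval(double elapsed_time)
{
	if (elapsed_time < 100.0) {
		return 10;
	} else if (elapsed_time < 1000.0) {
		return 100;
	}
	return 1000;
}

/* Count the warnings logged while a lock stays blocked for
 * total_seconds.  Assumes idealised timers (first firing at 10s,
 * no scheduling delay); hypothetical helper for illustration only. */
int count_log_messages(double total_seconds)
{
	double t = 10.0;
	int count = 0;

	assert(total_seconds >= 0.0);
	while (t <= total_seconds) {
		count++;
		/* Re-arm the timer using the elapsed time at this firing. */
		t += next_log_interval(t);
	}
	return count;
}
```

Under those assumptions, a lock that stays blocked for 1000 seconds produces
19 warnings (at 10..90, 100..900 and 1000 seconds) rather than the 100 that a
fixed 10 second timer would log.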
