[SCM] CTDB repository - branch master updated - 6579a6a2a7161214adedf0f67dce62f4a4ad1afe

Andrew Tridgell tridge at samba.org
Thu Nov 20 23:25:12 GMT 2008


The branch, master has been updated
       via  6579a6a2a7161214adedf0f67dce62f4a4ad1afe (commit)
      from  fe6ddf7992ca3e72a26dbac6666e0f6270da611f (commit)

http://gitweb.samba.org/?p=tridge/ctdb.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 6579a6a2a7161214adedf0f67dce62f4a4ad1afe
Author: Andrew Tridgell <tridge at samba.org>
Date:   Fri Nov 21 08:05:59 2008 +1100

    fixed problem with looping ctdb recoveries
    
    After a node failure, GPFS can get into a state where non-blocking
    fcntl() locks can take a long time. This causes the ctdb set_recmode
    test to time out, which leads to a recovery failure, and a new
    recovery. The recovery loop can last a long time.
    
    The fix is to treat an fcntl timeout as a success of this test. The
    test checks that we can't lock the shared reclock file, so a
    timeout is an acceptable outcome.

-----------------------------------------------------------------------
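
The decision rule described in the log message can be sketched as a small
mapping from the outcome of the lock probe to the status returned for the
set_recmode control. This is only an illustrative sketch, not the ctdb code:
the enum, its values, and the function name recmode_check_status() are
hypothetical. It assumes three possible outcomes of the probe (lock refused,
lock obtained, no answer before the timer fired) and assumes, following the
log message, that obtaining the lock is the failure case.

```c
#include <assert.h>

/* Hypothetical outcomes of the non-blocking fcntl() probe on the
 * shared reclock file. These names are not part of ctdb. */
enum probe_result {
    PROBE_LOCK_DENIED,   /* lock refused: someone else holds the reclock */
    PROBE_LOCK_ACQUIRED, /* lock obtained: nobody was holding the reclock */
    PROBE_TIMED_OUT      /* no answer from the filesystem before the timer fired */
};

/* Returns 0 if the recovery mode change may proceed, -1 otherwise.
 * The test passes when we could NOT take the lock, so a timeout is
 * grouped with "denied" rather than reported as an error. */
static int recmode_check_status(enum probe_result result)
{
    switch (result) {
    case PROBE_LOCK_DENIED:
        return 0;
    case PROBE_TIMED_OUT:
        /* Before this commit the timeout path replied with -1. */
        return 0;
    case PROBE_LOCK_ACQUIRED:
        return -1;
    }
    assert(!"unknown probe_result");
    return -1;
}
```

With this rule a slow cluster filesystem no longer turns into a failed
set_recmode reply, which is what triggered the repeated recoveries.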

Summary of changes:
 server/ctdb_recover.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)


Changeset truncated at 500 lines:

diff --git a/server/ctdb_recover.c b/server/ctdb_recover.c
index c8b0ba0..39b73ac 100644
--- a/server/ctdb_recover.c
+++ b/server/ctdb_recover.c
@@ -477,7 +477,14 @@ static void ctdb_set_recmode_timeout(struct event_context *ev, struct timed_even
 	struct ctdb_set_recmode_state *state = talloc_get_type(private_data, 
 					   struct ctdb_set_recmode_state);
 
-	ctdb_request_control_reply(state->ctdb, state->c, NULL, -1, "timeout in ctdb_set_recmode");
+	/* we consider this a success, not a failure, as we failed to
+	   set the recovery lock which is what we wanted.  This can be
+	   caused by the cluster filesystem being very slow to
+	   arbitrate locks immediately after a node failure.	   
+	 */
+	DEBUG(DEBUG_NOTICE,(__location__ " set_recmode timeout - allowing recmode set\n"));
+	state->ctdb->recovery_mode = state->recmode;
+	ctdb_request_control_reply(state->ctdb, state->c, NULL, 0, NULL);
 	talloc_free(state);
 }
 
@@ -643,7 +650,7 @@ int32_t ctdb_control_set_recmode(struct ctdb_context *ctdb,
 	talloc_set_destructor(state, set_recmode_destructor);
 
 	state->te = event_add_timed(ctdb->ev, state, timeval_current_ofs(3, 0),
-			ctdb_set_recmode_timeout, state);
+				    ctdb_set_recmode_timeout, state);
 
 	state->fde = event_add_fd(ctdb->ev, state, state->fd[0],
 				EVENT_FD_READ|EVENT_FD_AUTOCLOSE,


-- 
CTDB repository

