[SCM] CTDB repository - branch master updated - 6579a6a2a7161214adedf0f67dce62f4a4ad1afe
Andrew Tridgell
tridge at samba.org
Thu Nov 20 23:25:12 GMT 2008
The branch, master has been updated
via 6579a6a2a7161214adedf0f67dce62f4a4ad1afe (commit)
from fe6ddf7992ca3e72a26dbac6666e0f6270da611f (commit)
http://gitweb.samba.org/?p=tridge/ctdb.git;a=shortlog;h=master
- Log -----------------------------------------------------------------
commit 6579a6a2a7161214adedf0f67dce62f4a4ad1afe
Author: Andrew Tridgell <tridge at samba.org>
Date: Fri Nov 21 08:05:59 2008 +1100
fixed problem with looping ctdb recoveries
After a node failure, GPFS can get into a state where non-blocking
fcntl() locks can take a long time. This causes the ctdb set_recmode
test to time out, which leads to a recovery failure, and then a new
recovery. The resulting recovery loop can last a long time.
The fix is to treat a fcntl timeout as a success of this test. The
test checks that we can't lock the shared reclock file, so a timeout
(where we also did not get the lock) counts as a success.
-----------------------------------------------------------------------
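The rule described in the log message can be written as a small mapping from probe outcome to reply status. This is only a sketch: the enum and function names are hypothetical and do not appear in ctdb. Treating an acquired lock as a failure is inferred from the statement that the test checks we can't lock the reclock file. The 0 and -1 values match the status arguments passed to ctdb_request_control_reply() in the diff.

```c
#include <assert.h>

/* Hypothetical outcome names for the reclock probe; not taken from ctdb. */
enum reclock_probe {
    PROBE_LOCK_DENIED,   /* lock attempt was refused: someone else holds it */
    PROBE_LOCK_ACQUIRED, /* we obtained the lock: nobody else holds it */
    PROBE_TIMED_OUT      /* the cluster filesystem did not answer in time */
};

/* Map a probe outcome to a control reply status (0 = success, -1 = failure).
 * The test passes whenever we did NOT obtain the lock, so after this change
 * a timeout is grouped with a denied lock. */
int set_recmode_status(enum reclock_probe outcome)
{
    switch (outcome) {
    case PROBE_LOCK_DENIED:
    case PROBE_TIMED_OUT:
        return 0;
    case PROBE_LOCK_ACQUIRED:
        return -1;
    }
    assert(!"unknown probe outcome");
    return -1;
}
```

Before this commit the timeout case would have been in the -1 group, which is what fed the recovery loop.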
Summary of changes:
server/ctdb_recover.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)
Changeset truncated at 500 lines:
diff --git a/server/ctdb_recover.c b/server/ctdb_recover.c
index c8b0ba0..39b73ac 100644
--- a/server/ctdb_recover.c
+++ b/server/ctdb_recover.c
@@ -477,7 +477,14 @@ static void ctdb_set_recmode_timeout(struct event_context *ev, struct timed_even
struct ctdb_set_recmode_state *state = talloc_get_type(private_data,
struct ctdb_set_recmode_state);
- ctdb_request_control_reply(state->ctdb, state->c, NULL, -1, "timeout in ctdb_set_recmode");
+ /* we consider this a success, not a failure, as we failed to
+ set the recovery lock which is what we wanted. This can be
+ caused by the cluster filesystem being very slow to
+ arbitrate locks immediately after a node failure.
+ */
+ DEBUG(DEBUG_NOTICE,(__location__ " set_recmode timeout - allowing recmode set\n"));
+ state->ctdb->recovery_mode = state->recmode;
+ ctdb_request_control_reply(state->ctdb, state->c, NULL, 0, NULL);
talloc_free(state);
}
@@ -643,7 +650,7 @@ int32_t ctdb_control_set_recmode(struct ctdb_context *ctdb,
talloc_set_destructor(state, set_recmode_destructor);
state->te = event_add_timed(ctdb->ev, state, timeval_current_ofs(3, 0),
- ctdb_set_recmode_timeout, state);
+ ctdb_set_recmode_timeout, state);
state->fde = event_add_fd(ctdb->ev, state, state->fd[0],
EVENT_FD_READ|EVENT_FD_AUTOCLOSE,
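The second hunk shows a 3 second timed event alongside a read event on state->fd[0]. A standalone sketch of that general shape (a lock attempt whose answer arrives over a pipe, bounded by a timeout) is given here. It is not ctdb's code: it uses plain fork(), pipe() and poll() instead of ctdb's event loop, running the attempt in a child process is an assumption, and every name in it is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result codes for the sketch. */
enum lock_answer {
    LOCK_ERROR = -2, LOCK_TIMED_OUT = -1, LOCK_DENIED = 0, LOCK_ACQUIRED = 1
};

/* Child side: one non-blocking fcntl() write-lock attempt on byte 0 of path.
 * Returns 'A' (acquired), 'D' (denied) or 'E' (error). */
static char try_lock_once(const char *path)
{
    struct flock fl = {0};
    int fd = open(path, O_RDWR);

    if (fd < 0) return 'E';
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;
    if (fcntl(fd, F_SETLK, &fl) == 0) return 'A';
    return (errno == EACCES || errno == EAGAIN) ? 'D' : 'E';
}

/* Parent side: wait at most timeout_ms for the child's one-byte answer. */
enum lock_answer probe_lock_with_timeout(const char *path, int timeout_ms)
{
    int fds[2];
    pid_t pid;
    struct pollfd pfd;
    char answer = 'E';
    enum lock_answer result = LOCK_ERROR;
    int ready;

    assert(path != NULL && timeout_ms >= 0);
    if (pipe(fds) != 0) return LOCK_ERROR;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return LOCK_ERROR;
    }
    if (pid == 0) {
        /* Child: report the outcome; the lock is dropped when we exit. */
        close(fds[0]);
        answer = try_lock_once(path);
        _exit(write(fds[1], &answer, 1) == 1 ? 0 : 1);
    }

    close(fds[1]);
    pfd.fd = fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    ready = poll(&pfd, 1, timeout_ms);
    if (ready == 0) {
        /* No answer in time: stop waiting for the slow lock attempt. */
        kill(pid, SIGKILL);
        result = LOCK_TIMED_OUT;
    } else if (ready > 0 && read(fds[0], &answer, 1) == 1) {
        result = answer == 'A' ? LOCK_ACQUIRED
               : answer == 'D' ? LOCK_DENIED : LOCK_ERROR;
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return result;
}
```

Under the rule in this commit, a caller would treat both LOCK_DENIED and LOCK_TIMED_OUT as a passed test. Note that the final waitpid() can itself stall if the child is stuck inside the filesystem, so a real daemon would reap the child asynchronously.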
--
CTDB repository