[SCM] CTDB repository - branch 2.5 updated - ctdb-2.5.3-88-g5bf756d

Amitay Isaacs amitay at samba.org
Tue Jul 22 01:59:12 MDT 2014

The branch, 2.5 has been updated
       via  5bf756da8bc2d9106b1e4ada42db3d4f5d9e5b81 (commit)
      from  096ffd1e9ba2f869f79c934fe2b68d0bbd097e14 (commit)


- Log -----------------------------------------------------------------
commit 5bf756da8bc2d9106b1e4ada42db3d4f5d9e5b81
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jun 2 19:09:38 2014 +1000

    recoverd: Gently abort recovery when election is underway

    Sometimes the recovery daemon fails to get the recovery lock on one
    node so that node is banned.  This seems to always happen during an
    election.  The recovery is triggered because other nodes are found to
    have recovery mode enabled.  They have recovery mode enabled because
    an election has been forced.

    The recovery daemon's main_loop() only does an initial check for an
    election.  After that, a node can force an election and, in the
    process, set itself to be the current winner.  In this situation,
    verify_recmode() will always return MONITOR_RECOVERY_NEEDED so
    do_recovery() is called.  If the previous recovery master hasn't
    admitted defeat and released the recovery lock, then do_recovery()
    will rightly fail.  However, it would be better if it failed a little
    more gracefully, since this case is not that unusual.

    Instead of trying to take the recovery lock, return early with an
    error if there is an election in progress.  Note that the race is
    still there but it is now much narrower.

    There are probably more subtle ways of avoiding this issue, including
    something like this in main_loop():

    -	if (pnn != rec->recmaster) {
    +	if (pnn != rec->recmaster || rec->election_timeout) {

    However, this check is done earlier so it leaves the race window open
    a little wider.

    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

    Autobuild-User(master): Amitay Isaacs <amitay at samba.org>
    Autobuild-Date(master): Mon Jul 21 06:57:07 CEST 2014 on sn-devel-104

    (Imported from commit 705e4174c988eea5c5b3a834710f9f920369c8ee)


Summary of changes:
 server/ctdb_recoverd.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

Changeset truncated at 500 lines:

diff --git a/server/ctdb_recoverd.c b/server/ctdb_recoverd.c
index ab73e88..50709a6 100644
--- a/server/ctdb_recoverd.c
+++ b/server/ctdb_recoverd.c
@@ -1799,6 +1799,12 @@ static int do_recovery(struct ctdb_recoverd *rec,
 	/* if recovery fails, force it again */
 	rec->need_recovery = true;
 
+	if (rec->election_timeout) {
+		/* an election is in progress */
+		DEBUG(DEBUG_ERR, ("do_recovery called while election in progress - try again later\n"));
+		return -1;
+	}
+
 	ban_misbehaving_nodes(rec, &self_ban);
 	if (self_ban) {
 		DEBUG(DEBUG_NOTICE, ("This node was banned, aborting recovery\n"));
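
The effect of the new guard can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not CTDB code: struct recoverd_model, do_recovery_model() and all of its fields are hypothetical stand-ins, election_in_progress models a non-NULL rec->election_timeout, and the ban on a failed lock attempt is simplified from the behaviour described in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ctdb_recoverd. */
struct recoverd_model {
	bool election_in_progress; /* models a non-NULL rec->election_timeout */
	bool need_recovery;        /* models rec->need_recovery */
	bool lock_held_by_other;   /* previous recovery master still holds the lock */
	int lock_attempts;         /* number of attempts to take the recovery lock */
	bool banned;               /* models the ban that follows a failed attempt */
};

/* Models the patched do_recovery(): returns 0 on success, -1 on failure. */
static int do_recovery_model(struct recoverd_model *rec)
{
	assert(rec != NULL);

	/* if recovery fails, force it again */
	rec->need_recovery = true;

	if (rec->election_in_progress) {
		/* an election is in progress: back off before touching the lock */
		return -1;
	}

	rec->lock_attempts++;
	if (rec->lock_held_by_other) {
		/* the commit message describes this path as ending in a ban */
		rec->banned = true;
		return -1;
	}

	/* assumption of this model: a completed recovery clears the flag */
	rec->need_recovery = false;
	return 0;
}
```

In this model, a call made while election_in_progress is set returns -1 with lock_attempts still 0 and banned still false, while need_recovery stays true so that a later pass retries. Once the flag is cleared and the lock is free, the same call returns 0.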
