[SCM] Samba Shared Repository - branch master updated

Amitay Isaacs amitay at samba.org
Fri Feb 13 01:49:03 MST 2015


The branch, master has been updated
       via  39d2fd3 ctdb-recoverd: Abort when daemon can take recovery lock during recovery
       via  432d677 ctdb-recoverd: Improve error messages on recovery lock coherence fail
       via  48c9140 ctdb-recoverd: Don't release and re-take the recovery lock
       via  1d6ed91 ctdb-recoverd: Simplify ctdb_recovery_lock()
       via  be19a17 ctdb-recoverd: Remove check_recovery_lock()
       via  668ed53 ctdb-recoverd: Improve logging when recovery lock file is changed
       via  db32a2b ctdb-recoverd: New function ctdb_recovery_unlock()
       via  72701be ctdb-recoverd: New function ctdb_recovery_have_lock()
       via  4d3b52f ctdb-daemon: Log a warning when setting obsolete tunables
       via  d110fe2 ctdb-daemon: Mark tunable VerifyRecoveryLock as obsolete
       via  a01744c ctdb-doc: Improve documentation of the recovery lock
      from  5f08d8b snprintf: Try to support %j

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 39d2fd330a60ea590d76213f8cb406a42fa8d680
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Jan 27 12:55:42 2015 +1100

    ctdb-recoverd: Abort when daemon can take recovery lock during recovery
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    Autobuild-User(master): Amitay Isaacs <amitay at samba.org>
    Autobuild-Date(master): Fri Feb 13 09:48:15 CET 2015 on sn-devel-104

commit 432d6774891eba30a959cd2d8ee8469d189c7872
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Dec 17 20:33:19 2014 +1100

    ctdb-recoverd: Improve error messages on recovery lock coherence fail
    
    When the daemon is able to take the recovery lock during recovery, the
    most likely cause is a lock coherence problem in the cluster
    filesystem, so print a message that says so.  This is more helpful to
    those trying out cluster filesystems that don't provide lock coherence
    or that are difficult to set up.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 48c91407abd5e34463d3a10cb6fce47ec4a0d5f6
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Dec 9 13:51:27 2014 +1100

    ctdb-recoverd: Don't release and re-take the recovery lock
    
    Just continue to hold it; otherwise a broken node might win an
    election and grab the lock.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
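
    Condensed from the do_recovery() hunk in ctdb/server/ctdb_recoverd.c
    further down (not a drop-in copy), the recovery daemon now keeps the
    lock across recoveries instead of dropping and re-taking it:

        if (ctdb->recovery_lock_file != NULL) {
                if (ctdb_recovery_have_lock(ctdb)) {
                        /* Lock is still held from a previous recovery
                         * or election win - nothing to do. */
                        DEBUG(DEBUG_NOTICE, ("Already holding recovery lock\n"));
                } else if (!ctdb_recovery_lock(ctdb)) {
                        /* Retry during first recovery, otherwise ban
                         * this node (see the full hunk below). */
                        return -1;
                }
        }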

commit 1d6ed91f5518d462ba368bca03be923428710157
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Dec 9 14:50:38 2014 +1100

    ctdb-recoverd: Simplify ctdb_recovery_lock()
    
    Have it just silently take or fail to take the lock, except on an
    unexpected failure (where it should log an error).
    
    This means that callers which need the old behaviour must now
    explicitly release the lock.  In do_recovery() the lock is released,
    and a message is printed, before attempting to take the lock.  In the
    daemon sanity check the lock must be released in the error path if it
    was actually taken.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
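
    As a standalone illustration of the take-or-silently-fail pattern
    described above (plain POSIX calls with invented names, not the actual
    ctdb helper - the real implementation is in the ctdb_recover.c hunk
    further down):

        #include <errno.h>
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>

        /* Try to take an exclusive fcntl() lock on the given file.
         * Returns the open fd on success (keep it open to hold the lock),
         * or -1 if the lock is held elsewhere or something else failed. */
        static int try_take_reclock(const char *path)
        {
                struct flock lock = {
                        .l_type = F_WRLCK,      /* exclusive write lock */
                        .l_whence = SEEK_SET,
                        .l_start = 0,
                        .l_len = 1,             /* lock a single byte */
                };
                int fd = open(path, O_RDWR | O_CREAT, 0600);

                if (fd == -1) {
                        fprintf(stderr, "Unable to open %s - %s\n",
                                path, strerror(errno));
                        return -1;
                }
                if (fcntl(fd, F_SETLK, &lock) != 0) {
                        int saved_errno = errno;
                        close(fd);
                        /* EACCES/EAGAIN mean the lock is contended, which
                         * is the expected outcome on non-master nodes, so
                         * stay silent; only log unexpected errors. */
                        if (saved_errno != EACCES && saved_errno != EAGAIN) {
                                fprintf(stderr, "Failed to lock %s - %s\n",
                                        path, strerror(saved_errno));
                        }
                        return -1;
                }
                return fd;
        }

        int main(int argc, char **argv)
        {
                int fd = try_take_reclock(argc > 1 ? argv[1] : "recovery.lock");

                puts(fd != -1 ? "lock taken" : "lock not taken");
                return fd != -1 ? 0 : 1;
        }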

commit be19a17faf6da97365c425c5b423e9b74f9c9e0c
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Dec 9 14:45:08 2014 +1100

    ctdb-recoverd: Remove check_recovery_lock()
    
    This has not done anything useful since commit
    b9d8bb23af8abefb2d967e9b4e9d6e60c4a3b520.  Instead, just check that
    the lock is held.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 668ed5366237b61f0ff618f32555ce29cca5e6f3
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Dec 9 14:09:40 2014 +1100

    ctdb-recoverd: Improve logging when recovery lock file is changed
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit db32a2bce54b9618fe247b33d6de81bd5f7a3b62
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Dec 9 14:07:20 2014 +1100

    ctdb-recoverd: New function ctdb_recovery_unlock()
    
    Unlock the recovery lock file.  This way knowledge of the file
    descriptor isn't sprinkled throughout the code.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 72701be663ddb265320a022a22130a3437bbf6bc
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Dec 9 13:50:22 2014 +1100

    ctdb-recoverd: New function ctdb_recovery_have_lock()
    
    Returns true if this recovery daemon holds the lock.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
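
    Together with ctdb_recovery_unlock() above, this hides the recovery
    lock file descriptor behind a small accessor API.  A standalone sketch
    of the idea (the context struct here is invented for the example; the
    real declarations are in the ctdb_private.h hunk further down):

        #include <stdbool.h>
        #include <unistd.h>

        struct reclock_ctx {            /* stand-in for struct ctdb_context */
                const char *lock_file;
                int lock_fd;            /* -1 whenever the lock is not held */
        };

        /* True if this recovery daemon currently holds the lock. */
        bool recovery_have_lock(struct reclock_ctx *ctx)
        {
                return ctx->lock_fd != -1;
        }

        /* Release the lock; closing the fd drops the fcntl() lock. */
        void recovery_unlock(struct reclock_ctx *ctx)
        {
                if (ctx->lock_fd != -1) {
                        close(ctx->lock_fd);
                        ctx->lock_fd = -1;
                }
        }

    Callers such as election_handler() and do_recovery() now use these
    helpers instead of open-coding close(ctdb->recovery_lock_fd), as the
    ctdb_recoverd.c hunks below show.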

commit 4d3b52f1cec46f66f8d0827bc8f458cd8c86b5a5
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Dec 9 13:49:06 2014 +1100

    ctdb-daemon: Log a warning when setting obsolete tunables
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit d110fe231849d76ecb83378c934627dc64b74c72
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Dec 9 13:47:42 2014 +1100

    ctdb-daemon: Mark tunable VerifyRecoveryLock as obsolete
    
    It is pointless to have a recovery lock but not sanity-check that it
    is working.  Also, the logic that uses this tunable is confusing: in
    some places the recovery lock is released unnecessarily because the
    tunable isn't set.
    
    Simplify the logic by assuming that if a recovery lock is specified
    then it should be verified.
    
    Update documentation that references this tunable.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
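
    The resulting rule is that configuring a recovery lock file implies
    verifying it.  In the hunks further down, checks of the old tunable
    are replaced by checks of the configured lock file, roughly:

        /* Before: verification could be switched off independently
         * of whether a recovery lock file was configured. */
        if (ctdb->tunable.verify_recovery_lock == 0) {
                /* don't verify the reclock file */
        }

        /* After: only the presence of a lock file matters. */
        if (ctdb->recovery_lock_file == NULL) {
                /* not using a recovery lock at all */
        }

    (See the ctdb_control_set_recmode() and do_recovery() hunks below.)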

commit a01744c08ff5b8aca4af99842acfc78a87af9297
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Feb 3 14:27:11 2015 +1100

    ctdb-doc: Improve documentation of the recovery lock
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 ctdb/doc/ctdb-tunables.7.xml |   9 --
 ctdb/doc/ctdb.1.xml          |  34 ++++--
 ctdb/doc/ctdb.7.xml          |  54 +++++++++
 ctdb/doc/ctdbd.1.xml         |  16 +--
 ctdb/doc/ctdbd.conf.5.xml    |   6 +
 ctdb/include/ctdb_private.h  |   4 +-
 ctdb/server/ctdb_recover.c   |  90 +++++++-------
 ctdb/server/ctdb_recoverd.c  | 272 ++++++++-----------------------------------
 ctdb/server/ctdb_server.c    |   1 -
 ctdb/server/ctdb_tunables.c  |  19 ++-
 10 files changed, 201 insertions(+), 304 deletions(-)


Changeset truncated at 500 lines:

diff --git a/ctdb/doc/ctdb-tunables.7.xml b/ctdb/doc/ctdb-tunables.7.xml
index 456e856..b029fdb 100644
--- a/ctdb/doc/ctdb-tunables.7.xml
+++ b/ctdb/doc/ctdb-tunables.7.xml
@@ -448,15 +448,6 @@
     </refsect2>
 
     <refsect2>
-      <title>VerifyRecoveryLock</title>
-      <para>Default: 1</para>
-      <para>
-	Should we take a fcntl() lock on the reclock file to verify that we are the
-	sole recovery master node on the cluster or not.
-      </para>
-    </refsect2>
-
-    <refsect2>
       <title>VacuumInterval</title>
       <para>Default: 10</para>
       <para>
diff --git a/ctdb/doc/ctdb.1.xml b/ctdb/doc/ctdb.1.xml
index 07df43b..1a1ae40 100644
--- a/ctdb/doc/ctdb.1.xml
+++ b/ctdb/doc/ctdb.1.xml
@@ -682,7 +682,6 @@ RecdFailCount           = 10
 LogLatencyMs            = 0
 RecLockLatencyMs        = 1000
 RecoveryDropAllIPs      = 120
-VerifyRecoveryLock      = 1
 VacuumInterval          = 10
 VacuumMaxRunTime        = 30
 RepackLimit             = 10000
@@ -883,30 +882,51 @@ DB Statistics: locking.tdb
     <refsect2>
       <title>getreclock</title>
       <para>
-	This command is used to show the filename of the reclock file that is used.
+	Show the name of the recovery lock file, if any.
       </para>
 
       <para>
 	Example output:
       </para>
       <screen>
-	Reclock file:/gpfs/.ctdb/shared
+	Reclock file:/clusterfs/.ctdb/recovery.lock
       </screen>
 
     </refsect2>
 
     <refsect2>
-      <title>setreclock [filename]</title>
+      <title>
+	setreclock <optional><parameter>FILE</parameter></optional>
+      </title>
+
       <para>
-	This command is used to modify, or clear, the file that is used as the reclock file at runtime. When this command is used, the reclock file checks are disabled. To re-enable the checks the administrator needs to activate the "VerifyRecoveryLock" tunable using "ctdb setvar".
+	FILE specifies the name of the recovery lock file.  If the
+	recovery lock file is changed at run-time then this will cause
+	a recovery, which in turn causes the recovery lock to be
+	retaken.
       </para>
 
       <para>
-	If run with no parameter this will remove the reclock file completely. If run with a parameter the parameter specifies the new filename to use for the recovery lock.
+	If no FILE is specified then a recovery lock file will no
+	longer be used.
       </para>
 
       <para>
-	This command only affects the runtime settings of a ctdb node and will be lost when ctdb is restarted. For persistent changes to the reclock file setting you must edit /etc/sysconfig/ctdb.
+	This command only affects the run-time setting of a single
+	CTDB node.  This setting <emphasis>must</emphasis> be changed
+	on all nodes simultaneously by specifying <option>-n
+	all</option> (or similar).  For information about configuring
+	the recovery lock file please see the
+	<citetitle>CTDB_RECOVERY_LOCK</citetitle> entry in
+	<citerefentry><refentrytitle>ctdbd.conf</refentrytitle>
+	<manvolnum>5</manvolnum></citerefentry> and the
+	<citetitle>--reclock</citetitle> entry in
+	<citerefentry><refentrytitle>ctdbd</refentrytitle>
+	<manvolnum>1</manvolnum></citerefentry>.  For information
+	about the recovery lock please see the <citetitle>RECOVERY
+	LOCK</citetitle> section in
+	<citerefentry><refentrytitle>ctdb</refentrytitle>
+	<manvolnum>7</manvolnum></citerefentry>.
       </para>
     </refsect2>
 
diff --git a/ctdb/doc/ctdb.7.xml b/ctdb/doc/ctdb.7.xml
index b54fa42..ad17df7 100644
--- a/ctdb/doc/ctdb.7.xml
+++ b/ctdb/doc/ctdb.7.xml
@@ -76,6 +76,60 @@
 </refsect1>
 
   <refsect1>
+    <title>Recovery Lock</title>
+
+    <para>
+      CTDB uses a <emphasis>recovery lock</emphasis> to avoid a
+      <emphasis>split brain</emphasis>, where a cluster becomes
+      partitioned and each partition attempts to operate
+      independently.  Issues that can result from a split brain
+      include file data corruption, because file locking metadata may
+      not be tracked correctly.
+    </para>
+
+    <para>
+      CTDB uses a <emphasis>cluster leader and follower</emphasis>
+      model of cluster management.  All nodes in a cluster elect one
+      node to be the leader.  The leader node coordinates privileged
+      operations such as database recovery and IP address failover.
+      CTDB refers to the leader node as the <emphasis>recovery
+      master</emphasis>.  This node takes and holds the recovery lock
+      to assert its privileged role in the cluster.
+    </para>
+
+    <para>
+      The recovery lock is implemented using a file residing in shared
+      storage (usually) on a cluster filesystem.  To support a
+      recovery lock the cluster filesystem must support lock
+      coherence.  See
+      <citerefentry><refentrytitle>ping_pong</refentrytitle>
+      <manvolnum>1</manvolnum></citerefentry> for more details.
+    </para>
+
+    <para>
+      If a cluster becomes partitioned (for example, due to a
+      communication failure) and a different recovery master is
+      elected by the nodes in each partition, then only one of these
+      recovery masters will be able to take the recovery lock.  The
+      recovery master in the "losing" partition will not be able to
+      take the recovery lock and will be excluded from the cluster.
+      The nodes in the "losing" partition will elect each node in turn
+      as their recovery master so eventually all the nodes in that
+      partition will be excluded.
+    </para>
+
+    <para>
+      CTDB does sanity checks to ensure that the recovery lock is held
+      as expected.
+    </para>
+
+    <para>
+      CTDB can run without a recovery lock but this is not recommended
+      as there will be no protection from split brains.
+    </para>
+  </refsect1>
+
+  <refsect1>
     <title>Private vs Public addresses</title>
 
     <para>
diff --git a/ctdb/doc/ctdbd.1.xml b/ctdb/doc/ctdbd.1.xml
index a499318..fc17acf 100644
--- a/ctdb/doc/ctdbd.1.xml
+++ b/ctdb/doc/ctdbd.1.xml
@@ -362,18 +362,18 @@
       </varlistentry>
 
       <varlistentry>
-	<term>--reclock=<parameter>FILENAME</parameter></term>
+	<term>--reclock=<parameter>FILE</parameter></term>
 	<listitem>
 	  <para>
-	    FILENAME is the name of the recovery lock file stored in
-	    <emphasis>shared storage</emphasis> that ctdbd uses to
-	    prevent split brains from occuring.
+	    FILE is the name of the recovery lock file, stored in
+	    <emphasis>shared storage</emphasis>, that CTDB uses to
+	    prevent split brains.
 	  </para>
 	  <para>
-	    It is possible to run CTDB without a recovery lock file, but
-	    then there will be no protection against split brain if the
-	    cluster/network becomes partitioned. Using CTDB without a
-	    reclock file is strongly discouraged.
+	    For information about the recovery lock please see the
+	    <citetitle>RECOVERY LOCK</citetitle> section in
+	    <citerefentry><refentrytitle>ctdb</refentrytitle>
+	    <manvolnum>7</manvolnum></citerefentry>.
 	  </para>
 	</listitem>
       </varlistentry>
diff --git a/ctdb/doc/ctdbd.conf.5.xml b/ctdb/doc/ctdbd.conf.5.xml
index cf71a35..52c1298 100644
--- a/ctdb/doc/ctdbd.conf.5.xml
+++ b/ctdb/doc/ctdbd.conf.5.xml
@@ -373,6 +373,12 @@
 	    should be change to a useful value.  Corresponds to
 	    <option>--reclock</option>.
 	  </para>
+	  <para>
+	    For information about the recovery lock please see the
+	    <citetitle>RECOVERY LOCK</citetitle> section in
+	    <citerefentry><refentrytitle>ctdb</refentrytitle>
+	    <manvolnum>7</manvolnum></citerefentry>.
+	  </para>
 	</listitem>
       </varlistentry>
 
diff --git a/ctdb/include/ctdb_private.h b/ctdb/include/ctdb_private.h
index 3d6f487..7005fd8 100644
--- a/ctdb/include/ctdb_private.h
+++ b/ctdb/include/ctdb_private.h
@@ -1259,7 +1259,9 @@ void ctdb_release_all_ips(struct ctdb_context *ctdb);
 void set_nonblocking(int fd);
 void set_close_on_exec(int fd);
 
-bool ctdb_recovery_lock(struct ctdb_context *ctdb, bool keep);
+bool ctdb_recovery_have_lock(struct ctdb_context *ctdb);
+bool ctdb_recovery_lock(struct ctdb_context *ctdb);
+void ctdb_recovery_unlock(struct ctdb_context *ctdb);
 
 int ctdb_set_recovery_lock_file(struct ctdb_context *ctdb, const char *file);
 
diff --git a/ctdb/server/ctdb_recover.c b/ctdb/server/ctdb_recover.c
index 10d088b..db88f06 100644
--- a/ctdb/server/ctdb_recover.c
+++ b/ctdb/server/ctdb_recover.c
@@ -528,18 +528,21 @@ static void set_recmode_handler(struct event_context *ev, struct fd_event *fde,
 	state->te = NULL;
 
 
-	/* read the childs status when trying to lock the reclock file.
-	   child wrote 0 if everything is fine and 1 if it did manage
-	   to lock the file, which would be a problem since that means
-	   we got a request to exit from recovery but we could still lock
-	   the file   which at this time SHOULD be locked by the recovery
-	   daemon on the recmaster
-	*/		
+	/* If, as expected, the child was unable to take the recovery
+	 * lock then it will have written 0 into the pipe, so
+	 * continue.  However, any other value (e.g. 1) indicates that
+	 * it was able to take the recovery lock when it should have
+	 * been held by the recovery daemon on the recovery master.
+	*/
 	ret = sys_read(state->fd[0], &c, 1);
 	if (ret != 1 || c != 0) {
-		ctdb_request_control_reply(state->ctdb, state->c, NULL, -1, "managed to lock reclock file from inside daemon");
+		const char *msg = \
+			"Took recovery lock from daemon - probably a cluster filesystem lock coherence problem";
+		ctdb_request_control_reply(
+			state->ctdb, state->c, NULL, -1,
+			msg);
 		talloc_free(state);
-		return;
+		ctdb_die(state->ctdb, msg);
 	}
 
 	state->ctdb->recovery_mode = state->recmode;
@@ -640,8 +643,8 @@ int32_t ctdb_control_set_recmode(struct ctdb_context *ctdb,
 		ctdb_process_deferred_attach(ctdb);
 	}
 
-	if (ctdb->tunable.verify_recovery_lock == 0) {
-		/* dont need to verify the reclock file */
+	if (ctdb->recovery_lock_file == NULL) {
+		/* Not using recovery lock file */
 		ctdb->recovery_mode = recmode;
 		return 0;
 	}
@@ -672,11 +675,13 @@ int32_t ctdb_control_set_recmode(struct ctdb_context *ctdb,
 
 		ctdb_set_process_name("ctdb_recmode");
 		debug_extra = talloc_asprintf(NULL, "set_recmode:");
-		/* we should not be able to get the lock on the reclock file, 
-		  as it should  be held by the recovery master 
-		*/
-		if (ctdb_recovery_lock(ctdb, false)) {
-			DEBUG(DEBUG_CRIT,("ERROR: recovery lock file %s not locked when recovering!\n", ctdb->recovery_lock_file));
+		/* Daemon should not be able to get the recover lock,
+		 * as it should be held by the recovery master */
+		if (ctdb_recovery_lock(ctdb)) {
+			DEBUG(DEBUG_ERR,
+			      ("ERROR: Daemon able to take recovery lock on \"%s\" during recovery\n",
+			       ctdb->recovery_lock_file));
+			ctdb_recovery_unlock(ctdb);
 			cc = 1;
 		}
 
@@ -721,26 +726,25 @@ int32_t ctdb_control_set_recmode(struct ctdb_context *ctdb,
 }
 
 
+bool ctdb_recovery_have_lock(struct ctdb_context *ctdb)
+{
+	return ctdb->recovery_lock_fd != -1;
+}
+
 /*
   try and get the recovery lock in shared storage - should only work
   on the recovery master recovery daemon. Anywhere else is a bug
  */
-bool ctdb_recovery_lock(struct ctdb_context *ctdb, bool keep)
+bool ctdb_recovery_lock(struct ctdb_context *ctdb)
 {
 	struct flock lock;
 
-	if (keep) {
-		DEBUG(DEBUG_ERR, ("Take the recovery lock\n"));
-	}
-	if (ctdb->recovery_lock_fd != -1) {
-		close(ctdb->recovery_lock_fd);
-		ctdb->recovery_lock_fd = -1;
-	}
-
-	ctdb->recovery_lock_fd = open(ctdb->recovery_lock_file, O_RDWR|O_CREAT, 0600);
+	ctdb->recovery_lock_fd = open(ctdb->recovery_lock_file,
+				      O_RDWR|O_CREAT, 0600);
 	if (ctdb->recovery_lock_fd == -1) {
-		DEBUG(DEBUG_ERR,("ctdb_recovery_lock: Unable to open %s - (%s)\n", 
-			 ctdb->recovery_lock_file, strerror(errno)));
+		DEBUG(DEBUG_ERR,
+		      ("ctdb_recovery_lock: Unable to open %s - (%s)\n",
+		       ctdb->recovery_lock_file, strerror(errno)));
 		return false;
 	}
 
@@ -756,27 +760,29 @@ bool ctdb_recovery_lock(struct ctdb_context *ctdb, bool keep)
 		int saved_errno = errno;
 		close(ctdb->recovery_lock_fd);
 		ctdb->recovery_lock_fd = -1;
-		if (keep) {
-			DEBUG(DEBUG_CRIT,("ctdb_recovery_lock: Failed to get "
-					  "recovery lock on '%s' - (%s)\n",
-					  ctdb->recovery_lock_file,
-					  strerror(saved_errno)));
+		/* Fail silently on these errors, since they indicate
+		 * lock contention, but log an error for any other
+		 * failure. */
+		if (saved_errno != EACCES &&
+		    saved_errno != EAGAIN) {
+			DEBUG(DEBUG_ERR,("ctdb_recovery_lock: Failed to get "
+					 "recovery lock on '%s' - (%s)\n",
+					 ctdb->recovery_lock_file,
+					 strerror(saved_errno)));
 		}
 		return false;
 	}
 
-	if (!keep) {
+	return true;
+}
+
+void ctdb_recovery_unlock(struct ctdb_context *ctdb)
+{
+	if (ctdb->recovery_lock_fd != -1) {
+		DEBUG(DEBUG_NOTICE, ("Releasing recovery lock\n"));
 		close(ctdb->recovery_lock_fd);
 		ctdb->recovery_lock_fd = -1;
 	}
-
-	if (keep) {
-		DEBUG(DEBUG_NOTICE, ("Recovery lock taken successfully\n"));
-	}
-
-	DEBUG(DEBUG_NOTICE,("ctdb_recovery_lock: Got recovery lock on '%s'\n", ctdb->recovery_lock_file));
-
-	return true;
 }
 
 /*
diff --git a/ctdb/server/ctdb_recoverd.c b/ctdb/server/ctdb_recoverd.c
index f86f57e..99018be 100644
--- a/ctdb/server/ctdb_recoverd.c
+++ b/ctdb/server/ctdb_recoverd.c
@@ -1808,28 +1808,36 @@ static int do_recovery(struct ctdb_recoverd *rec,
 		return -1;
 	}
 
-        if (ctdb->tunable.verify_recovery_lock != 0) {
-		DEBUG(DEBUG_ERR,("Taking out recovery lock from recovery daemon\n"));
-		start_time = timeval_current();
-		if (!ctdb_recovery_lock(ctdb, true)) {
-			if (ctdb->runstate == CTDB_RUNSTATE_FIRST_RECOVERY) {
-				/* If ctdb is trying first recovery, it's
-				 * possible that current node does not know yet
-				 * who the recmaster is.
-				 */
-				DEBUG(DEBUG_ERR, ("Unable to get recovery lock"
-						" - retrying recovery\n"));
+        if (ctdb->recovery_lock_file != NULL) {
+		if (ctdb_recovery_have_lock(ctdb)) {
+			DEBUG(DEBUG_NOTICE, ("Already holding recovery lock\n"));
+		} else {
+			start_time = timeval_current();
+			DEBUG(DEBUG_NOTICE, ("Attempting to take recovery lock (%s)\n",
+					     ctdb->recovery_lock_file));
+			if (!ctdb_recovery_lock(ctdb)) {
+				if (ctdb->runstate == CTDB_RUNSTATE_FIRST_RECOVERY) {
+					/* If ctdb is trying first recovery, it's
+					 * possible that current node does not know
+					 * yet who the recmaster is.
+					 */
+					DEBUG(DEBUG_ERR, ("Unable to get recovery lock"
+							  " - retrying recovery\n"));
+					return -1;
+				}
+
+				DEBUG(DEBUG_ERR,("Unable to get recovery lock - aborting recovery "
+						 "and ban ourself for %u seconds\n",
+						 ctdb->tunable.recovery_ban_period));
+				ctdb_ban_node(rec, pnn, ctdb->tunable.recovery_ban_period);
 				return -1;
 			}
-
-			DEBUG(DEBUG_ERR,("Unable to get recovery lock - aborting recovery "
-					 "and ban ourself for %u seconds\n",
-					 ctdb->tunable.recovery_ban_period));
-			ctdb_ban_node(rec, pnn, ctdb->tunable.recovery_ban_period);
-			return -1;
+			ctdb_ctrl_report_recd_lock_latency(ctdb,
+							   CONTROL_TIMEOUT(),
+							   timeval_elapsed(&start_time));
+			DEBUG(DEBUG_NOTICE,
+			      ("Recovery lock taken successfully by recovery daemon\n"));
 		}
-		ctdb_ctrl_report_recd_lock_latency(ctdb, CONTROL_TIMEOUT(), timeval_elapsed(&start_time));
-		DEBUG(DEBUG_NOTICE,("Recovery lock taken successfully by recovery daemon\n"));
 	}
 
 	DEBUG(DEBUG_NOTICE, (__location__ " Recovery initiated due to problem with node %u\n", rec->last_culprit_node));
@@ -2679,18 +2687,16 @@ static void election_handler(struct ctdb_context *ctdb, uint64_t srvid,
 		/*unban_all_nodes(ctdb);*/
 		return;
 	}
-	
+
 	/* we didn't win */
 	talloc_free(rec->send_election_te);
 	rec->send_election_te = NULL;
 
-        if (ctdb->tunable.verify_recovery_lock != 0) {
-		/* release the recmaster lock */
+        if (ctdb->recovery_lock_file != NULL) {
+		/* Release the recovery lock file */
 		if (em->pnn != ctdb->pnn &&
-		    ctdb->recovery_lock_fd != -1) {
-			DEBUG(DEBUG_NOTICE, ("Release the recovery lock\n"));
-			close(ctdb->recovery_lock_fd);
-			ctdb->recovery_lock_fd = -1;
+		    ctdb_recovery_have_lock(ctdb)) {
+			ctdb_recovery_unlock(ctdb);
 			unban_all_nodes(ctdb);
 		}
 	}
@@ -3287,181 +3293,6 @@ static int get_remote_nodemaps(struct ctdb_context *ctdb, TALLOC_CTX *mem_ctx,
 	return 0;
 }
 
-enum reclock_child_status { RECLOCK_CHECKING, RECLOCK_OK, RECLOCK_FAILED, RECLOCK_TIMEOUT};
-struct ctdb_check_reclock_state {
-	struct ctdb_context *ctdb;
-	struct timeval start_time;
-	int fd[2];
-	pid_t child;
-	struct timed_event *te;
-	struct fd_event *fde;
-	enum reclock_child_status status;
-};
-
-/* when we free the reclock state we must kill any child process.
-*/
-static int check_reclock_destructor(struct ctdb_check_reclock_state *state)
-{
-	struct ctdb_context *ctdb = state->ctdb;
-
-	ctdb_ctrl_report_recd_lock_latency(ctdb, CONTROL_TIMEOUT(), timeval_elapsed(&state->start_time));
-
-	if (state->fd[0] != -1) {
-		close(state->fd[0]);
-		state->fd[0] = -1;
-	}
-	if (state->fd[1] != -1) {
-		close(state->fd[1]);
-		state->fd[1] = -1;
-	}
-	ctdb_kill(ctdb, state->child, SIGKILL);
-	return 0;
-}
-
-/*
-  called if our check_reclock child times out. this would happen if
-  i/o to the reclock file blocks.
- */
-static void ctdb_check_reclock_timeout(struct event_context *ev, struct timed_event *te, 
-					 struct timeval t, void *private_data)
-{
-	struct ctdb_check_reclock_state *state = talloc_get_type(private_data, 
-					   struct ctdb_check_reclock_state);
-
-	DEBUG(DEBUG_ERR,(__location__ " check_reclock child process hung/timedout CFS slow to grant locks?\n"));
-	state->status = RECLOCK_TIMEOUT;
-}
-
-/* this is called when the child process has completed checking the reclock
-   file and has written data back to us through the pipe.


-- 
Samba Shared Repository

