[SCM] Samba Shared Repository - branch master updated

Amitay Isaacs amitay at samba.org
Mon Feb 25 02:41:01 UTC 2019


The branch, master has been updated
       via  c93430fe8fe ctdb-cluster-mutex: Separate out command and file handling
       via  ebc082122fb ctdb-tests: Add a test for configuring the recovery lock as a command
       via  e74f5243fcb ctdb-tests: Add -R option for local daemons to use recovery lock command
       via  ce09d9c3e4c ctdb-tests: Force test failure if local daemon setup fails
       via  13a1a480893 ctdb-recoverd: Time out attempt to take recovery lock after 120s
       via  45a77d65b2e ctdb-recoverd: Ban node on unknown error when taking recovery lock
       via  c0fb62ed395 ctdb-recoverd: Make recoverd context available in recovery lock handle
       via  7e4aae69432 ctdb-recoverd: Clean up logging on failure to take recovery lock
       via  621658cbed5 ctdb-recoverd: Free cluster mutex handler on failure to take lock
       via  82e7f382148 ctdb-config: Change example recovery lock setting to one that fails
      from  12da33e2bbc smbd: unix_convert: Ensure we don't call get_real_filename on POSIX paths.

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit c93430fe8fe530a55b9a04cf6cc660c3d420e333
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jan 21 12:16:43 2019 +1100

    ctdb-cluster-mutex: Separate out command and file handling
    
    This code is difficult to read and there really is no common code
    between the 2 cases.  For example, there is no need to split a
    filename into words.  Separating each of the 2 cases into its own
    function makes the logic much easier to understand.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    Autobuild-User(master): Amitay Isaacs <amitay at samba.org>
    Autobuild-Date(master): Mon Feb 25 03:40:16 CET 2019 on sn-devel-144

commit ebc082122fb34ffb8cbcafde9ad39bcc241d33ed
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jan 21 12:15:33 2019 +1100

    ctdb-tests: Add a test for configuring the recovery lock as a command
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit e74f5243fcb7594939769c16f3c79ab167dd1227
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jan 21 12:13:29 2019 +1100

    ctdb-tests: Add -R option for local daemons to use recovery lock command
    
    Under the covers, a command is always used.  However, there is no way
    of testing of the code path where a command is explicitly configured.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit ce09d9c3e4c72ebec7a21686ae913398a5c9020f
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jan 21 12:13:08 2019 +1100

    ctdb-tests: Force test failure if local daemon setup fails
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 13a1a4808935290dceb219daccd7aac3fda4e184
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Feb 22 15:09:33 2019 +1100

    ctdb-recoverd: Time out attempt to take recovery lock after 120s
    
    Currently this will wait forever.  It really needs a timeout in case
    the cluster filesystem (or other lock mechanism) is completely wedged.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 45a77d65b2e39b4af94da4ab99575f4ee08a7ebd
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jan 10 14:01:57 2019 +1100

    ctdb-recoverd: Ban node on unknown error when taking recovery lock
    
    We really shouldn't see unknown errors.  They probably represent a
    misconfigured recovery lock or similar.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit c0fb62ed3954fc6e8667480aba92003fc270f257
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jan 10 13:24:34 2019 +1100

    ctdb-recoverd: Make recoverd context available in recovery lock handle
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 7e4aae6943291c3144c8a3ff97537e8d4c7dc7c9
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jan 21 16:36:13 2019 +1100

    ctdb-recoverd: Clean up logging on failure to take recovery lock
    
    Add an explicit case for a timeout and clean up the other messages.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 621658cbed5d91d7096fc208bac2ff93a1880e7d
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jan 21 16:28:28 2019 +1100

    ctdb-recoverd: Free cluster mutex handler on failure to take lock
    
    If nested events occur while the file descriptor handler is still
    active then chaos can ensue.  For example, if a node is banned and the
    lock is explicitly cancelled (e.g. due to election loss) then
    double-talloc-free()s abound.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 82e7f38214896c2c200132bc6dde3348cfac16cc
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jan 10 14:15:18 2019 +1100

    ctdb-config: Change example recovery lock setting to one that fails
    
    ctdbd will start without a recovery lock configured.  It will log a
    message saying that this is not optimal.  However, a careless user may
    overlook both this message and the importance of setting a recovery
    lock.  If the existing example configuration is uncommented then the
    directory containing it will be created (by 01.reclock.script) and the
    failure (i.e. multiple nodes able to take the lock) will be confusing.
    
    Instead, change the example setting to one that will result in banned
    nodes, encouraging users to consciously configure (or deconfigure) the
    recovery lock.  Tweak the corresponding comment.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=13790
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 ctdb/config/ctdb.conf                              |  13 ++-
 ctdb/server/ctdb_cluster_mutex.c                   | 113 +++++++++++++--------
 ctdb/server/ctdb_recoverd.c                        |  36 ++++++-
 ctdb/tests/local_daemons.sh                        |  12 ++-
 ..._eventscripts.sh => 01_ctdb_reclock_command.sh} |   6 +-
 ctdb/tests/simple/scripts/local_daemons.bash       |   6 ++
 6 files changed, 131 insertions(+), 55 deletions(-)
 copy ctdb/tests/simple/{28_zero_eventscripts.sh => 01_ctdb_reclock_command.sh} (67%)


Changeset truncated at 500 lines:

diff --git a/ctdb/config/ctdb.conf b/ctdb/config/ctdb.conf
index a9e6f693405..5440600a435 100644
--- a/ctdb/config/ctdb.conf
+++ b/ctdb/config/ctdb.conf
@@ -11,7 +11,12 @@
 	# log level = NOTICE
 
 [cluster]
-	# Shared recovery lock file to avoid split brain.  No default.
-	# Do NOT run CTDB without a recovery lock file unless you know exactly
-	# what you are doing.
-	# recovery lock = /shared/recovery.lock
+	# Shared recovery lock file to avoid split brain.  Daemon
+	# default is no recovery lock.  Do NOT run CTDB without a
+	# recovery lock file unless you know exactly what you are
+	# doing.
+	#
+	# Please see the RECOVERY LOCK section in ctdb(7) for more
+	# details.
+	#
+	# recovery lock = !/bin/false RECOVERY LOCK NOT CONFIGURED
diff --git a/ctdb/server/ctdb_cluster_mutex.c b/ctdb/server/ctdb_cluster_mutex.c
index 330d5fd1d90..2e3cb8112ad 100644
--- a/ctdb/server/ctdb_cluster_mutex.c
+++ b/ctdb/server/ctdb_cluster_mutex.c
@@ -118,72 +118,101 @@ static void cluster_mutex_handler(struct tevent_context *ev,
 
 static char cluster_mutex_helper[PATH_MAX+1] = "";
 
-static bool cluster_mutex_helper_args(TALLOC_CTX *mem_ctx,
-				      const char *argstring, char ***argv)
+static bool cluster_mutex_helper_args_file(TALLOC_CTX *mem_ctx,
+					   const char *argstring,
+					   char ***argv)
 {
-	int nargs, i, ret, n;
-	bool is_command = false;
+	bool ok;
 	char **args = NULL;
-	char *strv = NULL;
-	char *t = NULL;
 
-	if (argstring != NULL && argstring[0] == '!') {
-		/* This is actually a full command */
-		is_command = true;
-		t = discard_const(&argstring[1]);
-	} else {
-		is_command = false;
-		t = discard_const(argstring);
+	ok = ctdb_set_helper("cluster mutex helper",
+			     cluster_mutex_helper,
+			     sizeof(cluster_mutex_helper),
+			     "CTDB_CLUSTER_MUTEX_HELPER",
+			     CTDB_HELPER_BINDIR,
+			     "ctdb_mutex_fcntl_helper");
+	if (! ok) {
+		DBG_ERR("ctdb exiting with error: "
+			"Unable to set cluster mutex helper\n");
+		exit(1);
 	}
 
-	ret = strv_split(mem_ctx, &strv, t, " \t");
-	if (ret != 0) {
-		DEBUG(DEBUG_ERR,
-		      ("Unable to parse mutex helper string \"%s\" (%s)\n",
-		       argstring, strerror(ret)));
+
+	/* Array includes default helper, file and NULL */
+	args = talloc_array(mem_ctx, char *, 3);
+	if (args == NULL) {
+		DBG_ERR("Memory allocation error\n");
 		return false;
 	}
-	n = strv_count(strv);
 
-	args = talloc_array(mem_ctx, char *, n + (is_command ? 1 : 2));
+	args[0] = cluster_mutex_helper;
 
-	if (args == NULL) {
-		DEBUG(DEBUG_ERR,(__location__ " out of memory\n"));
+	args[1] = talloc_strdup(args, argstring);
+	if (args[1] == NULL) {
+		DBG_ERR("Memory allocation error\n");
 		return false;
 	}
 
-	nargs = 0;
-
-	if (! is_command) {
-		if (!ctdb_set_helper("cluster mutex helper",
-				     cluster_mutex_helper,
-				     sizeof(cluster_mutex_helper),
-				     "CTDB_CLUSTER_MUTEX_HELPER",
-				     CTDB_HELPER_BINDIR,
-				     "ctdb_mutex_fcntl_helper")) {
-			DEBUG(DEBUG_ERR,("ctdb exiting with error: %s\n",
-					 __location__
-					 " Unable to set cluster mutex helper\n"));
-			exit(1);
-		}
+	args[2] = NULL;
+
+	*argv = args;
+	return true;
+}
 
-		args[nargs++] = cluster_mutex_helper;
+static bool cluster_mutex_helper_args_cmd(TALLOC_CTX *mem_ctx,
+					  const char *argstring,
+					  char ***argv)
+{
+	int i, ret, n;
+	char **args = NULL;
+	char *strv = NULL;
+	char *t = NULL;
+
+	ret = strv_split(mem_ctx, &strv, argstring, " \t");
+	if (ret != 0) {
+		D_ERR("Unable to parse mutex helper command \"%s\" (%s)\n",
+		      argstring,
+		      strerror(ret));
+		return false;
 	}
+	n = strv_count(strv);
+
+	/* Extra slot for NULL */
+	args = talloc_array(mem_ctx, char *, n + 1);
+	if (args == NULL) {
+		DBG_ERR("Memory allocation error\n");
+		return false;
+	}
+
+	talloc_steal(args, strv);
 
 	t = NULL;
-	for (i = 0; i < n; i++) {
-		/* Don't copy, just keep cmd_args around */
+	for (i = 0 ; i < n; i++) {
 		t = strv_next(strv, t);
-		args[nargs++] = t;
+		args[i] = t;
 	}
 
-	/* Make sure last argument is NULL */
-	args[nargs] = NULL;
+	args[n] = NULL;
 
 	*argv = args;
 	return true;
 }
 
+static bool cluster_mutex_helper_args(TALLOC_CTX *mem_ctx,
+				      const char *argstring,
+				      char ***argv)
+{
+	bool ok;
+
+	if (argstring != NULL && argstring[0] == '!') {
+		ok = cluster_mutex_helper_args_cmd(mem_ctx, &argstring[1], argv);
+	} else {
+		ok = cluster_mutex_helper_args_file(mem_ctx, argstring, argv);
+	}
+
+	return ok;
+}
+
 struct ctdb_cluster_mutex_handle *
 ctdb_cluster_mutex(TALLOC_CTX *mem_ctx,
 		   struct ctdb_context *ctdb,
diff --git a/ctdb/server/ctdb_recoverd.c b/ctdb/server/ctdb_recoverd.c
index 578127a4514..9b3559b2a92 100644
--- a/ctdb/server/ctdb_recoverd.c
+++ b/ctdb/server/ctdb_recoverd.c
@@ -888,6 +888,7 @@ struct ctdb_recovery_lock_handle {
 	bool locked;
 	double latency;
 	struct ctdb_cluster_mutex_handle *h;
+	struct ctdb_recoverd *rec;
 };
 
 static void take_reclock_handler(char status,
@@ -897,22 +898,45 @@ static void take_reclock_handler(char status,
 	struct ctdb_recovery_lock_handle *s =
 		(struct ctdb_recovery_lock_handle *) private_data;
 
+	s->locked = (status == '0') ;
+
+	/*
+	 * If unsuccessful then ensure the process has exited and that
+	 * the file descriptor event handler has been cancelled
+	 */
+	if (! s->locked) {
+		TALLOC_FREE(s->h);
+	}
+
 	switch (status) {
 	case '0':
 		s->latency = latency;
 		break;
 
 	case '1':
-		DEBUG(DEBUG_ERR,
-		      ("Unable to take recovery lock - contention\n"));
+		D_ERR("Unable to take recovery lock - contention\n");
+		break;
+
+	case '2':
+		D_ERR("Unable to take recovery lock - timeout\n");
 		break;
 
 	default:
-		DEBUG(DEBUG_ERR, ("ERROR: when taking recovery lock\n"));
+		D_ERR("Unable to take recover lock - unknown error\n");
+
+		{
+			struct ctdb_recoverd *rec = s->rec;
+			struct ctdb_context *ctdb = rec->ctdb;
+			uint32_t pnn = ctdb_get_pnn(ctdb);
+
+			D_ERR("Banning this node\n");
+			ctdb_ban_node(rec,
+				      pnn,
+				      ctdb->tunable.recovery_ban_period);
+		}
 	}
 
 	s->done = true;
-	s->locked = (status == '0') ;
 }
 
 static void force_election(struct ctdb_recoverd *rec,
@@ -942,10 +966,12 @@ static bool ctdb_recovery_lock(struct ctdb_recoverd *rec)
 		return false;
 	};
 
+	s->rec = rec;
+
 	h = ctdb_cluster_mutex(s,
 			       ctdb,
 			       ctdb->recovery_lock,
-			       0,
+			       120,
 			       take_reclock_handler,
 			       s,
 			       lost_reclock_handler,
diff --git a/ctdb/tests/local_daemons.sh b/ctdb/tests/local_daemons.sh
index 9329a60643c..07cf1e9b135 100755
--- a/ctdb/tests/local_daemons.sh
+++ b/ctdb/tests/local_daemons.sh
@@ -132,6 +132,7 @@ Options:
   -N <file>     Nodes file (default: automatically generated)
   -n <num>      Number of nodes (default: 3)
   -P <file>     Public addresses file (default: automatically generated)
+  -R            Use a command for the recovery lock (default: use a file)
   -S <library>  Socket wrapper shared library to preload (default: none)
   -6            Generate IPv6 IPs for nodes, public addresses (default: IPv4)
 EOF
@@ -145,17 +146,19 @@ local_daemons_setup ()
 	_nodes_file=""
 	_num_nodes=3
 	_public_addresses_file=""
+	_recovery_lock_use_command=false
 	_socket_wrapper=""
 	_use_ipv6=false
 
 	set -e
 
-	while getopts "FN:n:P:S:6h?" _opt ; do
+	while getopts "FN:n:P:RS:6h?" _opt ; do
 		case "$_opt" in
 		F) _disable_failover=true ;;
 		N) _nodes_file="$OPTARG" ;;
 		n) _num_nodes="$OPTARG" ;;
 		P) _public_addresses_file="$OPTARG" ;;
+		R) _recovery_lock_use_command=true ;;
 		S) _socket_wrapper="$OPTARG" ;;
 		6) _use_ipv6=true ;;
 		\?|h) local_daemons_setup_usage ;;
@@ -188,6 +191,11 @@ local_daemons_setup ()
 				       $_use_ipv6 >"$_public_addresses_all"
 	fi
 
+	_recovery_lock="${directory}/rec.lock"
+	if $_recovery_lock_use_command ; then
+		_recovery_lock="! ${CTDB_CLUSTER_MUTEX_HELPER} ${_recovery_lock}"
+	fi
+
 	if [ -n "$_socket_wrapper" ] ; then
 		setup_socket_wrapper "$_socket_wrapper"
 	fi
@@ -221,7 +229,7 @@ local_daemons_setup ()
 	log level = INFO
 
 [cluster]
-	recovery lock = ${directory}/rec.lock
+	recovery lock = ${_recovery_lock}
 	node address = ${_node_ip}
 
 [database]
diff --git a/ctdb/tests/simple/28_zero_eventscripts.sh b/ctdb/tests/simple/01_ctdb_reclock_command.sh
similarity index 67%
copy from ctdb/tests/simple/28_zero_eventscripts.sh
copy to ctdb/tests/simple/01_ctdb_reclock_command.sh
index 75e5e047a73..34f82e981a1 100755
--- a/ctdb/tests/simple/28_zero_eventscripts.sh
+++ b/ctdb/tests/simple/01_ctdb_reclock_command.sh
@@ -3,7 +3,8 @@
 test_info()
 {
     cat <<EOF
-Check that CTDB operated correctly if there are 0 event scripts
+Check that CTDB operates correctly if the recovery lock is configured
+as a command.
 
 This test only does anything with local daemons.  On a real cluster it
 has no way of updating configuration.
@@ -19,7 +20,8 @@ if [ -z "$TEST_LOCAL_DAEMONS" ] ; then
 	exit 0
 fi
 
-ctdb_test_init --no-event_scripts
+echo "Starting CTDB with recovery lock command configured..."
+ctdb_test_init --reclock-use-command
 
 cluster_is_healthy
 
diff --git a/ctdb/tests/simple/scripts/local_daemons.bash b/ctdb/tests/simple/scripts/local_daemons.bash
index cf6671757b3..5da900288b2 100644
--- a/ctdb/tests/simple/scripts/local_daemons.bash
+++ b/ctdb/tests/simple/scripts/local_daemons.bash
@@ -22,10 +22,12 @@ setup_ctdb ()
 	local public_addresses=""
 	local no_event_scripts=false
 	local disable_failover=""
+	local reclock_use_command=""
 	case "$1" in
 	--no-public-addresses) public_addresses="/dev/null" ;;
 	--no-event-scripts)    no_event_scripts=true    ;;
 	--disable-failover)    disable_failover="yes"   ;;
+	--reclock-use-command) reclock_use_command="yes" ;;
 	esac
 
 	$ctdb_local_daemons setup \
@@ -33,7 +35,11 @@ setup_ctdb ()
 		${disable_failover:+-F} \
 		${public_addresses:+-P} ${public_addresses} \
 		${CTDB_USE_IPV6:+-6} \
+		${reclock_use_command:+-R} \
 		${TEST_SOCKET_WRAPPER_SO_PATH:+-S} ${TEST_SOCKET_WRAPPER_SO_PATH}
+	if [ $? -ne 0 ] ; then
+		exit 1
+	fi
 
 	local pnn
 	for pnn in $(seq 0 $(($TEST_LOCAL_DAEMONS - 1))) ; do


-- 
Samba Shared Repository



More information about the samba-cvs mailing list