[SCM] CTDB repository - branch 1.2.40 updated - ctdb-1.2.66-14-g91f522f

Wed Aug 28 22:41:07 MDT 2013

The branch, 1.2.40 has been updated
       via  91f522f928f28b3c3463963aedd71a251545b910 (commit)
       via  dec866151a85cd2574a1e6acefc0125386fe854b (commit)
       via  91d60247b360b032a987604f60220176d350daa2 (commit)
       via  b0d147dbac28a4dd9a5d002ded3f0d0488009ebc (commit)
       via  1268ed6edbdee97f6757205bb10d1f285f6394c6 (commit)
       via  3e898f99ba497e1c9f9bb3db02cb0285f6d27a82 (commit)
       via  04922de5ffbaaec7384990dd1b5af412982eb716 (commit)
       via  2f4dab3d06759e6fea4b6fbc6599aba53d68e9b3 (commit)
       via  61de7d17229c7d3061bf8501e66d7a18f16feabf (commit)
       via  3bdc8331051b0182d5383fb3b16b34dd4dabd3d1 (commit)
       via  9132e6814ed927fa317f333f03dedb18f75d0e5b (commit)
       via  ec20cf74ac70434402d7ccf2d72c2e1b86ed87be (commit)
       via  d9f6ddb67ec06ba87a7debc04908296773809bf2 (commit)
       via  8d251ce2871770708a2304fa5dae2ddab12d2539 (commit)
      from  9321cc2b24c351bca92bf728046cafa3073ef89a (commit)

http://gitweb.samba.org/?p=ctdb.git;a=shortlog;h=1.2.40


- Log -----------------------------------------------------------------
commit 91f522f928f28b3c3463963aedd71a251545b910
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Wed Aug 14 16:23:27 2013 +1000

    New version 1.2.67
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>

commit dec866151a85cd2574a1e6acefc0125386fe854b
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Aug 14 19:17:46 2013 +1000

    client: Change timeout to 10 seconds for the call to ctdb_ctrl_getpnn()
    
    A more flexible solution would be to backport the patch to add a
    timeout argument to ctdb_cmdline_client() but that breaks too many
    things for this branch.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit 91d60247b360b032a987604f60220176d350daa2
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Aug 9 11:56:29 2013 +1000

    tools/ctdb: Increase default control timeout to 10 seconds
    
    The current 3 second timeout is arbitrary and users trip over it
    sometimes.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    (cherry picked from commit b49c4f39666d5b1596213bf41bcdc47ed3c327ae)

commit b0d147dbac28a4dd9a5d002ded3f0d0488009ebc
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Tue Aug 13 14:02:46 2013 +1000

    recoverd: Use TDB_INCOMPATIBLE_HASH when creating volatile databases
    
    When creating missing databases either locally or remotely, recovery
    master calls ctdb_ctrl_createdb().  Recovery master always passes 0
    for tdb_flags.  For volatile databases, if TDB_INCOMPATIBLE_HASH is not
    specified, then they will be attached without using jenkins hash causing
    database corruption.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 2fc6b6403707a292d134140fc0b9145b454992c5)

commit 1268ed6edbdee97f6757205bb10d1f285f6394c6
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Wed Jul 10 12:23:30 2013 +1000

    ctdbd: Print tdb flags when logging attached to database message
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 846109169ee5e3d03135156e45c8dac93aa2e95b)

commit 3e898f99ba497e1c9f9bb3db02cb0285f6d27a82
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Aug 14 15:40:27 2013 +1000

    tools/ctdb: Make ban/unban more resilient to timeouts
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
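
The retry pattern behind this change (visible in the tools/ctdb.c hunk of the diff: reissue the control, sleep for a second, then re-read the nodemap until the banned flag has the expected value) can be sketched in shell. This is only a minimal illustration, not the tools/ctdb.c code. The helper name retry_until_verified, its arguments, and the attempt cap are assumptions made for the example; the loop in the diff has no such cap.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not part of CTDB).
# Usage: retry_until_verified MAX_ATTEMPTS DELAY APPLY_CMD VERIFY_CMD
# Runs APPLY_CMD, waits DELAY seconds, then runs VERIFY_CMD; repeats
# until VERIFY_CMD succeeds or MAX_ATTEMPTS is exhausted.
retry_until_verified () {
    _max=$1
    _delay=$2
    _apply=$3
    _verify=$4
    _attempt=0
    while [ "$_attempt" -lt "$_max" ] ; do
        _attempt=$((_attempt + 1))
        # A failed apply is only a warning; the verification step decides.
        "$_apply" || echo "WARNING: attempt $_attempt failed" >&2
        sleep "$_delay"
        if "$_verify" ; then
            return 0
        fi
    done
    return 1
}
```

The point of the design is that an individual control timing out no longer aborts the command: success is judged by the observed state, not by the return code of the request.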

commit 04922de5ffbaaec7384990dd1b5af412982eb716
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Aug 8 14:37:03 2013 +1000

    eventscripts: Move NFS reconfigure to "ipreallocated" event
    
    Doing this in the "monitor" event is unsafe because it causes the node
    health status to flip-flop.  At the moment when a node goes unhealthy
    it is failed out, IPs are released and the monitor event handles the
    reconfigure, returning 0 even though the service failure is
    unresolved.
    
    This change was made in the master branch a long time ago.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit 2f4dab3d06759e6fea4b6fbc6599aba53d68e9b3
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Aug 6 16:46:21 2013 +1000

    eventscripts: Change the nfsd RPC check failure policy
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit 61de7d17229c7d3061bf8501e66d7a18f16feabf
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Aug 6 16:46:01 2013 +1000

    eventscripts: New function ctdb_check_counter()
    
    This provides much more flexible counter handling.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
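
To illustrate the counter handling this commit adds (see the config/functions hunk in the diff), here is a simplified stand-alone sketch. Failures are counted in unary as the size of a file, and a check takes an operator: either an integer operator understood by test, or % meaning "the count is a multiple of the limit". The names counter_dir, counter_init, counter_incr, counter_value and counter_hit are hypothetical, and unlike ctdb_check_counter, counter_hit returns 0 on a hit and never exits.

```shell
#!/bin/sh
# Simplified sketch, not the CTDB implementation; all names are hypothetical.
counter_dir=$(mktemp -d)

counter_init () {
    : > "$counter_dir/$1"            # truncate: zero failures
}

counter_incr () {
    printf 1 >> "$counter_dir/$1"    # unary counting: one byte per failure
}

counter_value () {
    if [ -f "$counter_dir/$1" ] ; then
        wc -c < "$counter_dir/$1" | tr -d ' '
    else
        echo 0
    fi
}

# counter_hit NAME OP LIMIT
# Succeeds when "count OP limit" holds; OP "%" means count is a multiple of LIMIT.
counter_hit () {
    _size=$(counter_value "$1")
    if [ "$2" = "%" ] ; then
        [ $((_size % $3)) -eq 0 ]
    else
        [ "$_size" "$2" "$3" ]
    fi
}
```

With these helpers, the policy described in the 1.2.67 changelog would read as two checks after each failed RPC probe: counter_hit nfs_knfsd -ge 2 to decide that the node is unhealthy, and counter_hit nfs_knfsd % 10 to decide that a restart is due.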

commit 3bdc8331051b0182d5383fb3b16b34dd4dabd3d1
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Aug 6 16:44:50 2013 +1000

    eventscripts: Add optional counter name argument to some counter functions
    
    This helps some calling code look less like line noise.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit 9132e6814ed927fa317f333f03dedb18f75d0e5b
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Aug 2 16:29:32 2013 +1000

    recoverd: Banned nodes should not be told to run "ipreallocated" event
    
    They will reject it because they are in recovery.  This can result in
    extra banning credits being applied to banned nodes.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit ec20cf74ac70434402d7ccf2d72c2e1b86ed87be
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jul 22 16:39:46 2013 +1000

    recoverd: Call takeover fail callback only once per node
    
    Currently the fail callback is called once per (takeip/releaseip) control
    failure.  This is overkill and can get a node banned much too quickly.
    
    Instead, keep track of control failures per node and only call fail
    callback once per failed node.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit bf4a7c1ad87e0e848296d15d63eb8cd901ca5335)
    
    Conflicts:
    	server/ctdb_takeover.c
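
The "once per failed node" bookkeeping described in this commit can be sketched as follows. The real change is in C (server/ctdb_takeover.c, using a per-node bool array); this shell version is only an analogy, and failed_nodes, fail_callback and control_failed are hypothetical names chosen for the example.

```shell
#!/bin/sh
# Hypothetical sketch of "call the fail callback only once per node".
failed_nodes=""          # space-separated PNNs already reported
fail_callback_calls=0    # how many times the callback really ran

fail_callback () {
    fail_callback_calls=$((fail_callback_calls + 1))
    echo "Node $1 failed the takeover run"
}

# Called once per failed takeip/releaseip control for node PNN ($1).
control_failed () {
    case " $failed_nodes " in
        *" $1 "*) return 0 ;;    # this node was already reported
    esac
    failed_nodes="$failed_nodes $1"
    fail_callback "$1"
}
```

However many controls fail for a given node during one run, the callback (and therefore any ban accounting tied to it) fires a single time for that node.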

commit d9f6ddb67ec06ba87a7debc04908296773809bf2
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri May 31 14:55:07 2013 +1000

    recoverd: Log node that causes takeover run to fail
    
    Extend takeover_fail_callback() to just log (and not do any ban
    processing) when the callback data is NULL.  Always call
    ctdb_takeover_run() with the callback so that useful errors are always
    logged.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit c429394afbabaee09f9216dc743419adddf523ea)

commit 8d251ce2871770708a2304fa5dae2ddab12d2539
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Mon Jun 24 17:37:15 2013 +1000

    client: Exit with non-zero status when unix socket is closed
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 733fc909425860f6a02c205c2d8f34a731853922)

-----------------------------------------------------------------------

Summary of changes:
 client/ctdb_client.c       |   12 +++++++--
 common/cmdline.c           |    2 +-
 config/events.d/60.nfs     |   37 +++++++++++-------------------
 config/functions           |   40 +++++++++++++++++++++++++++++----
 packaging/RPM/ctdb.spec.in |   10 +++++++-
 server/ctdb_ltdb_server.c  |    5 ++-
 server/ctdb_recoverd.c     |   18 +++++++++------
 server/ctdb_takeover.c     |   52 ++++++++++++++++++++++++++++++++++++++++---
 tools/ctdb.c               |   43 +++++++++++++++++++++++++++---------
 9 files changed, 162 insertions(+), 57 deletions(-)


Changeset truncated at 500 lines:

diff --git a/client/ctdb_client.c b/client/ctdb_client.c
index 0f0f175..cfabdaf 100644
--- a/client/ctdb_client.c
+++ b/client/ctdb_client.c
@@ -201,8 +201,8 @@ static void ctdb_client_read_cb(uint8_t *data, size_t cnt, void *args)
 	talloc_steal(tmp_ctx, hdr);
 
 	if (cnt == 0) {
-		DEBUG(DEBUG_INFO,("Daemon has exited - shutting down client\n"));
-		exit(0);
+		DEBUG(DEBUG_CRIT,("Daemon has exited - shutting down client\n"));
+		exit(1);
 	}
 
 	if (cnt < sizeof(*hdr)) {
@@ -1738,11 +1738,17 @@ int ctdb_ctrl_createdb(struct ctdb_context *ctdb, struct timeval timeout, uint32
 	int ret;
 	int32_t res;
 	TDB_DATA data;
+	uint64_t tdb_flags = 0;
 
 	data.dptr = discard_const(name);
 	data.dsize = strlen(name)+1;
 
-	ret = ctdb_control(ctdb, destnode, 0, 
+	/* Make sure that volatile databases use jenkins hash */
+	if (!persistent) {
+		tdb_flags = TDB_INCOMPATIBLE_HASH;
+	}
+
+	ret = ctdb_control(ctdb, destnode, tdb_flags,
 			   persistent?CTDB_CONTROL_DB_ATTACH_PERSISTENT:CTDB_CONTROL_DB_ATTACH, 
 			   0, data, 
 			   mem_ctx, &data, &res, &timeout, NULL);
diff --git a/common/cmdline.c b/common/cmdline.c
index 145a13a..2c7b2cd 100644
--- a/common/cmdline.c
+++ b/common/cmdline.c
@@ -152,7 +152,7 @@ struct ctdb_context *ctdb_cmdline_client(struct event_context *ev)
 	}
 
 	/* get our pnn */
-	ctdb->pnn = ctdb_ctrl_getpnn(ctdb, timeval_current_ofs(3, 0), CTDB_CURRENT_NODE);
+	ctdb->pnn = ctdb_ctrl_getpnn(ctdb, timeval_current_ofs(10, 0), CTDB_CURRENT_NODE);
 	if (ctdb->pnn == (uint32_t)-1) {
 		DEBUG(DEBUG_CRIT,(__location__ " Failed to get ctdb pnn\n"));
 		talloc_free(ctdb);
diff --git a/config/events.d/60.nfs b/config/events.d/60.nfs
index 3ae8f24..f567c82 100755
--- a/config/events.d/60.nfs
+++ b/config/events.d/60.nfs
@@ -70,11 +70,6 @@ case "$1" in
 	;;
 
       monitor)
-	if ctdb_service_needs_reconfigure ; then
-	    ctdb_service_reconfigure
-	    exit 0
-	fi
-
 	update_tickles 2049
 
 	# check that statd responds to rpc requests
@@ -104,29 +99,20 @@ case "$1" in
 	}
 
 	# check that NFS responds to rpc requests
-	[ "$CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK" = "yes" ] || {
+	if [ "$CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK" != "yes" ] ; then
 	    if ctdb_check_rpc "NFS" 100003 3 >/dev/null ; then
-		(service_name="nfs_knfsd"; ctdb_counter_init)
+		ctdb_counter_init "nfs_knfsd"
 	    else
-		(
-			service_name="nfs_knfsd"
-			ctdb_counter_incr
+		ctdb_counter_incr "nfs_knfsd"
 
-			ctdb_check_counter_equal 2 || {
-				echo "Trying to restart NFS service"
-				startstop_nfs restart >/dev/null 2>&1 &
-				exit 0
-			}
+		if ! ctdb_check_counter "quiet" % 10 "nfs_knfsd" ; then
+		    echo "Trying to restart NFS service"
+		    startstop_nfs restart >/dev/null 2>&1 &
+		fi
 
-			ctdb_check_counter_limit 5 quiet >/dev/null
-		) || {
-			echo "$ctdb_check_rpc_out"
-			echo "Trying to restart NFS service"
-			startstop_nfs restart
-			exit 1
-		}
+		ctdb_check_counter "error" -ge 2 "nfs_knfsd"
 	    fi
-	}
+	fi
 
 	# check that lockd responds to rpc requests
 	if ctdb_check_rpc "LOCKD" 100021 1 >/dev/null ; then
@@ -210,6 +196,11 @@ case "$1" in
        	;;
 
     ipreallocated)
+	if ctdb_service_needs_reconfigure ; then
+	    ctdb_service_reconfigure
+	    exit 0
+	fi
+
 	# if the ips have been reallocated, we must restart the lockmanager
 	# across all nodes and ping all statd listeners
 	[ -x $CTDB_BASE/statd-callout ] && {
diff --git a/config/functions b/config/functions
index b35f60f..9cf4ece 100755
--- a/config/functions
+++ b/config/functions
@@ -759,16 +759,17 @@ setup_iface_ip_readd_script()
 # ctdb_check_counter_limit succeeds when count >= <limit>
 ########################################################
 _ctdb_counter_common () {
-    _counter_file="$ctdb_fail_dir/$service_name"
+    _service_name="${1:-${service_name}}"
+    _counter_file="$ctdb_fail_dir/$_service_name"
     mkdir -p "${_counter_file%/*}" # dirname
 }
 ctdb_counter_init () {
-    _ctdb_counter_common
+    _ctdb_counter_common "$1"
 
     >"$_counter_file"
 }
 ctdb_counter_incr () {
-    _ctdb_counter_common
+    _ctdb_counter_common "$1"
 
     # unary counting!
     echo -n 1 >> "$_counter_file"
@@ -782,10 +783,10 @@ ctdb_check_counter_limit () {
     # unary counting!
     _size=$(stat -c "%s" "$_counter_file" 2>/dev/null || echo 0)
     if [ $_size -ge $_limit ] ; then
-	echo "ERROR: more than $_limit consecutive failures for $service_name, marking cluster unhealthy"
+	echo "ERROR: more than $_limit consecutive failures for $_service_name, marking cluster unhealthy"
 	exit 1
     elif [ $_size -gt 0 -a -z "$_quiet" ] ; then
-	echo "WARNING: less than $_limit consecutive failures ($_size) for $service_name, not unhealthy yet"
+	echo "WARNING: less than $_limit consecutive failures ($_size) for $_service_name, not unhealthy yet"
     fi
 }
 ctdb_check_counter_equal () {
@@ -801,6 +802,35 @@ ctdb_check_counter_equal () {
     return 0
 }
 
+ctdb_check_counter () {
+    _msg="${1:-error}"  # "error"  - anything else is silent on fail
+    _op="${2:--ge}"  # an integer operator supported by test
+    _limit="${3:-${service_fail_limit}}"
+    shift 3
+    _ctdb_counter_common "$1"
+
+    # unary counting!
+    _size=$(stat -c "%s" "$_counter_file" 2>/dev/null || echo 0)
+    _hit=false
+    if [ "$_op" != "%" ] ; then
+	if [ $_size $_op $_limit ] ; then
+	    _hit=true
+	fi
+    else
+	if [ $(($_size $_op $_limit)) -eq 0 ] ; then
+	    _hit=true
+	fi
+    fi
+    if $_hit ; then
+	if [ "$_msg" = "error" ] ; then
+	    echo "ERROR: $_size consecutive failures for $_service_name, marking node unhealthy"
+	    exit 1
+	else
+	    return 1
+	fi
+    fi
+}
+
 ########################################################
 
 ctdb_spool_dir="/var/spool/ctdb"
diff --git a/packaging/RPM/ctdb.spec.in b/packaging/RPM/ctdb.spec.in
index 8f74e3e..b3daf2f 100644
--- a/packaging/RPM/ctdb.spec.in
+++ b/packaging/RPM/ctdb.spec.in
@@ -3,7 +3,7 @@ Name: ctdb
 Summary: Clustered TDB
 Vendor: Samba Team
 Packager: Samba Team <samba at samba.org>
-Version: 1.2.66
+Version: 1.2.67
 Release: 1GITHASH
 Epoch: 0
 License: GNU GPL version 3
@@ -155,6 +155,14 @@ development libraries for ctdb
 
 %changelog
 
+* Wed Aug 14 2013 : Version 1.2.67
+  - When takeover fails, call fail callback only once and not once per IP
+  - Do not send ipreallocated event to banned nodes
+  - If rpc check fails for nfs, mark node unhealthy after 2 failures
+    and restart after every 10 failures
+  - Make ctdb ban/unban more resilient to individual control timeouts
+  - Make sure that volatile databases are always used with jenkins hash
+  - Increase the default control timeout in ctdb tool from 3s to 10s
 * Thu Jul 18 2013 : Version 1.2.66
   - A missing interface should cause monitoring to fail
 * Tue Jul 02 2013 : Version 1.2.65
diff --git a/server/ctdb_ltdb_server.c b/server/ctdb_ltdb_server.c
index b87e176..9af8ddc 100644
--- a/server/ctdb_ltdb_server.c
+++ b/server/ctdb_ltdb_server.c
@@ -1036,8 +1036,9 @@ again:
 	}
 
 
-	DEBUG(DEBUG_INFO,("Attached to database '%s'\n", ctdb_db->db_path));
-	
+	DEBUG(DEBUG_NOTICE,("Attached to database '%s' with flags 0x%x\n",
+			    ctdb_db->db_path, tdb_flags));
+
 	/* success */
 	return 0;
 }
diff --git a/server/ctdb_recoverd.c b/server/ctdb_recoverd.c
index 15d7bbe..9f2c71c 100644
--- a/server/ctdb_recoverd.c
+++ b/server/ctdb_recoverd.c
@@ -1483,12 +1483,16 @@ static int sync_recovery_lock_file_across_cluster(struct ctdb_recoverd *rec)
  */
 static void takeover_fail_callback(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)
 {
-	struct ctdb_recoverd *rec = talloc_get_type(callback_data, struct ctdb_recoverd);
+	DEBUG(DEBUG_ERR, ("Node %u failed the takeover run\n", node_pnn));
 
-	DEBUG(DEBUG_ERR, (__location__ " Node %u failed the takeover run. Setting it as recovery fail culprit\n", node_pnn));
+	if (callback_data != NULL) {
+		struct ctdb_recoverd *rec = talloc_get_type(callback_data, struct ctdb_recoverd);
 
-	ctdb_set_culprit(rec, node_pnn);
-	rec->need_takeover_run = true;
+		DEBUG(DEBUG_ERR, ("Setting node %u as recovery fail culprit\n", node_pnn));
+
+		ctdb_set_culprit(rec, node_pnn);
+		rec->need_takeover_run = true;
+	}
 }
 
 
@@ -1798,7 +1802,7 @@ static int do_recovery(struct ctdb_recoverd *rec,
 		return -1;
 	}
 	rec->need_takeover_run = false;
-	ret = ctdb_takeover_run(ctdb, nodemap, NULL, NULL);
+	ret = ctdb_takeover_run(ctdb, nodemap, takeover_fail_callback, NULL);
 	if (ret != 0) {
 		DEBUG(DEBUG_ERR, (__location__ " Unable to setup public takeover addresses. ctdb_takeover_run() failed.\n"));
 		rec->need_takeover_run = true;
@@ -2117,7 +2121,7 @@ static void ctdb_rebalance_timeout(struct event_context *ev, struct timed_event
 
 	DEBUG(DEBUG_NOTICE,("Rebalance all nodes that have had ip assignment changes.\n"));
 
-	ret = ctdb_takeover_run(ctdb, rec->nodemap, NULL, NULL);
+	ret = ctdb_takeover_run(ctdb, rec->nodemap, takeover_fail_callback, NULL);
 	if (ret != 0) {
 		DEBUG(DEBUG_ERR, (__location__ " Unable to setup public takeover addresses. ctdb_takeover_run() failed.\n"));
 		rec->need_takeover_run = true;
@@ -2264,7 +2268,7 @@ static void process_ipreallocate_requests(struct ctdb_context *ctdb, struct ctdb
 		rec->need_takeover_run = true;
 	}
 	if (ret == 0) {
-		ret = ctdb_takeover_run(ctdb, rec->nodemap, NULL, NULL);
+		ret = ctdb_takeover_run(ctdb, rec->nodemap, takeover_fail_callback, NULL);
 		if (ret != 0) {
 			DEBUG(DEBUG_ERR,("Failed to reallocate addresses: ctdb_takeover_run() failed.\n"));
 			rec->need_takeover_run = true;
diff --git a/server/ctdb_takeover.c b/server/ctdb_takeover.c
index 481b4db..9fdf227 100644
--- a/server/ctdb_takeover.c
+++ b/server/ctdb_takeover.c
@@ -1987,6 +1987,40 @@ finished:
 	return;
 }
 
+struct takeover_callback_data {
+	bool *node_failed;
+	client_async_callback fail_callback;
+	void *fail_callback_data;
+	struct ctdb_node_map *nodemap;
+};
+
+static void takeover_run_fail_callback(struct ctdb_context *ctdb,
+				       uint32_t node_pnn, int32_t res,
+				       TDB_DATA outdata, void *callback_data)
+{
+	struct takeover_callback_data *cd =
+		talloc_get_type_abort(callback_data,
+				      struct takeover_callback_data);
+	int i;
+
+	for (i = 0; i < cd->nodemap->num; i++) {
+		if (node_pnn == cd->nodemap->nodes[i].pnn) {
+			break;
+		}
+	}
+
+	if (i == cd->nodemap->num) {
+		DEBUG(DEBUG_ERR, (__location__ " invalid PNN %u\n", node_pnn));
+		return;
+	}
+
+	if (!cd->node_failed[i]) {
+		cd->node_failed[i] = true;
+		cd->fail_callback(ctdb, node_pnn, res, outdata,
+				  cd->fail_callback_data);
+	}
+}
+
 /*
   make any IP alias changes for public addresses that are necessary 
  */
@@ -2003,6 +2037,7 @@ int ctdb_takeover_run(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap,
 	struct client_async_data *async_data;
 	struct ctdb_client_control_state *state;
 	TALLOC_CTX *tmp_ctx = talloc_new(ctdb);
+	struct takeover_callback_data *takeover_data;
 
 	/*
 	 * ip failover is completely disabled, just send out the 
@@ -2020,11 +2055,21 @@ int ctdb_takeover_run(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap,
 	/* now tell all nodes to delete any alias that they should not
 	   have.  This will be a NOOP on nodes that don't currently
 	   hold the given alias */
+	takeover_data = talloc_zero(tmp_ctx, struct takeover_callback_data);
+	CTDB_NO_MEMORY_FATAL(ctdb, takeover_data);
+
+	takeover_data->node_failed = talloc_zero_array(tmp_ctx,
+						       bool, nodemap->num);
+	CTDB_NO_MEMORY_FATAL(ctdb, takeover_data->node_failed);
+	takeover_data->fail_callback = fail_callback;
+	takeover_data->fail_callback_data = callback_data;
+	takeover_data->nodemap = nodemap;
+
 	async_data = talloc_zero(tmp_ctx, struct client_async_data);
 	CTDB_NO_MEMORY_FATAL(ctdb, async_data);
 
-	async_data->fail_callback = fail_callback;
-	async_data->callback_data = callback_data;
+	async_data->fail_callback = takeover_run_fail_callback;
+	async_data->callback_data = takeover_data;
 
 	for (i=0;i<nodemap->num;i++) {
 		/* don't talk to unconnected nodes, but do talk to banned nodes */
@@ -2141,8 +2186,7 @@ ipreallocated:
 	 */
 	data.dptr  = discard_const("ipreallocated");
 	data.dsize = strlen((char *)data.dptr) + 1; 
-	nodes = list_of_nodes(ctdb, nodemap, tmp_ctx,
-			      NODE_FLAGS_DISCONNECTED|NODE_FLAGS_STOPPED, -1);
+	nodes = list_of_nodes(ctdb, nodemap, tmp_ctx, NODE_FLAGS_INACTIVE, -1);
 	if (ctdb_client_async_control(ctdb, CTDB_CONTROL_RUN_EVENTSCRIPTS,
 				      nodes, 0, TAKEOVER_TIMEOUT(),
 				      false, data,
diff --git a/tools/ctdb.c b/tools/ctdb.c
index b191aad..99dad33 100644
--- a/tools/ctdb.c
+++ b/tools/ctdb.c
@@ -2649,11 +2649,22 @@ static int control_ban(struct ctdb_context *ctdb, int argc, const char **argv)
 	bantime.pnn  = options.pnn;
 	bantime.time = strtoul(argv[0], NULL, 0);
 
-	ret = ctdb_ctrl_set_ban(ctdb, TIMELIMIT(), options.pnn, &bantime);
-	if (ret != 0) {
-		DEBUG(DEBUG_ERR,("Banning node %d for %d seconds failed.\n", bantime.pnn, bantime.time));
-		return -1;
-	}	
+	do {
+		ret = ctdb_ctrl_set_ban(ctdb, TIMELIMIT(), options.pnn, &bantime);
+		if (ret != 0) {
+			DEBUG(DEBUG_WARNING, ("Unable to ban node %u for %d seconds\n",
+					      bantime.pnn, bantime.time));
+		}
+
+		sleep(1);
+
+		/* read the nodemap and verify the change took effect */
+		if (ctdb_ctrl_getnodemap(ctdb, TIMELIMIT(), CTDB_CURRENT_NODE, ctdb, &nodemap) != 0) {
+			DEBUG(DEBUG_WARNING, ("Unable to get nodemap from local node\n"));
+			nodemap = NULL;
+		}
+
+	} while (nodemap == NULL || !(nodemap->nodes[options.pnn].flags & NODE_FLAGS_BANNED));
 
 	ret = control_ipreallocate(ctdb, argc, argv);
 	if (ret != 0) {
@@ -2689,11 +2700,21 @@ static int control_unban(struct ctdb_context *ctdb, int argc, const char **argv)
 	bantime.pnn  = options.pnn;
 	bantime.time = 0;
 
-	ret = ctdb_ctrl_set_ban(ctdb, TIMELIMIT(), options.pnn, &bantime);
-	if (ret != 0) {
-		DEBUG(DEBUG_ERR,("Unbanning node %d failed.\n", bantime.pnn));
-		return -1;
-	}	
+	do {
+		ret = ctdb_ctrl_set_ban(ctdb, TIMELIMIT(), options.pnn, &bantime);
+		if (ret != 0) {
+			DEBUG(DEBUG_WARNING, ("Unable to unban node %u\n", bantime.pnn));
+		}
+
+		sleep(1);
+
+		/* read the nodemap and verify the change took effect */
+		if (ctdb_ctrl_getnodemap(ctdb, TIMELIMIT(), CTDB_CURRENT_NODE, ctdb, &nodemap) != 0) {
+			DEBUG(DEBUG_WARNING, ("Unable to get nodemap from local node\n"));
+			nodemap = NULL;
+		}
+
+	} while (nodemap == NULL || (nodemap->nodes[options.pnn].flags & NODE_FLAGS_BANNED));
 
 	ret = control_ipreallocate(ctdb, argc, argv);
 	if (ret != 0) {
@@ -5260,7 +5281,7 @@ int main(int argc, const char *argv[])
 	
 	/* set some defaults */
 	options.maxruntime = 0;
-	options.timelimit = 3;
+	options.timelimit = 10;
 	options.pnn = CTDB_CURRENT_NODE;
 
 	pc = poptGetContext(argv[0], argc, argv, popt_options, POPT_CONTEXT_KEEP_FIRST);


-- 
CTDB repository