[SCM] Samba Shared Repository - branch master updated

Amitay Isaacs amitay at samba.org
Fri Jul 22 17:01:01 UTC 2022


The branch, master has been updated
       via  30c40046ef0 ctdb-build: Add missing dependency on talloc
       via  e831af7b257 ctdb-tests: Work around unreadable file test failure when root
       via  b20ccaa36da ctdb-scripts: Use "git config" as last resort to parse nfs.conf
       via  db37043bc5c ctdb-scripts: Avoid ShellCheck warning SC2295
       via  00f1d6d9476 ctdb-common: Use POSIX if_nameindex() to check interface existence
       via  b686bbb4ac3 replace: Add check for if_nameindex()
       via  c77a4fde7aa ctdb-daemon: Modernise debug in ctdb_add_public_address()
       via  d62fcba7dce ctdb-daemon: Avoid spurious error sending ARPs for released IP
       via  f5a20377347 ctdb-daemon: Modernise debug in ctdb_control_send_arp()
       via  ec5f6425b70 ctdb-protocol: Add separator argument to ctdb_connection_to_buf()
       via  440bd86a992 ctdb-daemon: Drop unused ban_state element from CTDB node structure
       via  9898e7c5558 ctdb-recoverd: Clean up banning culprit code
       via  19fbc2da383 ctdb-recoverd: Add pnn field to banning state structure
       via  0b5dd076046 ctdb-recoverd: Add function node_flags() and use it in elections
      from  e396eb9fbc7 ctdb-scripts: Only run unhealthy call-out when passing threshold

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 30c40046ef0b52da1dee3a65117c20da2a75955b
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 22 11:41:57 2022 +1000

    ctdb-build: Add missing dependency on talloc
    
    The include isn't strictly necessary, since it is included via
    common/reqid.c anyway.  However, it is a useful hint.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    Autobuild-User(master): Amitay Isaacs <amitay at samba.org>
    Autobuild-Date(master): Fri Jul 22 17:01:00 UTC 2022 on sn-devel-184

commit e831af7b25760dbbc2a0fc5366b36cd885aac838
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 22 11:05:21 2022 +1000

    ctdb-tests: Work around unreadable file test failure when root
    
    root can read files for which the mode prohibits reading, so this test
    case fails when run as root.  Work around this when running as root.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit b20ccaa36da23c8ee84b117b2e82e98bd2be4fcc
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 21 14:22:25 2022 +1000

    ctdb-scripts: Use "git config" as last resort to parse nfs.conf
    
    Some versions of nfs-utils (e.g. recent CentOS 7) use /etc/nfs.conf
    but do not include the nfsconf utility to extract values from the
    file.  However, git has an excellent conf file parser, so use it as a
    last resort.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit db37043bc5c67e536bcaaf1941cb12ec2e72efc9
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri May 27 23:23:48 2022 +1000

    ctdb-scripts: Avoid ShellCheck warning SC2295
    
    For example:
    
    In /home/martins/samba/samba/ctdb/tools/onnode line 304:
        [ "$nodes" != "${nodes%[ ${nl}]*}" ] && verbose=true
                                 ^---^ SC2295 (info): Expansions inside ${..} need to be quoted separately, otherwise they match as patterns.
    
    Did you mean:
        [ "$nodes" != "${nodes%[ "${nl}"]*}" ] && verbose=true
    
    For more information:
      https://www.shellcheck.net/wiki/SC2295 -- Expansions inside ${..} need to b...
    
    Who knew?  Thanks ShellCheck!
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 00f1d6d94764ba1312500c72fd08e7df3fae064b
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Jul 5 12:31:57 2022 +1000

    ctdb-common: Use POSIX if_nameindex() to check interface existence
    
    This works as an unprivileged user, so avoids unnecessary errors when
    running in test mode (and not as root):
    
      2022-02-18T12:21:12.436491+11:00 node.0 ctdbd[6958]: ctdb_sys_check_iface_exists: Failed to open raw socket
      2022-02-18T12:21:12.436534+11:00 node.0 ctdbd[6958]: ctdb_sys_check_iface_exists: Failed to open raw socket
      2022-02-18T12:21:12.436557+11:00 node.0 ctdbd[6958]: ctdb_sys_check_iface_exists: Failed to open raw socket
      2022-02-18T12:21:12.436577+11:00 node.0 ctdbd[6958]: ctdb_sys_check_iface_exists: Failed to open raw socket
    
    The corresponding porting test would now become pointless because it
    would just confirm that "fake" does not exist.  Attempt to make it
    useful by using a less likely name than "fake" and attempting to
    detect the loopback interface.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
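
The approach described in the commit above can be sketched in isolation. This is a minimal illustration, not the CTDB function: the name iface_exists() is hypothetical, and unlike the committed loop it stops at the first entry where either if_index is 0 or if_name is NULL, which is a deliberately conservative assumption on my part. if_nameindex() and if_freenameindex() are the POSIX calls declared in <net/if.h>, and they need no special privileges.

```c
#include <assert.h>
#include <net/if.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of CTDB): return true if an interface
 * with the given name is present in the list from if_nameindex().
 */
static bool iface_exists(const char *iface)
{
	struct if_nameindex *ifnis, *ifni;
	bool found = false;

	assert(iface != NULL);

	ifnis = if_nameindex();
	if (ifnis == NULL) {
		/* Could not retrieve the interface list */
		return false;
	}

	/* The list is terminated by an entry with index 0 and a NULL name */
	for (ifni = ifnis;
	     ifni->if_index != 0 && ifni->if_name != NULL;
	     ifni++) {
		if (strcmp(iface, ifni->if_name) == 0) {
			found = true;
			break;
		}
	}

	/* The list is allocated by the C library and must be freed */
	if_freenameindex(ifnis);

	return found;
}
```

A caller would use it as iface_exists("eth0"). Whether a given name such as "lo" is present depends on the platform, so a test should probably rely only on a name that cannot exist.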

commit b686bbb4ac37296e23e74c1c10145f22b6d29d42
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 21 11:25:37 2022 +1000

    replace: Add check for if_nameindex()
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit c77a4fde7aaa7130b969f5d49ac75abb2acfffd0
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Jul 5 12:17:05 2022 +1000

    ctdb-daemon: Modernise debug in ctdb_add_public_address()
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit d62fcba7dce6038c02c12b3531e953e7b808614a
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jun 23 14:30:34 2022 +1000

    ctdb-daemon: Avoid spurious error sending ARPs for released IP
    
    A public IP address can be released in between (and probably before)
    attempts to send ARPs.  One situation when this can occur is when a
    cluster is shutting down: node A shuts down first, public IPs from
    node A are taken over by node B, node B is shut down.
    
    Notice this when it occurs and cancel further attempts to send ARPs.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit f5a20377347aba18700d010d4201775fc83a0b1b
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Jul 5 19:33:15 2022 +1000

    ctdb-daemon: Modernise debug in ctdb_control_send_arp()
    
    For the tickle ACK logging, render the connection in a buffer.  This
    produces more complete information.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit ec5f6425b70672af591df3113962c636d8f65005
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Jul 19 11:53:15 2022 +1000

    ctdb-protocol: Add separator argument to ctdb_connection_to_buf()
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
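
The effect of the new separator argument can be shown with a small standalone sketch. The function join_endpoints() below is hypothetical and takes pre-formatted endpoint strings, whereas the real ctdb_connection_to_buf() takes a struct ctdb_connection; only the snprintf() pattern and the ENOSPC-on-truncation convention are taken from the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the formatting step: write
 * "<first><sep><second>" into buf, choosing the order from client_first.
 * Returns 0 on success, ENOSPC if the result does not fit in buf.
 */
static int join_endpoints(char *buf,
			  size_t buflen,
			  const char *server,
			  const char *client,
			  bool client_first,
			  const char *sep)
{
	int ret;

	assert(buf != NULL && server != NULL && client != NULL && sep != NULL);

	if (client_first) {
		ret = snprintf(buf, buflen, "%s%s%s", client, sep, server);
	} else {
		ret = snprintf(buf, buflen, "%s%s%s", server, sep, client);
	}
	/* snprintf() returns the length it wanted to write */
	if (ret < 0 || (size_t)ret >= buflen) {
		return ENOSPC;
	}

	return 0;
}
```

Passing " " as the separator keeps the previous single-space output, while a caller that wants a more readable log line can pass something like " -> ".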

commit 440bd86a9925bd5b97fd5130e3e5a4ac104ee5dd
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Jul 29 13:39:03 2020 +1000

    ctdb-daemon: Drop unused ban_state element from CTDB node structure
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 9898e7c5558e47c4666c552ef907a49e231dd2c7
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Jul 29 13:30:04 2020 +1000

    ctdb-recoverd: Clean up banning culprit code
    
    Make this fully self-contained in the recovery daemon and avoid
    indexing by PNN.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
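
A rough sketch of the "find by PNN, else append" pattern this commit describes follows. It is an illustration under stated assumptions: the types ban_entry and ban_table and both functions are hypothetical, and plain realloc()/free() are swapped in for the talloc_realloc()/TALLOC_FREE() calls used in the actual change.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types: one record per culprit node, keyed by PNN */
struct ban_entry {
	uint32_t pnn;
	uint32_t count;
};

struct ban_table {
	struct ban_entry *entries;
	size_t len;
};

/*
 * Return the entry for pnn, appending a zeroed one if none exists.
 * Returns NULL only if the array could not be extended.
 */
static struct ban_entry *ban_entry_get(struct ban_table *t, uint32_t pnn)
{
	struct ban_entry *tmp;
	size_t i;

	assert(t != NULL);

	/* Search by the stored PNN rather than using it as an index */
	for (i = 0; i < t->len; i++) {
		if (t->entries[i].pnn == pnn) {
			return &t->entries[i];
		}
	}

	/* realloc() with a NULL pointer behaves like malloc() */
	tmp = realloc(t->entries, (t->len + 1) * sizeof(*tmp));
	if (tmp == NULL) {
		return NULL;
	}
	t->entries = tmp;

	/* The new element is always at the end */
	t->entries[t->len] = (struct ban_entry) { .pnn = pnn, .count = 0 };
	t->len += 1;

	return &t->entries[t->len - 1];
}

/* Forget all recorded counts by dropping the whole array */
static void ban_table_reset(struct ban_table *t)
{
	assert(t != NULL);
	free(t->entries);
	t->entries = NULL;
	t->len = 0;
}
```

Keeping the PNN inside each record means the table only grows when a node is actually reported as a culprit, and resetting every count reduces to freeing one array.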

commit 19fbc2da383245522a58a222c1bca75d4ad98c8e
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Jul 29 12:15:03 2020 +1000

    ctdb-recoverd: Add pnn field to banning state structure
    
    This structure is now standalone, so indexing by PNN can be avoided
    via a subsequent commit.  Index by culprit here to make this commit
    simple.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 0b5dd076046f254bb8d60c1b4377c32a3dc59a10
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Jul 29 17:57:53 2020 +1000

    ctdb-recoverd: Add function node_flags() and use it in elections
    
    Indexing a node map by PNN is suboptimal.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
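
The commit message does not spell out why indexing by PNN is suboptimal. One plausible reading, offered here as an assumption, is that indexing relies on a node's array position always equalling its PNN, whereas a search by the stored PNN does not. The sketch below uses hypothetical types (node_entry, node_map) and a hypothetical lookup_flags() that mirrors the shape of node_flags() in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the node map structures */
struct node_entry {
	uint32_t pnn;
	uint32_t flags;
};

struct node_map {
	size_t num;
	const struct node_entry *nodes;
};

/*
 * Look up a node by PNN.  Returns true if found; if flags is not NULL
 * the node's flags are stored there.  Passing NULL for flags turns the
 * call into a pure existence check.
 */
static bool lookup_flags(const struct node_map *map,
			 uint32_t pnn,
			 uint32_t *flags)
{
	size_t i;

	assert(map != NULL);

	for (i = 0; i < map->num; i++) {
		const struct node_entry *node = &map->nodes[i];

		if (node->pnn == pnn) {
			if (flags != NULL) {
				*flags = node->flags;
			}
			return true;
		}
	}

	return false;
}
```

The boolean return also gives callers a way to report an unknown PNN, which a direct array index cannot do without a separate bounds check.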

-----------------------------------------------------------------------

Summary of changes:
 ctdb/common/system.c                               |  38 +++---
 ctdb/config/events/legacy/13.per_ip_routing.script |   2 +-
 ctdb/config/statd-callout                          |   9 +-
 ctdb/include/ctdb_private.h                        |   3 -
 ctdb/protocol/protocol_util.c                      |  20 ++-
 ctdb/protocol/protocol_util.h                      |   7 +-
 ctdb/server/ctdb_recoverd.c                        | 151 ++++++++++++++-------
 ctdb/server/ctdb_takeover.c                        |  55 +++++---
 ctdb/tests/UNIT/cunit/porting_tests_001.sh         |  15 +-
 ctdb/tests/UNIT/cunit/tunable_test_001.sh          |   8 +-
 ctdb/tests/run_tests.sh                            |   6 +-
 ctdb/tests/scripts/integration.bash                |   4 +-
 ctdb/tests/src/porting_tests.c                     |  18 +--
 ctdb/tests/src/reqid_test.c                        |   1 +
 ctdb/tools/onnode                                  |   4 +-
 ctdb/wscript                                       |   2 +-
 lib/replace/wscript                                |   2 +-
 17 files changed, 215 insertions(+), 130 deletions(-)


Changeset truncated at 500 lines:

diff --git a/ctdb/common/system.c b/ctdb/common/system.c
index 650b62bab16..08dc68284fd 100644
--- a/ctdb/common/system.c
+++ b/ctdb/common/system.c
@@ -148,32 +148,36 @@ void ctdb_wait_for_process_to_exit(pid_t pid)
 	}
 }
 
-#ifdef HAVE_AF_PACKET
+#ifdef HAVE_IF_NAMEINDEX
 
 bool ctdb_sys_check_iface_exists(const char *iface)
 {
-	int s;
-	struct ifreq ifr;
+	struct if_nameindex *ifnis, *ifni;
+	bool found = false;
 
-	s = socket(AF_PACKET, SOCK_RAW, 0);
-	if (s == -1){
-		/* We don't know if the interface exists, so assume yes */
-		DBG_ERR("Failed to open raw socket\n");
-		return true;
+	ifnis = if_nameindex();
+	if (ifnis == NULL) {
+		DBG_ERR("Failed to retrieve inteface list\n");
+		return false;
 	}
 
-	strlcpy(ifr.ifr_name, iface, sizeof(ifr.ifr_name));
-	if (ioctl(s, SIOCGIFINDEX, &ifr) < 0 && errno == ENODEV) {
-		DBG_ERR("Interface '%s' not found\n", iface);
-		close(s);
-		return false;
+	for (ifni = ifnis;
+	     ifni->if_index != 0 || ifni->if_name != NULL;
+	     ifni++) {
+		int cmp = strcmp(iface, ifni->if_name);
+		if (cmp == 0) {
+			found = true;
+			goto done;
+		}
 	}
-	close(s);
 
-	return true;
+done:
+	if_freenameindex(ifnis);
+
+	return found;
 }
 
-#else /* HAVE_AF_PACKET */
+#else /* HAVE_IF_NAMEINDEX */
 
 bool ctdb_sys_check_iface_exists(const char *iface)
 {
@@ -181,7 +185,7 @@ bool ctdb_sys_check_iface_exists(const char *iface)
 	return true;
 }
 
-#endif /* HAVE_AF_PACKET */
+#endif /* HAVE_IF_NAMEINDEX */
 
 #ifdef HAVE_PEERCRED
 
diff --git a/ctdb/config/events/legacy/13.per_ip_routing.script b/ctdb/config/events/legacy/13.per_ip_routing.script
index e25647613bb..d7949c6dedb 100755
--- a/ctdb/config/events/legacy/13.per_ip_routing.script
+++ b/ctdb/config/events/legacy/13.per_ip_routing.script
@@ -346,7 +346,7 @@ remove_bogus_routes ()
 	# be done with grep, but let's do it with shell prefix removal
 	# to avoid unnecessary processes.  This falls through if
 	# "@${_i}@" isn't present in $_ips.
-	[ "$_ips" = "${_ips#*@${_i}@}" ] || continue
+	[ "$_ips" = "${_ips#*@"${_i}"@}" ] || continue
 
 	echo "Removing ip rule/routes for unhosted public address $_i"
 	del_routing_for_ip "$_i"
diff --git a/ctdb/config/statd-callout b/ctdb/config/statd-callout
index 83fb92eccf0..38c155e4793 100755
--- a/ctdb/config/statd-callout
+++ b/ctdb/config/statd-callout
@@ -32,8 +32,13 @@ die ()
 load_system_config "nfs" "nfs-common"
 
 # If NFS_HOSTNAME not set then try to pull it out of /etc/nfs.conf
-if [ -z "$NFS_HOSTNAME" ] && type nfsconf >/dev/null 2>&1 ; then
-	NFS_HOSTNAME=$(nfsconf --get statd name)
+if [ -z "$NFS_HOSTNAME" ]; then
+	if type nfsconf >/dev/null 2>&1; then
+		NFS_HOSTNAME=$(nfsconf --get statd name)
+	elif type git >/dev/null 2>&1; then
+		# git to the rescue!
+		NFS_HOSTNAME=$(git config --file=/etc/nfs.conf statd.name)
+	fi
 fi
 
 [ -n "$NFS_HOSTNAME" ] || \
diff --git a/ctdb/include/ctdb_private.h b/ctdb/include/ctdb_private.h
index 3193005db75..3395f077ab9 100644
--- a/ctdb/include/ctdb_private.h
+++ b/ctdb/include/ctdb_private.h
@@ -88,9 +88,6 @@ struct ctdb_node {
 	/* a list of controls pending to this node, so we can time them out quickly
 	   if the node becomes disconnected */
 	struct daemon_control_state *pending_controls;
-
-	/* used by the recovery daemon to track when a node should be banned */
-	struct ctdb_banning_state *ban_state; 
 };
 
 /*
diff --git a/ctdb/protocol/protocol_util.c b/ctdb/protocol/protocol_util.c
index 28631c8de61..fe757658f48 100644
--- a/ctdb/protocol/protocol_util.c
+++ b/ctdb/protocol/protocol_util.c
@@ -497,8 +497,11 @@ bool ctdb_sock_addr_same(const ctdb_sock_addr *addr1,
 	return (ctdb_sock_addr_cmp(addr1, addr2) == 0);
 }
 
-int ctdb_connection_to_buf(char *buf, size_t buflen,
-			   struct ctdb_connection *conn, bool client_first)
+int ctdb_connection_to_buf(char *buf,
+			   size_t buflen,
+			   struct ctdb_connection *conn,
+			   bool client_first,
+			   const char *sep)
 {
 	char server[64], client[64];
 	int ret;
@@ -516,9 +519,9 @@ int ctdb_connection_to_buf(char *buf, size_t buflen,
 	}
 
 	if (! client_first) {
-		ret = snprintf(buf, buflen, "%s %s", server, client);
+		ret = snprintf(buf, buflen, "%s%s%s", server, sep, client);
 	} else {
-		ret = snprintf(buf, buflen, "%s %s", client, server);
+		ret = snprintf(buf, buflen, "%s%s%s", client, sep, server);
 	}
 	if (ret < 0 || (size_t)ret >= buflen) {
 		return ENOSPC;
@@ -540,7 +543,7 @@ char *ctdb_connection_to_string(TALLOC_CTX *mem_ctx,
 		return NULL;
 	}
 
-	ret = ctdb_connection_to_buf(out, len, conn, client_first);
+	ret = ctdb_connection_to_buf(out, len, conn, client_first, " ");
 	if (ret != 0) {
 		talloc_free(out);
 		return NULL;
@@ -666,8 +669,11 @@ char *ctdb_connection_list_to_string(
 		char buf[128];
 		int ret;
 
-		ret = ctdb_connection_to_buf(buf, sizeof(buf),
-					     &conn_list->conn[i], client_first);
+		ret = ctdb_connection_to_buf(buf,
+					     sizeof(buf),
+					     &conn_list->conn[i],
+					     client_first,
+					     " ");
 		if (ret != 0) {
 			talloc_free(out);
 			return NULL;
diff --git a/ctdb/protocol/protocol_util.h b/ctdb/protocol/protocol_util.h
index b01db8e9934..2bdbb0c2ad0 100644
--- a/ctdb/protocol/protocol_util.h
+++ b/ctdb/protocol/protocol_util.h
@@ -55,8 +55,11 @@ bool ctdb_sock_addr_same_ip(const ctdb_sock_addr *addr1,
 bool ctdb_sock_addr_same(const ctdb_sock_addr *addr1,
 			 const ctdb_sock_addr *addr2);
 
-int ctdb_connection_to_buf(char *buf, size_t buflen,
-			   struct ctdb_connection * conn, bool client_first);
+int ctdb_connection_to_buf(char *buf,
+			   size_t buflen,
+			   struct ctdb_connection * conn,
+			   bool client_first,
+			   const char *sep);
 char *ctdb_connection_to_string(TALLOC_CTX *mem_ctx,
 				struct ctdb_connection * conn,
 				bool client_first);
diff --git a/ctdb/server/ctdb_recoverd.c b/ctdb/server/ctdb_recoverd.c
index c293aa7f037..bf3a66b0aaf 100644
--- a/ctdb/server/ctdb_recoverd.c
+++ b/ctdb/server/ctdb_recoverd.c
@@ -237,6 +237,7 @@ static int ctdb_op_disable(struct ctdb_op_state *state,
 }
 
 struct ctdb_banning_state {
+	uint32_t pnn;
 	uint32_t count;
 	struct timeval last_reported_time;
 };
@@ -253,6 +254,7 @@ struct ctdb_recoverd {
 	struct tevent_timer *leader_broadcast_timeout_te;
 	uint32_t pnn;
 	uint32_t last_culprit_node;
+	struct ctdb_banning_state *banning_state;
 	struct ctdb_node_map_old *nodemap;
 	struct timeval priority_time;
 	bool need_takeover_run;
@@ -290,6 +292,23 @@ static bool this_node_can_be_leader(struct ctdb_recoverd *rec)
 		(rec->ctdb->capabilities & CTDB_CAP_RECMASTER) != 0;
 }
 
+static bool node_flags(struct ctdb_recoverd *rec, uint32_t pnn, uint32_t *flags)
+{
+	size_t i;
+
+	for (i = 0; i < rec->nodemap->num; i++) {
+		struct ctdb_node_and_flags *node = &rec->nodemap->nodes[i];
+		if (node->pnn == pnn) {
+			if (flags != NULL) {
+				*flags = node->flags;
+			}
+			return true;
+		}
+	}
+
+	return false;
+}
+
 /*
   ban a node for a period of time
  */
@@ -324,33 +343,75 @@ enum monitor_result { MONITOR_OK, MONITOR_RECOVERY_NEEDED, MONITOR_ELECTION_NEED
 /*
   remember the trouble maker
  */
-static void ctdb_set_culprit_count(struct ctdb_recoverd *rec, uint32_t culprit, uint32_t count)
-{
-	struct ctdb_context *ctdb = talloc_get_type(rec->ctdb, struct ctdb_context);
-	struct ctdb_banning_state *ban_state;
+static void ctdb_set_culprit_count(struct ctdb_recoverd *rec,
+				   uint32_t culprit,
+				   uint32_t count)
+{
+	struct ctdb_context *ctdb = talloc_get_type_abort(
+		rec->ctdb, struct ctdb_context);
+	struct ctdb_banning_state *ban_state = NULL;
+	size_t len;
+	bool ok;
 
-	if (culprit > ctdb->num_nodes) {
-		DEBUG(DEBUG_ERR,("Trying to set culprit %d but num_nodes is %d\n", culprit, ctdb->num_nodes));
+	ok = node_flags(rec, culprit, NULL);
+	if (!ok) {
+		DBG_WARNING("Unknown culprit node %"PRIu32"\n", culprit);
 		return;
 	}
 
 	/* If we are banned or stopped, do not set other nodes as culprits */
 	if (rec->node_flags & NODE_FLAGS_INACTIVE) {
-		DEBUG(DEBUG_NOTICE, ("This node is INACTIVE, cannot set culprit node %d\n", culprit));
+		D_WARNING("This node is INACTIVE, cannot set culprit node %d\n",
+			  culprit);
 		return;
 	}
 
-	if (ctdb->nodes[culprit]->ban_state == NULL) {
-		ctdb->nodes[culprit]->ban_state = talloc_zero(ctdb->nodes[culprit], struct ctdb_banning_state);
-		CTDB_NO_MEMORY_VOID(ctdb, ctdb->nodes[culprit]->ban_state);
+	if (rec->banning_state == NULL) {
+		len = 0;
+	} else {
+		size_t i;
+
+		len = talloc_array_length(rec->banning_state);
 
-		
+		for (i = 0 ; i < len; i++) {
+			if (rec->banning_state[i].pnn == culprit) {
+				ban_state= &rec->banning_state[i];
+				break;
+			}
+		}
 	}
-	ban_state = ctdb->nodes[culprit]->ban_state;
-	if (timeval_elapsed(&ban_state->last_reported_time) > ctdb->tunable.recovery_grace_period) {
-		/* this was the first time in a long while this node
-		   misbehaved so we will forgive any old transgressions.
-		*/
+
+	/* Not found, so extend (or allocate new) array */
+	if (ban_state == NULL) {
+		struct ctdb_banning_state *t;
+
+		len += 1;
+		/*
+		 * talloc_realloc() handles the corner case where
+		 * rec->banning_state is NULL
+		 */
+		t = talloc_realloc(rec,
+				   rec->banning_state,
+				   struct ctdb_banning_state,
+				   len);
+		if (t == NULL) {
+			DBG_WARNING("Memory allocation errror");
+			return;
+		}
+		rec->banning_state = t;
+
+		/* New element is always at the end - initialise it... */
+		ban_state = &rec->banning_state[len - 1];
+		*ban_state = (struct ctdb_banning_state) {
+			.pnn = culprit,
+			.count = 0,
+		};
+	} else if (ban_state->count > 0 &&
+		   timeval_elapsed(&ban_state->last_reported_time) >
+		   ctdb->tunable.recovery_grace_period) {
+		/*
+		 * Forgive old transgressions beyond the tunable time-limit
+		 */
 		ban_state->count = 0;
 	}
 
@@ -359,6 +420,12 @@ static void ctdb_set_culprit_count(struct ctdb_recoverd *rec, uint32_t culprit,
 	rec->last_culprit_node = culprit;
 }
 
+static void ban_counts_reset(struct ctdb_recoverd *rec)
+{
+	D_NOTICE("Resetting ban count to 0 for all nodes\n");
+	TALLOC_FREE(rec->banning_state);
+}
+
 /*
   remember the trouble maker
  */
@@ -931,28 +998,26 @@ static void cluster_lock_release(struct ctdb_recoverd *rec)
 
 static void ban_misbehaving_nodes(struct ctdb_recoverd *rec, bool *self_ban)
 {
-	struct ctdb_context *ctdb = rec->ctdb;
-	unsigned int i;
-	struct ctdb_banning_state *ban_state;
+	size_t len = talloc_array_length(rec->banning_state);
+	size_t i;
+
 
 	*self_ban = false;
-	for (i=0; i<ctdb->num_nodes; i++) {
-		if (ctdb->nodes[i]->ban_state == NULL) {
-			continue;
-		}
-		ban_state = (struct ctdb_banning_state *)ctdb->nodes[i]->ban_state;
-		if (ban_state->count < 2*ctdb->num_nodes) {
+	for (i = 0; i < len; i++) {
+		struct ctdb_banning_state *ban_state = &rec->banning_state[i];
+
+		if (ban_state->count < 2 * rec->nodemap->num) {
 			continue;
 		}
 
 		D_NOTICE("Node %u reached %u banning credits\n",
-			 ctdb->nodes[i]->pnn,
+			 ban_state->pnn,
 			 ban_state->count);
-		ctdb_ban_node(rec, ctdb->nodes[i]->pnn);
+		ctdb_ban_node(rec, ban_state->pnn);
 		ban_state->count = 0;
 
 		/* Banning ourself? */
-		if (ctdb->nodes[i]->pnn == rec->pnn) {
+		if (ban_state->pnn == rec->pnn) {
 			*self_ban = true;
 		}
 	}
@@ -1343,25 +1408,10 @@ static int do_recovery(struct ctdb_recoverd *rec, TALLOC_CTX *mem_ctx)
 	rec->need_recovery = false;
 	ctdb_op_end(rec->recovery);
 
-	/* we managed to complete a full recovery, make sure to forgive
-	   any past sins by the nodes that could now participate in the
-	   recovery.
-	*/
-	DEBUG(DEBUG_ERR,("Resetting ban count to 0 for all nodes\n"));
-	for (i=0;i<nodemap->num;i++) {
-		struct ctdb_banning_state *ban_state;
-
-		if (nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED) {
-			continue;
-		}
-
-		ban_state = (struct ctdb_banning_state *)ctdb->nodes[nodemap->nodes[i].pnn]->ban_state;
-		if (ban_state == NULL) {
-			continue;
-		}
-
-		ban_state->count = 0;
-	}
+	/*
+	 * Completed a full recovery so forgive any past transgressions
+	 */
+	ban_counts_reset(rec);
 
 	/* We just finished a recovery successfully.
 	   We now wait for rerecovery_timeout before we allow
@@ -1398,6 +1448,7 @@ static void ctdb_election_data(struct ctdb_recoverd *rec, struct election_messag
 	int ret;
 	struct ctdb_node_map_old *nodemap;
 	struct ctdb_context *ctdb = rec->ctdb;
+	bool ok;
 
 	ZERO_STRUCTP(em);
 
@@ -1410,7 +1461,11 @@ static void ctdb_election_data(struct ctdb_recoverd *rec, struct election_messag
 		return;
 	}
 
-	rec->node_flags = nodemap->nodes[rec->pnn].flags;
+	ok = node_flags(rec, rec->pnn, &rec->node_flags);
+	if (!ok) {
+		DBG_ERR("Unable to get node flags for this node\n");
+		return;
+	}
 	em->node_flags = rec->node_flags;
 
 	for (i=0;i<nodemap->num;i++) {
diff --git a/ctdb/server/ctdb_takeover.c b/ctdb/server/ctdb_takeover.c
index c1e4f683784..0fb8076ad55 100644
--- a/ctdb/server/ctdb_takeover.c
+++ b/ctdb/server/ctdb_takeover.c
@@ -373,8 +373,17 @@ static void ctdb_control_send_arp(struct tevent_context *ev,
 							struct ctdb_takeover_arp);
 	int ret;
 	struct ctdb_tcp_array *tcparray;
-	const char *iface = ctdb_vnn_iface_string(arp->vnn);
+	const char *iface;
 
+	/* IP address might have been released between sends */
+	if (arp->vnn->iface == NULL) {
+		DBG_INFO("Cancelling ARP send for released IP %s\n",
+			 ctdb_addr_to_str(&arp->vnn->public_address));
+		talloc_free(arp);
+		return;
+	}
+
+	iface = ctdb_vnn_iface_string(arp->vnn);
 	ret = ctdb_sys_send_arp(&arp->addr, iface);
 	if (ret != 0) {
 		DBG_ERR("Failed to send ARP on interface %s: %s\n",
@@ -387,19 +396,25 @@ static void ctdb_control_send_arp(struct tevent_context *ev,
 
 		for (i=0;i<tcparray->num;i++) {
 			struct ctdb_connection *tcon;
+			char buf[128];
 
 			tcon = &tcparray->connections[i];
-			DEBUG(DEBUG_INFO,("sending tcp tickle ack for %u->%s:%u\n",
-				(unsigned)ntohs(tcon->dst.ip.sin_port),
-				ctdb_addr_to_str(&tcon->src),
-				(unsigned)ntohs(tcon->src.ip.sin_port)));
+			ret = ctdb_connection_to_buf(buf,
+						     sizeof(buf),
+						     tcon,
+						     true,
+						     " -> ");
+			if (ret != 0) {
+				strlcpy(buf, "UNKNOWN", sizeof(buf));
+			}
+			D_INFO("Send TCP tickle ACK: %s\n", buf);
 			ret = ctdb_sys_send_tcp(
 				&tcon->src,
 				&tcon->dst,
 				0, 0, 0);
 			if (ret != 0) {
-				DEBUG(DEBUG_CRIT,(__location__ " Failed to send tcp tickle ack for %s\n",
-					ctdb_addr_to_str(&tcon->src)));
+				DBG_ERR("Failed to send TCP tickle ACK: %s\n",
+					buf);
 			}
 		}
 	}
@@ -1055,9 +1070,8 @@ static int ctdb_add_public_address(struct ctdb_context *ctdb,
 	/* Verify that we don't have an entry for this IP yet */
 	for (vnn = ctdb->vnn; vnn != NULL; vnn = vnn->next) {
 		if (ctdb_same_sockaddr(addr, &vnn->public_address)) {
-			DEBUG(DEBUG_ERR,
-			      ("Duplicate public IP address '%s'\n",
-			       ctdb_addr_to_str(addr)));
+			D_ERR("Duplicate public IP address '%s'\n",
+			      ctdb_addr_to_str(addr));
 			return -1;
 		}
 	}
@@ -1065,39 +1079,40 @@ static int ctdb_add_public_address(struct ctdb_context *ctdb,
 	/* Create a new VNN structure for this IP address */
 	vnn = talloc_zero(ctdb, struct ctdb_vnn);
 	if (vnn == NULL) {
-		DEBUG(DEBUG_ERR, (__location__ " out of memory\n"));
+		DBG_ERR("Memory allocation error\n");
 		return -1;
 	}
 	tmp = talloc_strdup(vnn, ifaces);
 	if (tmp == NULL) {


-- 
Samba Shared Repository


