[SCM] Samba Shared Repository - branch master updated

Amitay Isaacs amitay at samba.org
Tue Aug 18 06:25:02 UTC 2020


The branch, master has been updated
       via  8bb6a6607da ctdb-recoverd: Broadcast takeover run message when verifying IPs
       via  4aa8e72d60e ctdb-recoverd: Rename update_local_flags() -> update_flags()
       via  702c7c4934e ctdb-recoverd: Change update_local_flags() to use already retrieved nodemaps
       via  910a0b3b747 ctdb-recoverd: Get remote nodemaps earlier
       via  d50919b0cb2 ctdb-recoverd: Do not fetch the nodemap from the recovery master
       via  762d1d8a960 ctdb-recoverd: Change get_remote_nodemaps() to use connected nodes
       via  368c83bfe3b ctdb-recoverd: Fix node_pnn check and assignment of nodemap into array
       via  10ce0dbf1c1 ctdb-recoverd: Add fail callback to assign banning credits
       via  a079ee31690 ctdb-recoverd: Add an intermediate state struct for nodemap fetching
       via  2eaa0af6160 ctdb-recoverd: Move memory allocation into get_remote_nodemaps()
       via  3324dd272c7 ctdb-recoverd: Change signature of get_remote_nodemaps()
       via  d2d90f25021 ctdb-recoverd: Fix a local memory leak
       via  52f520d39cd ctdb-recoverd: Basic cleanups for get_remote_nodemaps()
      from  20606fd0a4c WHATSNEW: list deprecated parameters

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 8bb6a6607da22903be2e8f5ed2360d01a08555e9
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Jul 29 07:02:45 2020 +1000

    ctdb-recoverd: Broadcast takeover run message when verifying IPs
    
    This makes it consistent with the monitoring code.  If the master has
    changed then this means the master will always get the message.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    Autobuild-User(master): Amitay Isaacs <amitay at samba.org>
    Autobuild-Date(master): Tue Aug 18 06:24:11 UTC 2020 on sn-devel-184

commit 4aa8e72d60e92951b35190d2ffcfdb1bfb756609
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Jan 24 10:21:37 2018 +1100

    ctdb-recoverd: Rename update_local_flags() -> update_flags()
    
    This also updates remote flags so the name is misleading.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 702c7c4934e79a9161fdc59df70df30ae492d89f
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jan 18 20:35:55 2018 +1100

    ctdb-recoverd: Change update_local_flags() to use already retrieved nodemaps
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 910a0b3b747a987ba69b6a0b6256e964b7d85dfe
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jun 14 03:51:01 2019 +1000

    ctdb-recoverd: Get remote nodemaps earlier
    
    update_local_flags() will be changed to use these nodemaps.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit d50919b0cb28f299c9b6985271b29d4f27c5f619
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jun 14 00:23:22 2019 +1000

    ctdb-recoverd: Do not fetch the nodemap from the recovery master
    
    The nodemap has already been fetched from the local node and is
    actually passed to this function.  Care must be taken to avoid
    referencing the "remote" nodemap for the recovery master.  It also
    isn't useful to do so, since it would be the same nodemap.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 762d1d8a9605f97973a2c1176de5d29fcc61d15a
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jan 18 20:02:42 2018 +1100

    ctdb-recoverd: Change get_remote_nodemaps() to use connected nodes
    
    The plan here is to use the nodemaps retrieved by get_remote_nodes()
    in update_local_flags().  This will improve efficiency, since
    get_remote_nodes() fetches flags from nodes in parallel.  It also
    means that get_remote_nodes() can be used exactly once early on in
    main_loop() to retrieve remote nodemaps.  Retrieving nodemaps multiple
    times is unnecessary and racy - a single monitoring iteration should
    not fetch flags multiple times and compare them.
    
    This introduces a temporary behaviour change but it will be of no
    consequence when the above changes are made.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 368c83bfe3bbfff568d14f65e7b1ffa41d5349ac
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 30 11:57:51 2020 +1000

    ctdb-recoverd: Fix node_pnn check and assignment of nodemap into array
    
    This array is indexed by the same index as nodemap, not the PNN.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 10ce0dbf1c11eaaab7b28b6bbd014235a36d1962
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jan 18 19:58:15 2018 +1100

    ctdb-recoverd: Add fail callback to assign banning credits
    
    Also drop error handling in main_loop() that is replaced by this
    change.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit a079ee31690cf7110f46b41989ffcfb83b7626d6
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jan 18 19:52:22 2018 +1100

    ctdb-recoverd: Add an intermediate state struct for nodemap fetching
    
    This will allow an error callback to be added.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 2eaa0af6160588b6e3364b181d0976477d12b51b
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jan 18 16:31:39 2018 +1100

    ctdb-recoverd: Move memory allocation into get_remote_nodemaps()
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 3324dd272c7dafa92cd9c3fd0af8f50084bcdaaa
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jan 18 16:41:19 2018 +1100

    ctdb-recoverd: Change signature of get_remote_nodemaps()
    
    Change 1st argument to a rec context, since this will be needed later.
    Drop the nodemap argument and access it via rec->nodemap instead.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit d2d90f250214582d7124b8137aa2cf5032b2f285
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Aug 17 20:27:18 2020 +1000

    ctdb-recoverd: Fix a local memory leak
    
    The memory is allocated off the memory context used by the current
    iteration of main loop.  It is freed when main loop completes the fix
    doesn't require backporting to stable branches.  However, it is sloppy
    so it is worth fixing.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 52f520d39cd92e1cf2413fd7e0dd362debd6f463
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jan 18 16:19:36 2018 +1100

    ctdb-recoverd: Basic cleanups for get_remote_nodemaps()
    
    Don't log an error on failure - let the caller can do this.  Apart
    from this: fix up coding style and modernise the remaining error
    message.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 ctdb/server/ctdb_recoverd.c | 165 ++++++++++++++++++++++++++++----------------
 1 file changed, 105 insertions(+), 60 deletions(-)


Changeset truncated at 500 lines:

diff --git a/ctdb/server/ctdb_recoverd.c b/ctdb/server/ctdb_recoverd.c
index 3f72619127a..aeb23276fe7 100644
--- a/ctdb/server/ctdb_recoverd.c
+++ b/ctdb/server/ctdb_recoverd.c
@@ -537,18 +537,19 @@ static void ctdb_wait_election(struct ctdb_recoverd *rec)
 }
 
 /*
-  Update our local flags from all remote connected nodes. 
-  This is only run when we are or we belive we are the recovery master
+ * Update local flags from all remote connected nodes and push out
+ * flags changes to all nodes.  This is only run by the recovery
+ * master.
  */
-static int update_local_flags(struct ctdb_recoverd *rec, struct ctdb_node_map_old *nodemap)
+static int update_flags(struct ctdb_recoverd *rec,
+			struct ctdb_node_map_old *nodemap,
+			struct ctdb_node_map_old **remote_nodemaps)
 {
 	unsigned int j;
 	struct ctdb_context *ctdb = rec->ctdb;
 	TALLOC_CTX *mem_ctx = talloc_new(ctdb);
 
-	/* get the nodemap for all active remote nodes and verify
-	   they are the same as for this node
-	 */
+	/* Check flags from remote nodes */
 	for (j=0; j<nodemap->num; j++) {
 		struct ctdb_node_map_old *remote_nodemap=NULL;
 		uint32_t local_flags = nodemap->nodes[j].flags;
@@ -562,15 +563,7 @@ static int update_local_flags(struct ctdb_recoverd *rec, struct ctdb_node_map_ol
 			continue;
 		}
 
-		ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn, 
-					   mem_ctx, &remote_nodemap);
-		if (ret != 0) {
-			DEBUG(DEBUG_ERR, (__location__ " Unable to get nodemap from remote node %u\n", 
-				  nodemap->nodes[j].pnn));
-			ctdb_set_culprit(rec, nodemap->nodes[j].pnn);
-			talloc_free(mem_ctx);
-			return -1;
-		}
+		remote_nodemap = remote_nodemaps[j];
 		remote_flags = remote_nodemap->nodes[j].flags;
 
 		if (local_flags != remote_flags) {
@@ -595,7 +588,6 @@ static int update_local_flags(struct ctdb_recoverd *rec, struct ctdb_node_map_ol
 				 local_flags);
 			nodemap->nodes[j].flags = remote_flags;
 		}
-		talloc_free(remote_nodemap);
 	}
 	talloc_free(mem_ctx);
 	return 0;
@@ -2212,10 +2204,12 @@ done:
 		data.dptr = (uint8_t *)&rd;
 		data.dsize = sizeof(rd);
 
-		ret = ctdb_client_send_message(ctdb, rec->recmaster, CTDB_SRVID_TAKEOVER_RUN, data);
+		ret = ctdb_client_send_message(ctdb,
+					       CTDB_BROADCAST_CONNECTED,
+					       CTDB_SRVID_TAKEOVER_RUN,
+					       data);
 		if (ret != 0) {
-			DEBUG(DEBUG_ERR,
-			      ("Failed to send takeover run request\n"));
+			D_ERR("Failed to send takeover run request\n");
 		}
 	}
 	talloc_free(mem_ctx);
@@ -2223,37 +2217,94 @@ done:
 }
 
 
-static void async_getnodemap_callback(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)
+struct remote_nodemaps_state {
+	struct ctdb_node_map_old **remote_nodemaps;
+	struct ctdb_recoverd *rec;
+};
+
+static void async_getnodemap_callback(struct ctdb_context *ctdb,
+				      uint32_t node_pnn,
+				      int32_t res,
+				      TDB_DATA outdata,
+				      void *callback_data)
 {
-	struct ctdb_node_map_old **remote_nodemaps = callback_data;
+	struct remote_nodemaps_state *state =
+		(struct remote_nodemaps_state *)callback_data;
+	struct ctdb_node_map_old **remote_nodemaps = state->remote_nodemaps;
+	struct ctdb_node_map_old *nodemap = state->rec->nodemap;
+	size_t i;
+
+	for (i = 0; i < nodemap->num; i++) {
+		if (nodemap->nodes[i].pnn == node_pnn) {
+			break;
+		}
+	}
 
-	if (node_pnn >= ctdb->num_nodes) {
-		DEBUG(DEBUG_ERR,(__location__ " pnn from invalid node\n"));
+	if (i >= nodemap->num) {
+		DBG_ERR("Invalid PNN %"PRIu32"\n", node_pnn);
 		return;
 	}
 
-	remote_nodemaps[node_pnn] = (struct ctdb_node_map_old *)talloc_steal(remote_nodemaps, outdata.dptr);
+	remote_nodemaps[i] = (struct ctdb_node_map_old *)talloc_steal(
+					remote_nodemaps, outdata.dptr);
+
+}
+
+static void async_getnodemap_error(struct ctdb_context *ctdb,
+				   uint32_t node_pnn,
+				   int32_t res,
+				   TDB_DATA outdata,
+				   void *callback_data)
+{
+	struct remote_nodemaps_state *state =
+		(struct remote_nodemaps_state *)callback_data;
+	struct ctdb_recoverd *rec = state->rec;
 
+	DBG_ERR("Failed to retrieve nodemap from node %u\n", node_pnn);
+	ctdb_set_culprit(rec, node_pnn);
 }
 
-static int get_remote_nodemaps(struct ctdb_context *ctdb, TALLOC_CTX *mem_ctx,
-	struct ctdb_node_map_old *nodemap,
-	struct ctdb_node_map_old **remote_nodemaps)
+static int get_remote_nodemaps(struct ctdb_recoverd *rec,
+			       TALLOC_CTX *mem_ctx,
+			       struct ctdb_node_map_old ***remote_nodemaps)
 {
+	struct ctdb_context *ctdb = rec->ctdb;
+	struct ctdb_node_map_old **t;
 	uint32_t *nodes;
+	struct remote_nodemaps_state state;
+	int ret;
 
-	nodes = list_of_active_nodes(ctdb, nodemap, mem_ctx, true);
-	if (ctdb_client_async_control(ctdb, CTDB_CONTROL_GET_NODEMAP,
-					nodes, 0,
-					CONTROL_TIMEOUT(), false, tdb_null,
+	t = talloc_zero_array(mem_ctx,
+			      struct ctdb_node_map_old *,
+			      rec->nodemap->num);
+	if (t == NULL) {
+		DBG_ERR("Memory allocation error\n");
+		return -1;
+	}
+
+	nodes = list_of_connected_nodes(ctdb, rec->nodemap, mem_ctx, false);
+
+	state.remote_nodemaps = t;
+	state.rec = rec;
+
+	ret = ctdb_client_async_control(ctdb,
+					CTDB_CONTROL_GET_NODEMAP,
+					nodes,
+					0,
+					CONTROL_TIMEOUT(),
+					false,
+					tdb_null,
 					async_getnodemap_callback,
-					NULL,
-					remote_nodemaps) != 0) {
-		DEBUG(DEBUG_ERR, (__location__ " Unable to pull all remote nodemaps\n"));
+					async_getnodemap_error,
+					&state);
+	talloc_free(nodes);
 
-		return -1;
+	if (ret != 0) {
+		talloc_free(t);
+		return ret;
 	}
 
+	*remote_nodemaps = t;
 	return 0;
 }
 
@@ -2498,10 +2549,17 @@ static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec,
 	}
 
 
-	/* ensure our local copies of flags are right */
-	ret = update_local_flags(rec, nodemap);
+	/* Get the nodemaps for all connected remote nodes */
+	ret = get_remote_nodemaps(rec, mem_ctx, &remote_nodemaps);
 	if (ret != 0) {
-		DEBUG(DEBUG_ERR,("Unable to update local flags\n"));
+		DBG_ERR("Failed to read remote nodemaps\n");
+		return;
+	}
+
+	/* Ensure our local and remote flags are correct */
+	ret = update_flags(rec, nodemap, remote_nodemaps);
+	if (ret != 0) {
+		D_ERR("Unable to update flags\n");
 		return;
 	}
 
@@ -2574,33 +2632,14 @@ static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec,
 		goto takeover_run_checks;
 	}
 
-	/* get the nodemap for all active remote nodes
-	 */
-	remote_nodemaps = talloc_array(mem_ctx, struct ctdb_node_map_old *, nodemap->num);
-	if (remote_nodemaps == NULL) {
-		DEBUG(DEBUG_ERR, (__location__ " failed to allocate remote nodemap array\n"));
-		return;
-	}
-	for(i=0; i<nodemap->num; i++) {
-		remote_nodemaps[i] = NULL;
-	}
-	if (get_remote_nodemaps(ctdb, mem_ctx, nodemap, remote_nodemaps) != 0) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to read remote nodemaps\n"));
-		return;
-	} 
-
 	/* verify that all other nodes have the same nodemap as we have
 	*/
 	for (j=0; j<nodemap->num; j++) {
-		if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {
+		if (nodemap->nodes[j].pnn == ctdb->pnn) {
 			continue;
 		}
-
-		if (remote_nodemaps[j] == NULL) {
-			DEBUG(DEBUG_ERR,(__location__ " Did not get a remote nodemap for node %d, restarting monitoring\n", j));
-			ctdb_set_culprit(rec, j);
-
-			return;
+		if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {
+			continue;
 		}
 
  		/* if the nodes disagree on how many nodes there are
@@ -2635,6 +2674,9 @@ static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec,
 	 * up-to-date information for all the nodes.
 	 */
 	for (j=0; j<nodemap->num; j++) {
+		if (nodemap->nodes[j].pnn == ctdb->pnn) {
+			continue;
+		}
 		if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {
 			continue;
 		}
@@ -2642,6 +2684,9 @@ static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec,
 	}
 
 	for (j=0; j<nodemap->num; j++) {
+		if (nodemap->nodes[j].pnn == ctdb->pnn) {
+			continue;
+		}
 		if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {
 			continue;
 		}


-- 
Samba Shared Repository



More information about the samba-cvs mailing list