[SCM] Samba Shared Repository - branch master updated

Amitay Isaacs amitay at samba.org
Thu Sep 9 02:39:01 UTC 2021


The branch, master has been updated
       via  9e7d2d9794a ctdb-daemon: Don't mark a node as unhealthy when connecting to it
       via  7f697b1938e ctdb-daemon: Ignore flag changes for disconnected nodes
       via  ae10a8a4b70 ctdb-daemon: Simplify ctdb_control_modflags()
       via  916c5ee131d ctdb-recoverd: Mark CTDB_SRVID_SET_NODE_FLAGS obsolete
       via  e75256767ff ctdb-daemon: Don't bother sending CTDB_SRVID_SET_NODE_FLAGS
       via  0132bd5a223 ctdb-daemon: Modernise remaining debug macro in this function
       via  b6d25d079e3 ctdb-daemon: Update logging for flag changes
       via  eec44e28625 ctdb-daemon: Correct the condition for logging unchanged flags
       via  5914054698d ctdb-tools: Use disable and enable controls in tool
       via  6fe6a54e7f3 ctdb-client: Add client code for disable/enable controls
       via  15a6489c288 ctdb_daemon: Implement controls DISABLE_NODE/ENABLE_NODE
       via  60c1ef14653 ctdb-daemon: Start as disabled means PERMANENTLY_DISABLED
       via  1ac7bc7532b ctdb-daemon: Factor out a function to get node structure from PNN
       via  e0a7b5a9e86 ctdb-daemon: Add a helper variable
       via  6845dca87e6 ctdb-protocol: Add marshalling for controls DISABLE_NODE/ENABLE_NODE
       via  49dc5d8cd2d ctdb-protocol: Add new controls to disable and enable nodes
       via  8305f6a7f13 ctdb-recoverd: Push flags for a node if any remote node disagrees
       via  620d0787142 ctdb-recoverd: Update the local node map before pushing out flags
       via  82a075d4d73 ctdb-recoverd: Add a helper variable
      from  4366c3bb71f gitlab-ci: run samba-fuzz autobuild target on Ubuntu 20.04-based image

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 9e7d2d9794af7251c42cb22f23ee9f86c6ea05c1
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 17:25:32 2021 +1000

    ctdb-daemon: Don't mark a node as unhealthy when connecting to it
    
    Remote nodes are already initialised as UNHEALTHY when the node list
    is initialised at startup (ctdb_load_nodes_file() calls
    convert_node_map_to_list()) and when disconnected (ctdb_node_dead()).
    So, drop this code.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    Autobuild-User(master): Amitay Isaacs <amitay at samba.org>
    Autobuild-Date(master): Thu Sep  9 02:38:34 UTC 2021 on sn-devel-184

commit 7f697b1938efb3972f03f25546bf807d5af9a26c
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Jul 27 15:50:54 2021 +1000

    ctdb-daemon: Ignore flag changes for disconnected nodes
    
    If this node is not connected to a node then we shouldn't know
    anything about it.  The state will be pushed later by the recovery
    master.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>

commit ae10a8a4b70e53ea3be6257d1f86f2d9a56aa62a
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 8 11:11:11 2021 +1000

    ctdb-daemon: Simplify ctdb_control_modflags()
    
    Now that there are separate disable/enable controls used by the ctdb
    tool this control can ignore any flag updates for the current node.
    These only come from the recovery master, which depends on being able
    to fetch flags for all nodes.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
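
The flag-handling rules described in the two commits above can be sketched in isolation. This is a minimal illustration, not the code from ctdb_monitor.c: the helper name `ex_apply_remote_flags()` and the `EX_DISCONNECTED` bit value are assumptions made for the example. It shows the three rules together: ignore updates that target the local node, ignore updates for nodes this node is not connected to, and never let a remote update change the locally tracked DISCONNECTED bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit value; see ctdb/protocol/protocol.h for the real flags. */
#define EX_DISCONNECTED 0x01u

/*
 * Hypothetical helper: apply a pushed flag update for node target_pnn
 * to the local copy of that node's flags.  Returns true only if the
 * stored flags actually changed.
 */
static bool ex_apply_remote_flags(uint32_t local_pnn,
				  uint32_t target_pnn,
				  uint32_t *flags,
				  uint32_t new_flags)
{
	uint32_t old_flags = *flags;

	/* A node is the authority on its own flags: ignore pushes. */
	if (target_pnn == local_pnn) {
		return false;
	}

	/* Nothing should be known about a disconnected node. */
	if (old_flags & EX_DISCONNECTED) {
		return false;
	}

	/* Keep the local view of connectivity, take the rest as pushed. */
	*flags = (old_flags & EX_DISCONNECTED) |
		 (new_flags & ~EX_DISCONNECTED);

	return *flags != old_flags;
}
```

Returning whether anything changed mirrors the logging change in this series, where only real flag changes are reported.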

commit 916c5ee131dc5c7f1d9c3540147d1f915c8302ad
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Jan 17 19:04:34 2018 +1100

    ctdb-recoverd: Mark CTDB_SRVID_SET_NODE_FLAGS obsolete
    
    CTDB_SRVID_SET_NODE_FLAGS is no longer sent so drop monitor_handler()
    and replace with srvid_not_implemented().  Mark the SRVID obsolete in
    its comment.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit e75256767fffc6a7ac0b97e58737a39c63c8b187
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 8 11:32:20 2021 +1000

    ctdb-daemon: Don't bother sending CTDB_SRVID_SET_NODE_FLAGS
    
    The code that handles this message is
    ctdb_recoverd.c:monitor_handler().  Although it appears to do
    something potentially useful, it only logs the flags changes.  All
    changes made are to local structures - there are no actual
    side-effects.
    
    It used to trigger a takeover run when the DISABLED flag changed.
    This was dropped back in commit
    662f06de9fdce7b1bc1772a4fbe43de271564917.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 0132bd5a2233193256af434a37506f86ed62c075
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 8 11:34:49 2021 +1000

    ctdb-daemon: Modernise remaining debug macro in this function
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit b6d25d079e30919457cacbfbbfd670bf88295a9c
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 8 11:29:38 2021 +1000

    ctdb-daemon: Update logging for flag changes
    
    When flags change, promote the message to NOTICE level and switch the
    message to the style that is currently generated by
    ctdb-recoverd.c:monitor_handler().  This will allow monitor_handler()
    to go away in future.
    
    Drop logging when flags do not change.  The recovery master now logs
    when it pushes flags for a node, so the lack of a corresponding
    "changed flags" message here indicates that no update was required.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit eec44e286250a6ee7b5c42d85d632bdc300a409f
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 15:13:49 2021 +1000

    ctdb-daemon: Correct the condition for logging unchanged flags
    
    Don't trust the old flags from the recovery master.
    
    Surrounding code will change in future commits, including the use of
    old-style debug macros, so just make this change clear.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 5914054698dab934fd4db5efb9d211b2fdc40bb9
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 14:37:19 2021 +1000

    ctdb-tools: Use disable and enable controls in tool
    
    Note that there is a change from broadcast to a directed control here.
    This is OK because the recovery master will push flags if any nodes
    disagree with the canonical flags fetched from a node.
    
    Static function ctdb_ctrl_modflags() is no longer used, so drop it.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 6fe6a54e7f32e650be6ab36041159081dbde5165
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 14:32:12 2021 +1000

    ctdb-client: Add client code for disable/enable controls
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 15a6489c288b3adb635a728cb2049621ab1a07f7
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 14:12:59 2021 +1000

    ctdb_daemon: Implement controls DISABLE_NODE/ENABLE_NODE
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 60c1ef146538d90f97b7823459f7548ca5fa6dd3
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 14:02:28 2021 +1000

    ctdb-daemon: Start as disabled means PERMANENTLY_DISABLED
    
    DISABLED is UNHEALTHY | PERMANENTLY_DISABLED, which is not what is
    intended here.  Luckily, it doesn't do any harm because nodes are
    marked unhealthy at startup anyway.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
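
The difference between the two flags in the commit above can be shown with a small bitmask sketch. The `EX_`-prefixed names, the helper functions and the numeric bit values are assumptions chosen for illustration; the authoritative definitions are in ctdb/protocol/protocol.h.

```c
#include <stdint.h>

/* Illustrative bit values only. */
#define EX_UNHEALTHY            0x02u
#define EX_PERMANENTLY_DISABLED 0x04u
/* The composite mask: both bits at once. */
#define EX_DISABLED (EX_UNHEALTHY | EX_PERMANENTLY_DISABLED)

/* Hypothetical: start-as-disabled before the fix sets both bits. */
static uint32_t ex_start_disabled_old(uint32_t flags)
{
	return flags | EX_DISABLED;
}

/* Hypothetical: after the fix only the administrative bit is set. */
static uint32_t ex_start_disabled_new(uint32_t flags)
{
	return flags | EX_PERMANENTLY_DISABLED;
}
```

Starting from flags that already include the unhealthy bit, both helpers give the same result, which matches the commit's remark that the old behaviour did no harm in practice.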

commit 1ac7bc7532b2fad791d0e53effa7c64cdc73c4eb
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 14:01:33 2021 +1000

    ctdb-daemon: Factor out a function to get node structure from PNN
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit e0a7b5a9e866452b1faaed86a105492fe7b237e2
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Jul 28 10:27:42 2021 +1000

    ctdb-daemon: Add a helper variable
    
    Simplifies a subsequent change.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 6845dca87e6ffc5e449fb78d23eb9c7a22698b80
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 12:10:12 2021 +1000

    ctdb-protocol: Add marshalling for controls DISABLE_NODE/ENABLE_NODE
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 49dc5d8cd2d3767044ac69cbd25c8210d11cadf7
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 8 17:28:20 2021 +1000

    ctdb-protocol: Add new controls to disable and enable nodes
    
    These are CTDB_CONTROL_DISABLE_NODE and CTDB_CONTROL_ENABLE_NODE.
    
    For consistency these match CTDB_CONTROL_STOP_NODE and
    CTDB_CONTROL_CONTINUE_NODE.  It would be possible to add a single
    control but it would need to take data.
    
    The aim is to finally fix races in flag handling.  Previous fixes have
    improved the situation but they have only narrowed the race window.
    The problem is that the recovery daemon on the master node pushes
    flags to nodes the same way that disable and enable are implemented.
    So the following sequence is still racy:
    
    1. Node A is disabled
    2. Recovery master pulls flags from all nodes including A
    3. Node A is enabled
    4. Recovery master notices A is disabled and pushes a flag update to
       all nodes including node A
    5. Node A is erroneously marked disabled
    
    Node A can not tell if the MODIFY_FLAGS control is from a "ctdb
    disable" command or a flag update from the recovery master.
    
    The solution is to use a different mechanism for disable/enable and
    for a node to ignore MODIFY_FLAGS controls for their own flags.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
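
The five-step race in the commit above can be replayed with a toy model. This is a sketch under stated assumptions, not CTDB code: `struct ex_node`, the two handler functions and `ex_replay_race()` are hypothetical, and the recovery master is reduced to a stale snapshot of node A's flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit value only. */
#define EX_PERMANENTLY_DISABLED 0x04u

struct ex_node {
	uint32_t pnn;
	uint32_t flags;
};

/* Old behaviour: a pushed update may overwrite the node's own flags. */
static void ex_modflags_old(struct ex_node *self,
			    uint32_t target_pnn,
			    uint32_t new_flags)
{
	if (target_pnn == self->pnn) {
		self->flags = new_flags;
	}
}

/* New behaviour: pushed updates for the node's own flags are ignored. */
static void ex_modflags_new(struct ex_node *self,
			    uint32_t target_pnn,
			    uint32_t new_flags)
{
	if (target_pnn == self->pnn) {
		return;
	}
	(void)new_flags; /* other nodes' entries would be updated here */
}

/* Replay steps 1-5 and return node A's final flags. */
static uint32_t ex_replay_race(bool use_new_handler)
{
	struct ex_node a = { .pnn = 0, .flags = 0 };
	uint32_t snapshot;

	a.flags |= EX_PERMANENTLY_DISABLED;  /* 1. A is disabled */
	snapshot = a.flags;                  /* 2. master pulls flags */
	a.flags &= ~EX_PERMANENTLY_DISABLED; /* 3. A is enabled */

	/* 4. master pushes its stale view to all nodes, including A */
	if (use_new_handler) {
		ex_modflags_new(&a, a.pnn, snapshot);
	} else {
		ex_modflags_old(&a, a.pnn, snapshot);
	}

	return a.flags;                      /* 5. A's resulting state */
}
```

With the old handler the stale snapshot wins and A ends up disabled again; with the new handler A keeps the state set by the enable in step 3.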

commit 8305f6a7f132f03b0bbdb26692b7491fd3f6c24f
Author: Martin Schwenke <martin at meltin.net>
Date:   Sun Jul 11 22:17:08 2021 +1000

    ctdb-recoverd: Push flags for a node if any remote node disagrees
    
    This will usually happen if flags on the node in question change, so
    keeping the code simple and pushing to all nodes won't hurt.  When all
    nodes come up there might be differences in connected nodes, causing
    such "fix ups".  Receiving nodes will ignore no-op pushes.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
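
The "push if any remote node disagrees" check in the commit above amounts to a simple scan. The sketch below assumes the canonical flags for a node have already been fetched and that each remote node's view of those flags is available in an array; the function name and parameters are hypothetical and do not correspond to the structures used in ctdb_recoverd.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return true if at least one remote view of a
 * node's flags differs from the canonical flags, meaning a push to
 * all nodes is warranted.
 */
static bool ex_any_remote_disagrees(uint32_t canonical_flags,
				    const uint32_t *remote_views,
				    size_t num_views)
{
	size_t i;

	for (i = 0; i < num_views; i++) {
		if (remote_views[i] != canonical_flags) {
			return true;
		}
	}

	return false;
}
```

Pushing to every node when any single view differs keeps the caller simple, and the commit notes that receiving nodes ignore no-op pushes.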

commit 620d07871420cdbfa055c1ace75ec1ac4c32721d
Author: Martin Schwenke <martin at meltin.net>
Date:   Sun Jul 11 21:28:43 2021 +1000

    ctdb-recoverd: Update the local node map before pushing out flags
    
    The resulting code structure looks a little weird.  However, there is
    another condition that requires the flags to be pushed that will be
    inserted before the continue statement in a subsequent commit.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 82a075d4d734588a42fca7ebaf529892d1eba853
Author: Martin Schwenke <martin at meltin.net>
Date:   Sun Jul 11 20:40:10 2021 +1000

    ctdb-recoverd: Add a helper variable
    
    Improves readability and simplifies subsequent changes.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 ctdb/client/client_control_sync.c          |  68 ++++++++++++++++
 ctdb/client/client_sync.h                  |  12 +++
 ctdb/include/ctdb_private.h                |   2 +
 ctdb/protocol/protocol.h                   |   4 +-
 ctdb/protocol/protocol_api.h               |   6 ++
 ctdb/protocol/protocol_client.c            |  36 +++++++++
 ctdb/protocol/protocol_control.c           |  12 +++
 ctdb/protocol/protocol_debug.c             |   2 +
 ctdb/server/ctdb_control.c                 |  42 ++++++++++
 ctdb/server/ctdb_daemon.c                  |  35 +++++++--
 ctdb/server/ctdb_monitor.c                 |  67 ++++++++--------
 ctdb/server/ctdb_recoverd.c                | 120 +++++++++++++++--------------
 ctdb/server/ctdb_server.c                  |   1 -
 ctdb/tests/UNIT/cunit/protocol_test_101.sh |   2 +-
 ctdb/tests/src/fake_ctdbd.c                |  54 +++++++++++++
 ctdb/tests/src/protocol_common_ctdb.c      |  24 ++++++
 ctdb/tests/src/protocol_ctdb_test.c        |   2 +-
 ctdb/tools/ctdb.c                          |  57 +++-----------
 18 files changed, 400 insertions(+), 146 deletions(-)


Changeset truncated at 500 lines:

diff --git a/ctdb/client/client_control_sync.c b/ctdb/client/client_control_sync.c
index b9a25ce2b2c..e9f97dd0f30 100644
--- a/ctdb/client/client_control_sync.c
+++ b/ctdb/client/client_control_sync.c
@@ -2660,3 +2660,71 @@ int ctdb_ctrl_tunnel_deregister(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
 
 	return 0;
 }
+
+int ctdb_ctrl_disable_node(TALLOC_CTX *mem_ctx,
+			   struct tevent_context *ev,
+			   struct ctdb_client_context *client,
+			   int destnode,
+			   struct timeval timeout)
+{
+	struct ctdb_req_control request;
+	struct ctdb_reply_control *reply;
+	int ret;
+
+	ctdb_req_control_disable_node(&request);
+	ret = ctdb_client_control(mem_ctx,
+				  ev,
+				  client,
+				  destnode,
+				  timeout,
+				  &request,
+				  &reply);
+	if (ret != 0) {
+		D_ERR("Control DISABLE_NODE failed to node %u, ret=%d\n",
+		      destnode,
+		      ret);
+		return ret;
+	}
+
+	ret = ctdb_reply_control_disable_node(reply);
+	if (ret != 0) {
+		D_ERR("Control DISABLE_NODE failed, ret=%d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+int ctdb_ctrl_enable_node(TALLOC_CTX *mem_ctx,
+			  struct tevent_context *ev,
+			  struct ctdb_client_context *client,
+			  int destnode,
+			  struct timeval timeout)
+{
+	struct ctdb_req_control request;
+	struct ctdb_reply_control *reply;
+	int ret;
+
+	ctdb_req_control_enable_node(&request);
+	ret = ctdb_client_control(mem_ctx,
+				  ev,
+				  client,
+				  destnode,
+				  timeout,
+				  &request,
+				  &reply);
+	if (ret != 0) {
+		D_ERR("Control ENABLE_NODE failed to node %u, ret=%d\n",
+		      destnode,
+		      ret);
+		return ret;
+	}
+
+	ret = ctdb_reply_control_enable_node(reply);
+	if (ret != 0) {
+		D_ERR("Control ENABLE_NODE failed, ret=%d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
diff --git a/ctdb/client/client_sync.h b/ctdb/client/client_sync.h
index dc8b67395e3..b8f5d905857 100644
--- a/ctdb/client/client_sync.h
+++ b/ctdb/client/client_sync.h
@@ -482,6 +482,18 @@ int ctdb_ctrl_tunnel_deregister(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
 				int destnode, struct timeval timeout,
 				uint64_t tunnel_id);
 
+int ctdb_ctrl_disable_node(TALLOC_CTX *mem_ctx,
+			   struct tevent_context *ev,
+			   struct ctdb_client_context *client,
+			   int destnode,
+			   struct timeval timeout);
+
+int ctdb_ctrl_enable_node(TALLOC_CTX *mem_ctx,
+			  struct tevent_context *ev,
+			  struct ctdb_client_context *client,
+			  int destnode,
+			  struct timeval timeout);
+
 /* from client/client_message_sync.c */
 
 int ctdb_message_recd_update_ip(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
diff --git a/ctdb/include/ctdb_private.h b/ctdb/include/ctdb_private.h
index 8eb6686f953..f5e647f08a5 100644
--- a/ctdb/include/ctdb_private.h
+++ b/ctdb/include/ctdb_private.h
@@ -565,6 +565,8 @@ int daemon_deregister_message_handler(struct ctdb_context *ctdb,
 void daemon_tunnel_handler(uint64_t tunnel_id, TDB_DATA data,
 			   void *private_data);
 
+struct ctdb_node *ctdb_find_node(struct ctdb_context *ctdb, uint32_t pnn);
+
 int ctdb_start_daemon(struct ctdb_context *ctdb,
 		      bool interactive,
 		      bool test_mode_enabled);
diff --git a/ctdb/protocol/protocol.h b/ctdb/protocol/protocol.h
index e4b76c6b986..5f788f6f2a8 100644
--- a/ctdb/protocol/protocol.h
+++ b/ctdb/protocol/protocol.h
@@ -137,7 +137,7 @@ struct ctdb_call {
 /* SRVID to inform clients that an IP address has been taken over */
 #define CTDB_SRVID_TAKE_IP 0xF301000000000000LL
 
-/* SRVID to inform recovery daemon of the node flags */
+/* SRVID to inform recovery daemon of the node flags - OBSOLETE */
 #define CTDB_SRVID_SET_NODE_FLAGS 0xF400000000000000LL
 
 /* SRVID to inform recovery daemon to update public ip assignment */
@@ -376,6 +376,8 @@ enum ctdb_controls {CTDB_CONTROL_PROCESS_EXISTS          = 0,
 		    CTDB_CONTROL_VACUUM_FETCH            = 154,
 		    CTDB_CONTROL_DB_VACUUM               = 155,
 		    CTDB_CONTROL_ECHO_DATA               = 156,
+		    CTDB_CONTROL_DISABLE_NODE            = 157,
+		    CTDB_CONTROL_ENABLE_NODE             = 158,
 };
 
 #define MAX_COUNT_BUCKETS 16
diff --git a/ctdb/protocol/protocol_api.h b/ctdb/protocol/protocol_api.h
index 7bbe33b22fe..499d9329c54 100644
--- a/ctdb/protocol/protocol_api.h
+++ b/ctdb/protocol/protocol_api.h
@@ -605,6 +605,12 @@ void ctdb_req_control_echo_data(struct ctdb_req_control *request,
 				struct ctdb_echo_data *echo_data);
 int ctdb_reply_control_echo_data(struct ctdb_reply_control *reply);
 
+void ctdb_req_control_disable_node(struct ctdb_req_control *request);
+int ctdb_reply_control_disable_node(struct ctdb_reply_control *reply);
+
+void ctdb_req_control_enable_node(struct ctdb_req_control *request);
+int ctdb_reply_control_enable_node(struct ctdb_reply_control *reply);
+
 /* From protocol/protocol_debug.c */
 
 void ctdb_packet_print(uint8_t *buf, size_t buflen, FILE *fp);
diff --git a/ctdb/protocol/protocol_client.c b/ctdb/protocol/protocol_client.c
index 6d850be86df..dcce83f02a1 100644
--- a/ctdb/protocol/protocol_client.c
+++ b/ctdb/protocol/protocol_client.c
@@ -2360,3 +2360,39 @@ int ctdb_reply_control_echo_data(struct ctdb_reply_control *reply)
 
 	return reply->status;
 }
+
+/* CTDB_CONTROL_DISABLE_NODE */
+
+void ctdb_req_control_disable_node(struct ctdb_req_control *request)
+{
+	request->opcode = CTDB_CONTROL_DISABLE_NODE;
+	request->pad = 0;
+	request->srvid = 0;
+	request->client_id = 0;
+	request->flags = 0;
+
+	request->rdata.opcode = CTDB_CONTROL_DISABLE_NODE;
+}
+
+int ctdb_reply_control_disable_node(struct ctdb_reply_control *reply)
+{
+	return ctdb_reply_control_generic(reply, CTDB_CONTROL_DISABLE_NODE);
+}
+
+/* CTDB_CONTROL_ENABLE_NODE */
+
+void ctdb_req_control_enable_node(struct ctdb_req_control *request)
+{
+	request->opcode = CTDB_CONTROL_ENABLE_NODE;
+	request->pad = 0;
+	request->srvid = 0;
+	request->client_id = 0;
+	request->flags = 0;
+
+	request->rdata.opcode = CTDB_CONTROL_ENABLE_NODE;
+}
+
+int ctdb_reply_control_enable_node(struct ctdb_reply_control *reply)
+{
+	return ctdb_reply_control_generic(reply, CTDB_CONTROL_ENABLE_NODE);
+}
diff --git a/ctdb/protocol/protocol_control.c b/ctdb/protocol/protocol_control.c
index fb6b0219ef7..f64a1a90e10 100644
--- a/ctdb/protocol/protocol_control.c
+++ b/ctdb/protocol/protocol_control.c
@@ -411,6 +411,12 @@ static size_t ctdb_req_control_data_len(struct ctdb_req_control_data *cd)
 	case CTDB_CONTROL_ECHO_DATA:
 		len = ctdb_echo_data_len(cd->data.echo_data);
 		break;
+
+	case CTDB_CONTROL_DISABLE_NODE:
+		break;
+
+	case CTDB_CONTROL_ENABLE_NODE:
+		break;
 	}
 
 	return len;
@@ -1385,6 +1391,12 @@ static size_t ctdb_reply_control_data_len(struct ctdb_reply_control_data *cd)
 	case CTDB_CONTROL_ECHO_DATA:
 		len = ctdb_echo_data_len(cd->data.echo_data);
 		break;
+
+	case CTDB_CONTROL_DISABLE_NODE:
+		break;
+
+	case CTDB_CONTROL_ENABLE_NODE:
+		break;
 	}
 
 	return len;
diff --git a/ctdb/protocol/protocol_debug.c b/ctdb/protocol/protocol_debug.c
index 694285515e1..d94cb548d68 100644
--- a/ctdb/protocol/protocol_debug.c
+++ b/ctdb/protocol/protocol_debug.c
@@ -243,6 +243,8 @@ static void ctdb_opcode_print(uint32_t opcode, FILE *fp)
 		{ CTDB_CONTROL_VACUUM_FETCH, "VACUUM_FETCH" },
 		{ CTDB_CONTROL_DB_VACUUM, "DB_VACUUM" },
 		{ CTDB_CONTROL_ECHO_DATA, "ECHO_DATA" },
+		{ CTDB_CONTROL_DISABLE_NODE, "DISABLE_NODE" },
+		{ CTDB_CONTROL_ENABLE_NODE, "ENABLE_NODE" },
 		{ MAP_END, "" },
 	};
 
diff --git a/ctdb/server/ctdb_control.c b/ctdb/server/ctdb_control.c
index 206ea149693..131ebd43afc 100644
--- a/ctdb/server/ctdb_control.c
+++ b/ctdb/server/ctdb_control.c
@@ -173,6 +173,40 @@ done:
 	TALLOC_FREE(state);
 }
 
+static int ctdb_control_disable_node(struct ctdb_context *ctdb)
+{
+	struct ctdb_node *node;
+
+	node = ctdb_find_node(ctdb, CTDB_CURRENT_NODE);
+	if (node == NULL) {
+		/* Can't happen */
+		DBG_ERR("Unable to find current node\n");
+		return -1;
+	}
+
+	D_ERR("Disable node\n");
+	node->flags |= NODE_FLAGS_PERMANENTLY_DISABLED;
+
+	return 0;
+}
+
+static int ctdb_control_enable_node(struct ctdb_context *ctdb)
+{
+	struct ctdb_node *node;
+
+	node = ctdb_find_node(ctdb, CTDB_CURRENT_NODE);
+	if (node == NULL) {
+		/* Can't happen */
+		DBG_ERR("Unable to find current node\n");
+		return -1;
+	}
+
+	D_ERR("Enable node\n");
+	node->flags &= ~NODE_FLAGS_PERMANENTLY_DISABLED;
+
+	return 0;
+}
+
 /*
   process a control request
  */
@@ -827,6 +861,14 @@ static int32_t ctdb_control_dispatch(struct ctdb_context *ctdb,
 		return ctdb_control_echo_data(ctdb, c, indata, async_reply);
 	}
 
+	case CTDB_CONTROL_DISABLE_NODE:
+		CHECK_CONTROL_DATA_SIZE(0);
+		return ctdb_control_disable_node(ctdb);
+
+	case CTDB_CONTROL_ENABLE_NODE:
+		CHECK_CONTROL_DATA_SIZE(0);
+		return ctdb_control_enable_node(ctdb);
+
 	default:
 		DEBUG(DEBUG_CRIT,(__location__ " Unknown CTDB control opcode %u\n", opcode));
 		return -1;
diff --git a/ctdb/server/ctdb_daemon.c b/ctdb/server/ctdb_daemon.c
index 57f80235e7c..0896ba08f90 100644
--- a/ctdb/server/ctdb_daemon.c
+++ b/ctdb/server/ctdb_daemon.c
@@ -1235,28 +1235,51 @@ failed:
 	return -1;
 }
 
-static void initialise_node_flags (struct ctdb_context *ctdb)
+struct ctdb_node *ctdb_find_node(struct ctdb_context *ctdb, uint32_t pnn)
 {
+	struct ctdb_node *node = NULL;
 	unsigned int i;
 
+	if (pnn == CTDB_CURRENT_NODE) {
+		pnn = ctdb->pnn;
+	}
+
 	/* Always found: PNN correctly set just before this is called */
 	for (i = 0; i < ctdb->num_nodes; i++) {
-		if (ctdb->pnn == ctdb->nodes[i]->pnn) {
-			break;
+		node = ctdb->nodes[i];
+		if (pnn == node->pnn) {
+			return node;
 		}
 	}
 
-	ctdb->nodes[i]->flags &= ~NODE_FLAGS_DISCONNECTED;
+	return NULL;
+}
+
+static void initialise_node_flags (struct ctdb_context *ctdb)
+{
+	struct ctdb_node *node = NULL;
+
+	node = ctdb_find_node(ctdb, CTDB_CURRENT_NODE);
+	/*
+	 * PNN correctly set just before this is called so always
+	 * found but keep static analysers happy...
+	 */
+	if (node == NULL) {
+		DBG_ERR("Unable to find current node\n");
+		return;
+	}
+
+	node->flags &= ~NODE_FLAGS_DISCONNECTED;
 
 	/* do we start out in DISABLED mode? */
 	if (ctdb->start_as_disabled != 0) {
 		D_ERR("This node is configured to start in DISABLED state\n");
-		ctdb->nodes[i]->flags |= NODE_FLAGS_DISABLED;
+		node->flags |= NODE_FLAGS_PERMANENTLY_DISABLED;
 	}
 	/* do we start out in STOPPED mode? */
 	if (ctdb->start_as_stopped != 0) {
 		D_ERR("This node is configured to start in STOPPED state\n");
-		ctdb->nodes[i]->flags |= NODE_FLAGS_STOPPED;
+		node->flags |= NODE_FLAGS_STOPPED;
 	}
 }
 
diff --git a/ctdb/server/ctdb_monitor.c b/ctdb/server/ctdb_monitor.c
index 5c694bde969..ab58ec485fe 100644
--- a/ctdb/server/ctdb_monitor.c
+++ b/ctdb/server/ctdb_monitor.c
@@ -455,52 +455,55 @@ int32_t ctdb_control_modflags(struct ctdb_context *ctdb, TDB_DATA indata)
 	struct ctdb_node *node;
 	uint32_t old_flags;
 
-	if (c->pnn >= ctdb->num_nodes) {
-		DEBUG(DEBUG_ERR,(__location__ " Node %d is invalid, num_nodes :%d\n", c->pnn, ctdb->num_nodes));
-		return -1;
+	/*
+	 * Don't let other nodes override the current node's flags.
+	 * The recovery master fetches flags from this node so there's
+	 * no need to push them back.  Doing so is racy.
+	 */
+	if (c->pnn == ctdb->pnn) {
+		DBG_DEBUG("Ignoring flag changes for current node\n");
+		return 0;
 	}
 
-	node         = ctdb->nodes[c->pnn];
-	old_flags    = node->flags;
-	if (c->pnn != ctdb->pnn) {
-		c->old_flags  = node->flags;
+	node = ctdb_find_node(ctdb, c->pnn);
+	if (node == NULL) {
+		DBG_ERR("Node %u is invalid\n", c->pnn);
+		return -1;
 	}
-	node->flags   = c->new_flags & ~NODE_FLAGS_DISCONNECTED;
-	node->flags  |= (c->old_flags & NODE_FLAGS_DISCONNECTED);
 
-	/* we don't let other nodes modify our STOPPED status */
-	if (c->pnn == ctdb->pnn) {
-		node->flags &= ~NODE_FLAGS_STOPPED;
-		if (old_flags & NODE_FLAGS_STOPPED) {
-			node->flags |= NODE_FLAGS_STOPPED;
-		}
+	if (node->flags & NODE_FLAGS_DISCONNECTED) {
+		DBG_DEBUG("Ignoring flag changes for disconnected node\n");
+		return 0;
 	}
 
-	/* we don't let other nodes modify our BANNED status */
-	if (c->pnn == ctdb->pnn) {
-		node->flags &= ~NODE_FLAGS_BANNED;
-		if (old_flags & NODE_FLAGS_BANNED) {
-			node->flags |= NODE_FLAGS_BANNED;
-		}
-	}
+	/*
+	 * Remember the old flags.  We don't care what some other node
+	 * thought the old flags were - that's irrelevant.
+	 */
+	old_flags = node->flags;
 
-	if (node->flags == c->old_flags) {
-		DEBUG(DEBUG_INFO, ("Control modflags on node %u - Unchanged - flags 0x%x\n", c->pnn, node->flags));
+	/*
+	 * This node tracks nodes it is connected to, so don't let
+	 * another node override this
+	 */
+	node->flags =
+		(old_flags & NODE_FLAGS_DISCONNECTED) |
+		(c->new_flags & ~NODE_FLAGS_DISCONNECTED);
+
+	if (node->flags == old_flags) {
 		return 0;
 	}
 
-	DEBUG(DEBUG_INFO, ("Control modflags on node %u - flags now 0x%x\n", c->pnn, node->flags));
+	D_NOTICE("Node %u has changed flags - 0x%x -> 0x%x\n",
+		 c->pnn,
+		 old_flags,
+		 node->flags);
 
 	if (node->flags == 0 && ctdb->runstate <= CTDB_RUNSTATE_STARTUP) {
-		DEBUG(DEBUG_ERR, (__location__ " Node %u became healthy - force recovery for startup\n",
-				  c->pnn));
+		DBG_ERR("Node %u became healthy - force recovery for startup\n",
+			c->pnn);
 		ctdb->recovery_mode = CTDB_RECOVERY_ACTIVE;
 	}
 
-	/* tell the recovery daemon something has changed */
-	c->new_flags = node->flags;
-	ctdb_daemon_send_message(ctdb, ctdb->pnn,
-				 CTDB_SRVID_SET_NODE_FLAGS, indata);
-
 	return 0;
 }
diff --git a/ctdb/server/ctdb_recoverd.c b/ctdb/server/ctdb_recoverd.c
index 4ba8729b50e..dfa6d0d089b 100644
--- a/ctdb/server/ctdb_recoverd.c
+++ b/ctdb/server/ctdb_recoverd.c
@@ -553,40 +553,73 @@ static int update_flags(struct ctdb_recoverd *rec,
 	for (j=0; j<nodemap->num; j++) {
 		struct ctdb_node_map_old *remote_nodemap=NULL;
 		uint32_t local_flags = nodemap->nodes[j].flags;
+		uint32_t remote_pnn = nodemap->nodes[j].pnn;
 		uint32_t remote_flags;
+		unsigned int i;
 		int ret;
 
 		if (local_flags & NODE_FLAGS_DISCONNECTED) {
 			continue;
 		}
-		if (nodemap->nodes[j].pnn == ctdb->pnn) {
-			continue;
+		if (remote_pnn == ctdb->pnn) {
+			/*
+			 * No remote nodemap for this node since this
+			 * is the local nodemap.  However, still need
+			 * to check this against the remote nodes and
+			 * push it if they are out-of-date.
+			 */
+			goto compare_remotes;
 		}
 
 		remote_nodemap = remote_nodemaps[j];
 		remote_flags = remote_nodemap->nodes[j].flags;
 
 		if (local_flags != remote_flags) {
-			ret = update_flags_on_all_nodes(rec,
-							nodemap->nodes[j].pnn,
-							remote_flags);
-			if (ret != 0) {
-				DBG_ERR(
-				    "Unable to update flags on remote nodes\n");
-				talloc_free(mem_ctx);
-				return -1;
-			}
-
 			/*
 			 * Update the local copy of the flags in the
 			 * recovery daemon.
 			 */
 			D_NOTICE("Remote node %u had flags 0x%x, "
 				 "local had 0x%x - updating local\n",


-- 
Samba Shared Repository


