[SCM] Samba Shared Repository - branch v4-14-test updated

Jule Anger janger at samba.org
Tue Sep 14 07:38:01 UTC 2021


The branch, v4-14-test has been updated
       via  551a39d890a ctdb-daemon: Don't mark a node as unhealthy when connecting to it
       via  2d6cf082db5 ctdb-daemon: Ignore flag changes for disconnected nodes
       via  814844538aa ctdb-daemon: Simplify ctdb_control_modflags()
       via  a7ea1ab3e6a ctdb-recoverd: Mark CTDB_SRVID_SET_NODE_FLAGS obsolete
       via  eab3ee12fe0 ctdb-daemon: Don't bother sending CTDB_SRVID_SET_NODE_FLAGS
       via  e3eeffafff8 ctdb-daemon: Modernise remaining debug macro in this function
       via  cfbac3b5ab9 ctdb-daemon: Update logging for flag changes
       via  c906c9a0b39 ctdb-daemon: Correct the condition for logging unchanged flags
       via  00c1757d92e ctdb-tools: Use disable and enable controls in tool
       via  c8d130f139a ctdb-client: Add client code for disable/enable controls
       via  cb64c64ddb3 ctdb_daemon: Implement controls DISABLE_NODE/ENABLE_NODE
       via  e158aa6d9bd ctdb-daemon: Start as disabled means PERMANENTLY_DISABLED
       via  116db8d54f8 ctdb-daemon: Factor out a function to get node structure from PNN
       via  50596cf0029 ctdb-daemon: Add a helper variable
       via  79961f5a33a ctdb-protocol: Add marshalling for controls DISABLE_NODE/ENABLE_NODE
       via  88660d4e2f8 ctdb-protocol: Add new controls to disable and enable nodes
       via  c61fe558427 ctdb-recoverd: Push flags for a node if any remote node disagrees
       via  c1e217c0e2e ctdb-recoverd: Update the local node map before pushing out flags
       via  69f744e539f ctdb-recoverd: Add a helper variable
      from  e9cbf386be7 vfs_btrfs: fix btrfs_fget_compression()

https://git.samba.org/?p=samba.git;a=shortlog;h=v4-14-test


- Log -----------------------------------------------------------------
commit 551a39d890acb2405a1d1e011e56dc566e8a36f7
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 17:25:32 2021 +1000

    ctdb-daemon: Don't mark a node as unhealthy when connecting to it
    
    Remote nodes are already initialised as UNHEALTHY when the node list
    is initialised at startup (ctdb_load_nodes_file() calls
    convert_node_map_to_list()) and when disconnected (ctdb_node_dead()).
    So, drop this code.
    
    RN: Fix CTDB flag/status update race conditions
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    Autobuild-User(master): Amitay Isaacs <amitay at samba.org>
    Autobuild-Date(master): Thu Sep  9 02:38:34 UTC 2021 on sn-devel-184
    
    (cherry picked from commit 9e7d2d9794af7251c42cb22f23ee9f86c6ea05c1)
    
    Autobuild-User(v4-14-test): Jule Anger <janger at samba.org>
    Autobuild-Date(v4-14-test): Tue Sep 14 07:37:32 UTC 2021 on sn-devel-184

commit 2d6cf082db51cb5c2748d1cb893e2befc2ae56ef
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Jul 27 15:50:54 2021 +1000

    ctdb-daemon: Ignore flag changes for disconnected nodes
    
    If this node is not connected to a node then we shouldn't know
    anything about it.  The state will be pushed later by the recovery
    master.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 7f697b1938efb3972f03f25546bf807d5af9a26c)

commit 814844538aaf97aed54082b4d6b9e22b3fe9b220
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 8 11:11:11 2021 +1000

    ctdb-daemon: Simplify ctdb_control_modflags()
    
    Now that there are separate disable/enable controls used by the ctdb
    tool, this control can ignore any flag updates for the current node.
    These only come from the recovery master, which depends on being able
    to fetch flags for all nodes.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit ae10a8a4b70e53ea3be6257d1f86f2d9a56aa62a)

commit a7ea1ab3e6a32cf1d6a6012f95ef5db7410ad78e
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Jan 17 19:04:34 2018 +1100

    ctdb-recoverd: Mark CTDB_SRVID_SET_NODE_FLAGS obsolete
    
    CTDB_SRVID_SET_NODE_FLAGS is no longer sent so drop monitor_handler()
    and replace with srvid_not_implemented().  Mark the SRVID obsolete in
    its comment.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 916c5ee131dc5c7f1d9c3540147d1f915c8302ad)

commit eab3ee12fe01f9fc814e0fd92b28d13dd62c9bf1
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 8 11:32:20 2021 +1000

    ctdb-daemon: Don't bother sending CTDB_SRVID_SET_NODE_FLAGS
    
    The code that handles this message is
    ctdb_recoverd.c:monitor_handler().  Although it appears to do
    something potentially useful, it only logs the flags changes.  All
    changes made are to local structures - there are no actual
    side-effects.
    
    It used to trigger a takeover run when the DISABLED flag changed.
    This was dropped back in commit
    662f06de9fdce7b1bc1772a4fbe43de271564917.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit e75256767fffc6a7ac0b97e58737a39c63c8b187)

commit e3eeffafff84b3b447d38ae03efa7dea9a91d199
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 8 11:34:49 2021 +1000

    ctdb-daemon: Modernise remaining debug macro in this function
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 0132bd5a2233193256af434a37506f86ed62c075)

commit cfbac3b5ab942457f3d2aae8451bbe835a8d0648
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 8 11:29:38 2021 +1000

    ctdb-daemon: Update logging for flag changes
    
    When flags change, promote the message to NOTICE level and switch the
    message to the style that is currently generated by
    ctdb-recoverd.c:monitor_handler().  This will allow monitor_handler()
    to go away in future.
    
    Drop logging when flags do not change.  The recovery master now logs
    when it pushes flags for a node, so the lack of a corresponding
    "changed flags" message here indicates that no update was required.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit b6d25d079e30919457cacbfbbfd670bf88295a9c)

commit c906c9a0b393b98e2f914135bdf92cfe17e5b18a
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 15:13:49 2021 +1000

    ctdb-daemon: Correct the condition for logging unchanged flags
    
    Don't trust the old flags from the recovery master.
    
    Surrounding code will change in future commits, including the use of
    old-style debug macros, so just make this change clear.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit eec44e286250a6ee7b5c42d85d632bdc300a409f)

commit 00c1757d92e6e17d8c9e2ea6170e50a390e17c72
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 14:37:19 2021 +1000

    ctdb-tools: Use disable and enable controls in tool
    
    Note that there is a change from broadcast to a directed control here.
    This is OK because the recovery master will push flags if any nodes
    disagree with the canonical flags fetched from a node.
    
    Static function ctdb_ctrl_modflags() is no longer used so drop it.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 5914054698dab934fd4db5efb9d211b2fdc40bb9)

commit c8d130f139ad3da7880f2ca4b15aa485684d0f0b
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 14:32:12 2021 +1000

    ctdb-client: Add client code for disable/enable controls
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 6fe6a54e7f32e650be6ab36041159081dbde5165)

commit cb64c64ddb34512d3e347e99f197a299cd02a91a
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 14:12:59 2021 +1000

    ctdb_daemon: Implement controls DISABLE_NODE/ENABLE_NODE
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 15a6489c288b3adb635a728cb2049621ab1a07f7)

commit e158aa6d9bd4eac72c5f529e51bff2a6ae3a1263
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 14:02:28 2021 +1000

    ctdb-daemon: Start as disabled means PERMANENTLY_DISABLED
    
    DISABLED is UNHEALTHY | PERMANENTLY_DISABLED, which is not what is
    intended here.  Luckily, it doesn't do any harm because nodes are
    marked unhealthy at startup anyway.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 60c1ef146538d90f97b7823459f7548ca5fa6dd3)
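The flag arithmetic in the commit message above can be sketched in isolation. This is a minimal illustration, not the daemon's code: the SK_* names, the helper sk_start_as_disabled() and the bit values are assumptions made for this example. It shows that OR-ing in the composite DISABLED mask also sets UNHEALTHY, and that this makes no difference when UNHEALTHY is already set at startup.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values: assumptions for this sketch only. */
#define SK_FLAG_UNHEALTHY            0x2u
#define SK_FLAG_PERMANENTLY_DISABLED 0x4u
/* The composite mask described in the commit message. */
#define SK_FLAG_DISABLED \
	(SK_FLAG_UNHEALTHY | SK_FLAG_PERMANENTLY_DISABLED)

/*
 * Hypothetical helper: apply a "start as disabled" mask to the flags
 * a node already has at startup and return the result.
 */
static uint32_t sk_start_as_disabled(uint32_t startup_flags, uint32_t mask)
{
	assert(mask != 0);
	return startup_flags | mask;
}
```

Starting from no flags, the composite mask yields 0x6 while the single bit yields 0x4. Starting from SK_FLAG_UNHEALTHY, both masks give the same result, which is why the old code did no harm.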

commit 116db8d54f8a4b792c759e481571e384e32d7a82
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 14:01:33 2021 +1000

    ctdb-daemon: Factor out a function to get node structure from PNN
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 1ac7bc7532b2fad791d0e53effa7c64cdc73c4eb)

commit 50596cf0029d2b027d537832bea8ca23cd4ccfc0
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Jul 28 10:27:42 2021 +1000

    ctdb-daemon: Add a helper variable
    
    Simplifies a subsequent change.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit e0a7b5a9e866452b1faaed86a105492fe7b237e2)

commit 79961f5a33a43556d79fbafebbefb2baea8c1079
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 12:10:12 2021 +1000

    ctdb-protocol: Add marshalling for controls DISABLE_NODE/ENABLE_NODE
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 6845dca87e6ffc5e449fb78d23eb9c7a22698b80)

commit 88660d4e2f8efa137e9d5a99682b6060bcdb98eb
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 8 17:28:20 2021 +1000

    ctdb-protocol: Add new controls to disable and enable nodes
    
    These are CTDB_CONTROL_DISABLE_NODE and CTDB_CONTROL_ENABLE_NODE.
    
    For consistency these match CTDB_CONTROL_STOP_NODE and
    CTDB_CONTROL_CONTINUE_NODE.  It would be possible to add a single
    control but it would need to take data.
    
    The aim is to finally fix races in flag handling.  Previous fixes have
    improved the situation but they have only narrowed the race window.
    The problem is that the recovery daemon on the master node pushes
    flags to nodes the same way that disable and enable are implemented.
    So the following sequence is still racy:
    
    1. Node A is disabled
    2. Recovery master pulls flags from all nodes including A
    3. Node A is enabled
    4. Recovery master notices A is disabled and pushes a flag update to
       all nodes including node A
    5. Node A is erroneously marked disabled
    
    Node A can not tell if the MODIFY_FLAGS control is from a "ctdb
    disable" command or a flag update from the recovery master.
    
    The solution is to use a different mechanism for disable/enable and
    for a node to ignore MODIFY_FLAGS controls for their own flags.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 49dc5d8cd2d3767044ac69cbd25c8210d11cadf7)
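The five-step sequence in the commit message above can be replayed with a small model. This is a sketch under stated assumptions: struct sketch_node, the sketch_* functions and the flag value are hypothetical and stand in for the real daemon state and controls. The ignore_own switch models the proposed fix, where a node drops MODIFY_FLAGS updates that target its own flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit value: an assumption for this sketch only. */
#define SKETCH_FLAG_PERMANENTLY_DISABLED 0x4u

/* Hypothetical, much-reduced model of a node: its PNN and own flags. */
struct sketch_node {
	uint32_t pnn;
	uint32_t flags;
};

/* Stand-ins for the dedicated disable/enable mechanism. */
static void sketch_disable(struct sketch_node *node)
{
	node->flags |= SKETCH_FLAG_PERMANENTLY_DISABLED;
}

static void sketch_enable(struct sketch_node *node)
{
	node->flags &= ~SKETCH_FLAG_PERMANENTLY_DISABLED;
}

/*
 * Stand-in for a flag push from the recovery master.  When ignore_own
 * is true, a push that targets the receiving node itself is dropped.
 */
static void sketch_modify_flags(struct sketch_node *node,
				uint32_t target_pnn,
				uint32_t new_flags,
				bool ignore_own)
{
	if (target_pnn != node->pnn) {
		return; /* this sketch tracks only the node's own flags */
	}
	if (ignore_own) {
		return;
	}
	node->flags = new_flags;
}

/* Replay the sequence and return node A's final flags. */
static uint32_t sketch_replay(bool ignore_own)
{
	struct sketch_node a = { .pnn = 0, .flags = 0 };
	uint32_t pulled;

	sketch_disable(&a);	/* 1. Node A is disabled */
	pulled = a.flags;	/* 2. Master pulls flags, including A's */
	sketch_enable(&a);	/* 3. Node A is enabled */
	assert(a.flags == 0);
	/* 4. Master pushes the stale value to all nodes, including A */
	sketch_modify_flags(&a, a.pnn, pulled, ignore_own);
	return a.flags;		/* 5. A's final state */
}
```

With ignore_own false the stale push wins and sketch_replay() returns the disabled bit. With ignore_own true the node keeps its own view and the result is 0.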

commit c61fe558427bd532e9291a255528d45cd83c8393
Author: Martin Schwenke <martin at meltin.net>
Date:   Sun Jul 11 22:17:08 2021 +1000

    ctdb-recoverd: Push flags for a node if any remote node disagrees
    
    This will usually happen if flags on the node in question change, so
    keeping the code simple and pushing to all nodes won't hurt.  When all
    nodes come up there might be differences in connected nodes, causing
    such "fix ups".  Receiving nodes will ignore no-op pushes.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 8305f6a7f132f03b0bbdb26692b7491fd3f6c24f)
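The "push if any remote node disagrees" rule above can be written as a small predicate. This is only a sketch: sketch_need_push() and its parameters are hypothetical and do not mirror the structure of the recovery daemon's update_flags(). The canonical value is the one fetched from the node in question, and views holds what each remote node currently believes about that node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if at least one remote view of a node's flags differs
 * from the canonical flags fetched from that node.  In that case the
 * canonical flags would be pushed to all nodes; nodes that already
 * agree treat the push as a no-op.
 */
static bool sketch_need_push(uint32_t canonical,
			     const uint32_t *views,
			     size_t num_views)
{
	size_t i;

	assert(views != NULL || num_views == 0);

	for (i = 0; i < num_views; i++) {
		if (views[i] != canonical) {
			return true;
		}
	}

	return false;
}
```

Pushing to every node rather than only to the ones that disagree keeps the logic simple, which matches the trade-off the commit message describes.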

commit c1e217c0e2ecff8c8005f2a225193884eb4c3fae
Author: Martin Schwenke <martin at meltin.net>
Date:   Sun Jul 11 21:28:43 2021 +1000

    ctdb-recoverd: Update the local node map before pushing out flags
    
    The resulting code structure looks a little weird.  However, there is
    another condition that requires the flags to be pushed that will be
    inserted before the continue statement in a subsequent commit.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 620d07871420cdbfa055c1ace75ec1ac4c32721d)

commit 69f744e539f5be3123bef0ac9cf6dff84cb1779f
Author: Martin Schwenke <martin at meltin.net>
Date:   Sun Jul 11 20:40:10 2021 +1000

    ctdb-recoverd: Add a helper variable
    
    Improves readability and simplifies subsequent changes.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 82a075d4d734588a42fca7ebaf529892d1eba853)

-----------------------------------------------------------------------

Summary of changes:
 ctdb/client/client_control_sync.c          |  68 ++++++++++++++++
 ctdb/client/client_sync.h                  |  12 +++
 ctdb/include/ctdb_private.h                |   2 +
 ctdb/protocol/protocol.h                   |   4 +-
 ctdb/protocol/protocol_api.h               |   6 ++
 ctdb/protocol/protocol_client.c            |  36 +++++++++
 ctdb/protocol/protocol_control.c           |  12 +++
 ctdb/protocol/protocol_debug.c             |   2 +
 ctdb/server/ctdb_control.c                 |  42 ++++++++++
 ctdb/server/ctdb_daemon.c                  |  35 +++++++--
 ctdb/server/ctdb_monitor.c                 |  67 ++++++++--------
 ctdb/server/ctdb_recoverd.c                | 120 +++++++++++++++--------------
 ctdb/server/ctdb_server.c                  |   1 -
 ctdb/tests/UNIT/cunit/protocol_test_101.sh |   2 +-
 ctdb/tests/src/fake_ctdbd.c                |  54 +++++++++++++
 ctdb/tests/src/protocol_common_ctdb.c      |  24 ++++++
 ctdb/tests/src/protocol_ctdb_test.c        |   2 +-
 ctdb/tools/ctdb.c                          |  57 +++-----------
 18 files changed, 400 insertions(+), 146 deletions(-)


Changeset truncated at 500 lines:

diff --git a/ctdb/client/client_control_sync.c b/ctdb/client/client_control_sync.c
index b9a25ce2b2c..e9f97dd0f30 100644
--- a/ctdb/client/client_control_sync.c
+++ b/ctdb/client/client_control_sync.c
@@ -2660,3 +2660,71 @@ int ctdb_ctrl_tunnel_deregister(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
 
 	return 0;
 }
+
+int ctdb_ctrl_disable_node(TALLOC_CTX *mem_ctx,
+			   struct tevent_context *ev,
+			   struct ctdb_client_context *client,
+			   int destnode,
+			   struct timeval timeout)
+{
+	struct ctdb_req_control request;
+	struct ctdb_reply_control *reply;
+	int ret;
+
+	ctdb_req_control_disable_node(&request);
+	ret = ctdb_client_control(mem_ctx,
+				  ev,
+				  client,
+				  destnode,
+				  timeout,
+				  &request,
+				  &reply);
+	if (ret != 0) {
+		D_ERR("Control DISABLE_NODE failed to node %u, ret=%d\n",
+		      destnode,
+		      ret);
+		return ret;
+	}
+
+	ret = ctdb_reply_control_disable_node(reply);
+	if (ret != 0) {
+		D_ERR("Control DISABLE_NODE failed, ret=%d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+int ctdb_ctrl_enable_node(TALLOC_CTX *mem_ctx,
+			  struct tevent_context *ev,
+			  struct ctdb_client_context *client,
+			  int destnode,
+			  struct timeval timeout)
+{
+	struct ctdb_req_control request;
+	struct ctdb_reply_control *reply;
+	int ret;
+
+	ctdb_req_control_enable_node(&request);
+	ret = ctdb_client_control(mem_ctx,
+				  ev,
+				  client,
+				  destnode,
+				  timeout,
+				  &request,
+				  &reply);
+	if (ret != 0) {
+		D_ERR("Control ENABLE_NODE failed to node %u, ret=%d\n",
+		      destnode,
+		      ret);
+		return ret;
+	}
+
+	ret = ctdb_reply_control_enable_node(reply);
+	if (ret != 0) {
+		D_ERR("Control ENABLE_NODE failed, ret=%d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
diff --git a/ctdb/client/client_sync.h b/ctdb/client/client_sync.h
index dc8b67395e3..b8f5d905857 100644
--- a/ctdb/client/client_sync.h
+++ b/ctdb/client/client_sync.h
@@ -482,6 +482,18 @@ int ctdb_ctrl_tunnel_deregister(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
 				int destnode, struct timeval timeout,
 				uint64_t tunnel_id);
 
+int ctdb_ctrl_disable_node(TALLOC_CTX *mem_ctx,
+			   struct tevent_context *ev,
+			   struct ctdb_client_context *client,
+			   int destnode,
+			   struct timeval timeout);
+
+int ctdb_ctrl_enable_node(TALLOC_CTX *mem_ctx,
+			  struct tevent_context *ev,
+			  struct ctdb_client_context *client,
+			  int destnode,
+			  struct timeval timeout);
+
 /* from client/client_message_sync.c */
 
 int ctdb_message_recd_update_ip(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
diff --git a/ctdb/include/ctdb_private.h b/ctdb/include/ctdb_private.h
index 8eb6686f953..f5e647f08a5 100644
--- a/ctdb/include/ctdb_private.h
+++ b/ctdb/include/ctdb_private.h
@@ -565,6 +565,8 @@ int daemon_deregister_message_handler(struct ctdb_context *ctdb,
 void daemon_tunnel_handler(uint64_t tunnel_id, TDB_DATA data,
 			   void *private_data);
 
+struct ctdb_node *ctdb_find_node(struct ctdb_context *ctdb, uint32_t pnn);
+
 int ctdb_start_daemon(struct ctdb_context *ctdb,
 		      bool interactive,
 		      bool test_mode_enabled);
diff --git a/ctdb/protocol/protocol.h b/ctdb/protocol/protocol.h
index e4b76c6b986..5f788f6f2a8 100644
--- a/ctdb/protocol/protocol.h
+++ b/ctdb/protocol/protocol.h
@@ -137,7 +137,7 @@ struct ctdb_call {
 /* SRVID to inform clients that an IP address has been taken over */
 #define CTDB_SRVID_TAKE_IP 0xF301000000000000LL
 
-/* SRVID to inform recovery daemon of the node flags */
+/* SRVID to inform recovery daemon of the node flags - OBSOLETE */
 #define CTDB_SRVID_SET_NODE_FLAGS 0xF400000000000000LL
 
 /* SRVID to inform recovery daemon to update public ip assignment */
@@ -376,6 +376,8 @@ enum ctdb_controls {CTDB_CONTROL_PROCESS_EXISTS          = 0,
 		    CTDB_CONTROL_VACUUM_FETCH            = 154,
 		    CTDB_CONTROL_DB_VACUUM               = 155,
 		    CTDB_CONTROL_ECHO_DATA               = 156,
+		    CTDB_CONTROL_DISABLE_NODE            = 157,
+		    CTDB_CONTROL_ENABLE_NODE             = 158,
 };
 
 #define MAX_COUNT_BUCKETS 16
diff --git a/ctdb/protocol/protocol_api.h b/ctdb/protocol/protocol_api.h
index 7bbe33b22fe..499d9329c54 100644
--- a/ctdb/protocol/protocol_api.h
+++ b/ctdb/protocol/protocol_api.h
@@ -605,6 +605,12 @@ void ctdb_req_control_echo_data(struct ctdb_req_control *request,
 				struct ctdb_echo_data *echo_data);
 int ctdb_reply_control_echo_data(struct ctdb_reply_control *reply);
 
+void ctdb_req_control_disable_node(struct ctdb_req_control *request);
+int ctdb_reply_control_disable_node(struct ctdb_reply_control *reply);
+
+void ctdb_req_control_enable_node(struct ctdb_req_control *request);
+int ctdb_reply_control_enable_node(struct ctdb_reply_control *reply);
+
 /* From protocol/protocol_debug.c */
 
 void ctdb_packet_print(uint8_t *buf, size_t buflen, FILE *fp);
diff --git a/ctdb/protocol/protocol_client.c b/ctdb/protocol/protocol_client.c
index 6d850be86df..dcce83f02a1 100644
--- a/ctdb/protocol/protocol_client.c
+++ b/ctdb/protocol/protocol_client.c
@@ -2360,3 +2360,39 @@ int ctdb_reply_control_echo_data(struct ctdb_reply_control *reply)
 
 	return reply->status;
 }
+
+/* CTDB_CONTROL_DISABLE_NODE */
+
+void ctdb_req_control_disable_node(struct ctdb_req_control *request)
+{
+	request->opcode = CTDB_CONTROL_DISABLE_NODE;
+	request->pad = 0;
+	request->srvid = 0;
+	request->client_id = 0;
+	request->flags = 0;
+
+	request->rdata.opcode = CTDB_CONTROL_DISABLE_NODE;
+}
+
+int ctdb_reply_control_disable_node(struct ctdb_reply_control *reply)
+{
+	return ctdb_reply_control_generic(reply, CTDB_CONTROL_DISABLE_NODE);
+}
+
+/* CTDB_CONTROL_ENABLE_NODE */
+
+void ctdb_req_control_enable_node(struct ctdb_req_control *request)
+{
+	request->opcode = CTDB_CONTROL_ENABLE_NODE;
+	request->pad = 0;
+	request->srvid = 0;
+	request->client_id = 0;
+	request->flags = 0;
+
+	request->rdata.opcode = CTDB_CONTROL_ENABLE_NODE;
+}
+
+int ctdb_reply_control_enable_node(struct ctdb_reply_control *reply)
+{
+	return ctdb_reply_control_generic(reply, CTDB_CONTROL_ENABLE_NODE);
+}
diff --git a/ctdb/protocol/protocol_control.c b/ctdb/protocol/protocol_control.c
index fb6b0219ef7..f64a1a90e10 100644
--- a/ctdb/protocol/protocol_control.c
+++ b/ctdb/protocol/protocol_control.c
@@ -411,6 +411,12 @@ static size_t ctdb_req_control_data_len(struct ctdb_req_control_data *cd)
 	case CTDB_CONTROL_ECHO_DATA:
 		len = ctdb_echo_data_len(cd->data.echo_data);
 		break;
+
+	case CTDB_CONTROL_DISABLE_NODE:
+		break;
+
+	case CTDB_CONTROL_ENABLE_NODE:
+		break;
 	}
 
 	return len;
@@ -1385,6 +1391,12 @@ static size_t ctdb_reply_control_data_len(struct ctdb_reply_control_data *cd)
 	case CTDB_CONTROL_ECHO_DATA:
 		len = ctdb_echo_data_len(cd->data.echo_data);
 		break;
+
+	case CTDB_CONTROL_DISABLE_NODE:
+		break;
+
+	case CTDB_CONTROL_ENABLE_NODE:
+		break;
 	}
 
 	return len;
diff --git a/ctdb/protocol/protocol_debug.c b/ctdb/protocol/protocol_debug.c
index 694285515e1..d94cb548d68 100644
--- a/ctdb/protocol/protocol_debug.c
+++ b/ctdb/protocol/protocol_debug.c
@@ -243,6 +243,8 @@ static void ctdb_opcode_print(uint32_t opcode, FILE *fp)
 		{ CTDB_CONTROL_VACUUM_FETCH, "VACUUM_FETCH" },
 		{ CTDB_CONTROL_DB_VACUUM, "DB_VACUUM" },
 		{ CTDB_CONTROL_ECHO_DATA, "ECHO_DATA" },
+		{ CTDB_CONTROL_DISABLE_NODE, "DISABLE_NODE" },
+		{ CTDB_CONTROL_ENABLE_NODE, "ENABLE_NODE" },
 		{ MAP_END, "" },
 	};
 
diff --git a/ctdb/server/ctdb_control.c b/ctdb/server/ctdb_control.c
index 206ea149693..131ebd43afc 100644
--- a/ctdb/server/ctdb_control.c
+++ b/ctdb/server/ctdb_control.c
@@ -173,6 +173,40 @@ done:
 	TALLOC_FREE(state);
 }
 
+static int ctdb_control_disable_node(struct ctdb_context *ctdb)
+{
+	struct ctdb_node *node;
+
+	node = ctdb_find_node(ctdb, CTDB_CURRENT_NODE);
+	if (node == NULL) {
+		/* Can't happen */
+		DBG_ERR("Unable to find current node\n");
+		return -1;
+	}
+
+	D_ERR("Disable node\n");
+	node->flags |= NODE_FLAGS_PERMANENTLY_DISABLED;
+
+	return 0;
+}
+
+static int ctdb_control_enable_node(struct ctdb_context *ctdb)
+{
+	struct ctdb_node *node;
+
+	node = ctdb_find_node(ctdb, CTDB_CURRENT_NODE);
+	if (node == NULL) {
+		/* Can't happen */
+		DBG_ERR("Unable to find current node\n");
+		return -1;
+	}
+
+	D_ERR("Enable node\n");
+	node->flags &= ~NODE_FLAGS_PERMANENTLY_DISABLED;
+
+	return 0;
+}
+
 /*
   process a control request
  */
@@ -827,6 +861,14 @@ static int32_t ctdb_control_dispatch(struct ctdb_context *ctdb,
 		return ctdb_control_echo_data(ctdb, c, indata, async_reply);
 	}
 
+	case CTDB_CONTROL_DISABLE_NODE:
+		CHECK_CONTROL_DATA_SIZE(0);
+		return ctdb_control_disable_node(ctdb);
+
+	case CTDB_CONTROL_ENABLE_NODE:
+		CHECK_CONTROL_DATA_SIZE(0);
+		return ctdb_control_enable_node(ctdb);
+
 	default:
 		DEBUG(DEBUG_CRIT,(__location__ " Unknown CTDB control opcode %u\n", opcode));
 		return -1;
diff --git a/ctdb/server/ctdb_daemon.c b/ctdb/server/ctdb_daemon.c
index 9035f5b4748..6a76b2ea998 100644
--- a/ctdb/server/ctdb_daemon.c
+++ b/ctdb/server/ctdb_daemon.c
@@ -1235,28 +1235,51 @@ failed:
 	return -1;
 }
 
-static void initialise_node_flags (struct ctdb_context *ctdb)
+struct ctdb_node *ctdb_find_node(struct ctdb_context *ctdb, uint32_t pnn)
 {
+	struct ctdb_node *node = NULL;
 	unsigned int i;
 
+	if (pnn == CTDB_CURRENT_NODE) {
+		pnn = ctdb->pnn;
+	}
+
 	/* Always found: PNN correctly set just before this is called */
 	for (i = 0; i < ctdb->num_nodes; i++) {
-		if (ctdb->pnn == ctdb->nodes[i]->pnn) {
-			break;
+		node = ctdb->nodes[i];
+		if (pnn == node->pnn) {
+			return node;
 		}
 	}
 
-	ctdb->nodes[i]->flags &= ~NODE_FLAGS_DISCONNECTED;
+	return NULL;
+}
+
+static void initialise_node_flags (struct ctdb_context *ctdb)
+{
+	struct ctdb_node *node = NULL;
+
+	node = ctdb_find_node(ctdb, CTDB_CURRENT_NODE);
+	/*
+	 * PNN correctly set just before this is called so always
+	 * found but keep static analysers happy...
+	 */
+	if (node == NULL) {
+		DBG_ERR("Unable to find current node\n");
+		return;
+	}
+
+	node->flags &= ~NODE_FLAGS_DISCONNECTED;
 
 	/* do we start out in DISABLED mode? */
 	if (ctdb->start_as_disabled != 0) {
 		D_ERR("This node is configured to start in DISABLED state\n");
-		ctdb->nodes[i]->flags |= NODE_FLAGS_DISABLED;
+		node->flags |= NODE_FLAGS_PERMANENTLY_DISABLED;
 	}
 	/* do we start out in STOPPED mode? */
 	if (ctdb->start_as_stopped != 0) {
 		D_ERR("This node is configured to start in STOPPED state\n");
-		ctdb->nodes[i]->flags |= NODE_FLAGS_STOPPED;
+		node->flags |= NODE_FLAGS_STOPPED;
 	}
 }
 
diff --git a/ctdb/server/ctdb_monitor.c b/ctdb/server/ctdb_monitor.c
index 5c694bde969..ab58ec485fe 100644
--- a/ctdb/server/ctdb_monitor.c
+++ b/ctdb/server/ctdb_monitor.c
@@ -455,52 +455,55 @@ int32_t ctdb_control_modflags(struct ctdb_context *ctdb, TDB_DATA indata)
 	struct ctdb_node *node;
 	uint32_t old_flags;
 
-	if (c->pnn >= ctdb->num_nodes) {
-		DEBUG(DEBUG_ERR,(__location__ " Node %d is invalid, num_nodes :%d\n", c->pnn, ctdb->num_nodes));
-		return -1;
+	/*
+	 * Don't let other nodes override the current node's flags.
+	 * The recovery master fetches flags from this node so there's
+	 * no need to push them back.  Doing so is racy.
+	 */
+	if (c->pnn == ctdb->pnn) {
+		DBG_DEBUG("Ignoring flag changes for current node\n");
+		return 0;
 	}
 
-	node         = ctdb->nodes[c->pnn];
-	old_flags    = node->flags;
-	if (c->pnn != ctdb->pnn) {
-		c->old_flags  = node->flags;
+	node = ctdb_find_node(ctdb, c->pnn);
+	if (node == NULL) {
+		DBG_ERR("Node %u is invalid\n", c->pnn);
+		return -1;
 	}
-	node->flags   = c->new_flags & ~NODE_FLAGS_DISCONNECTED;
-	node->flags  |= (c->old_flags & NODE_FLAGS_DISCONNECTED);
 
-	/* we don't let other nodes modify our STOPPED status */
-	if (c->pnn == ctdb->pnn) {
-		node->flags &= ~NODE_FLAGS_STOPPED;
-		if (old_flags & NODE_FLAGS_STOPPED) {
-			node->flags |= NODE_FLAGS_STOPPED;
-		}
+	if (node->flags & NODE_FLAGS_DISCONNECTED) {
+		DBG_DEBUG("Ignoring flag changes for disconnected node\n");
+		return 0;
 	}
 
-	/* we don't let other nodes modify our BANNED status */
-	if (c->pnn == ctdb->pnn) {
-		node->flags &= ~NODE_FLAGS_BANNED;
-		if (old_flags & NODE_FLAGS_BANNED) {
-			node->flags |= NODE_FLAGS_BANNED;
-		}
-	}
+	/*
+	 * Remember the old flags.  We don't care what some other node
+	 * thought the old flags were - that's irrelevant.
+	 */
+	old_flags = node->flags;
 
-	if (node->flags == c->old_flags) {
-		DEBUG(DEBUG_INFO, ("Control modflags on node %u - Unchanged - flags 0x%x\n", c->pnn, node->flags));
+	/*
+	 * This node tracks nodes it is connected to, so don't let
+	 * another node override this
+	 */
+	node->flags =
+		(old_flags & NODE_FLAGS_DISCONNECTED) |
+		(c->new_flags & ~NODE_FLAGS_DISCONNECTED);
+
+	if (node->flags == old_flags) {
 		return 0;
 	}
 
-	DEBUG(DEBUG_INFO, ("Control modflags on node %u - flags now 0x%x\n", c->pnn, node->flags));
+	D_NOTICE("Node %u has changed flags - 0x%x -> 0x%x\n",
+		 c->pnn,
+		 old_flags,
+		 node->flags);
 
 	if (node->flags == 0 && ctdb->runstate <= CTDB_RUNSTATE_STARTUP) {
-		DEBUG(DEBUG_ERR, (__location__ " Node %u became healthy - force recovery for startup\n",
-				  c->pnn));
+		DBG_ERR("Node %u became healthy - force recovery for startup\n",
+			c->pnn);
 		ctdb->recovery_mode = CTDB_RECOVERY_ACTIVE;
 	}
 
-	/* tell the recovery daemon something has changed */
-	c->new_flags = node->flags;
-	ctdb_daemon_send_message(ctdb, ctdb->pnn,
-				 CTDB_SRVID_SET_NODE_FLAGS, indata);
-
 	return 0;
 }
diff --git a/ctdb/server/ctdb_recoverd.c b/ctdb/server/ctdb_recoverd.c
index 4ba8729b50e..dfa6d0d089b 100644
--- a/ctdb/server/ctdb_recoverd.c
+++ b/ctdb/server/ctdb_recoverd.c
@@ -553,40 +553,73 @@ static int update_flags(struct ctdb_recoverd *rec,
 	for (j=0; j<nodemap->num; j++) {
 		struct ctdb_node_map_old *remote_nodemap=NULL;
 		uint32_t local_flags = nodemap->nodes[j].flags;
+		uint32_t remote_pnn = nodemap->nodes[j].pnn;
 		uint32_t remote_flags;
+		unsigned int i;
 		int ret;
 
 		if (local_flags & NODE_FLAGS_DISCONNECTED) {
 			continue;
 		}
-		if (nodemap->nodes[j].pnn == ctdb->pnn) {
-			continue;
+		if (remote_pnn == ctdb->pnn) {
+			/*
+			 * No remote nodemap for this node since this
+			 * is the local nodemap.  However, still need
+			 * to check this against the remote nodes and
+			 * push it if they are out-of-date.
+			 */
+			goto compare_remotes;
 		}
 
 		remote_nodemap = remote_nodemaps[j];
 		remote_flags = remote_nodemap->nodes[j].flags;
 
 		if (local_flags != remote_flags) {
-			ret = update_flags_on_all_nodes(rec,
-							nodemap->nodes[j].pnn,
-							remote_flags);
-			if (ret != 0) {
-				DBG_ERR(
-				    "Unable to update flags on remote nodes\n");
-				talloc_free(mem_ctx);
-				return -1;
-			}
-
 			/*
 			 * Update the local copy of the flags in the
 			 * recovery daemon.
 			 */
 			D_NOTICE("Remote node %u had flags 0x%x, "
 				 "local had 0x%x - updating local\n",


-- 
Samba Shared Repository


