[SCM] Samba Shared Repository - branch v4-15-test updated

Jule Anger janger at samba.org
Mon Sep 13 12:34:02 UTC 2021


The branch, v4-15-test has been updated
       via  8d4c482410c ctdb-daemon: Don't mark a node as unhealthy when connecting to it
       via  7c353e6e383 ctdb-daemon: Ignore flag changes for disconnected nodes
       via  665b380d249 ctdb-daemon: Simplify ctdb_control_modflags()
       via  f340dcbc675 ctdb-recoverd: Mark CTDB_SRVID_SET_NODE_FLAGS obsolete
       via  c8a9f9147c2 ctdb-daemon: Don't bother sending CTDB_SRVID_SET_NODE_FLAGS
       via  17e0a052da0 ctdb-daemon: Modernise remaining debug macro in this function
       via  05d2f5e41c7 ctdb-daemon: Update logging for flag changes
       via  e634ddde5e6 ctdb-daemon: Correct the condition for logging unchanged flags
       via  9f06ec8b108 ctdb-tools: Use disable and enable controls in tool
       via  772126bd68b ctdb-client: Add client code for disable/enable controls
       via  8ed5910b847 ctdb_daemon: Implement controls DISABLE_NODE/ENABLE_NODE
       via  b5f8913f359 ctdb-daemon: Start as disabled means PERMANENTLY_DISABLED
       via  c61b5e7b489 ctdb-daemon: Factor out a function to get node structure from PNN
       via  65d64194b6d ctdb-daemon: Add a helper variable
       via  675d68caabc ctdb-protocol: Add marshalling for controls DISABLE_NODE/ENABLE_NODE
       via  84a285851d7 ctdb-protocol: Add new controls to disable and enable nodes
       via  c01d48d7a54 ctdb-recoverd: Push flags for a node if any remote node disagrees
       via  2cc4b917f78 ctdb-recoverd: Update the local node map before pushing out flags
       via  f8fa33ac320 ctdb-recoverd: Add a helper variable
      from  bddd7db7b2f WHATSNEW: The New VFS

https://git.samba.org/?p=samba.git;a=shortlog;h=v4-15-test


- Log -----------------------------------------------------------------
commit 8d4c482410c4de451d26ce004247e9cc10aea832
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 17:25:32 2021 +1000

    ctdb-daemon: Don't mark a node as unhealthy when connecting to it
    
    Remote nodes are already initialised as UNHEALTHY when the node list
    is initialised at startup (ctdb_load_nodes_file() calls
    convert_node_map_to_list()) and when disconnected (ctdb_node_dead()).
    So, drop this code.
    
    RN: Fix CTDB flag/status update race conditions
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    Autobuild-User(master): Amitay Isaacs <amitay at samba.org>
    Autobuild-Date(master): Thu Sep  9 02:38:34 UTC 2021 on sn-devel-184
    
    (cherry picked from commit 9e7d2d9794af7251c42cb22f23ee9f86c6ea05c1)
    
    Autobuild-User(v4-15-test): Jule Anger <janger at samba.org>
    Autobuild-Date(v4-15-test): Mon Sep 13 12:33:53 UTC 2021 on sn-devel-184

commit 7c353e6e383b408de9d2823b32ff8e0527510d02
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Jul 27 15:50:54 2021 +1000

    ctdb-daemon: Ignore flag changes for disconnected nodes
    
    If this node is not connected to a node then we shouldn't know
    anything about it.  The state will be pushed later by the recovery
    master.
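
As a rough illustration of the rule described above, the following sketch models how a node might record a pushed flag value for a remote node. The names `sketch_apply_push`, `SKETCH_DISCONNECTED` and `SKETCH_UNHEALTHY`, and their numeric values, are assumptions made for this example and are not taken from the CTDB sources.

```c
#include <stdint.h>

/* Illustrative values only; the real flags are defined in ctdb/protocol/protocol.h. */
#define SKETCH_DISCONNECTED 0x1u
#define SKETCH_UNHEALTHY    0x2u

/*
 * Hypothetical model: return the flags this node should record for a
 * remote node after receiving a pushed update.
 */
uint32_t sketch_apply_push(uint32_t current, uint32_t pushed)
{
	if (current & SKETCH_DISCONNECTED) {
		/* Not connected to that node, so ignore the update. */
		return current;
	}

	/*
	 * Connectivity is tracked locally, so keep the local
	 * DISCONNECTED bit and take everything else from the push.
	 */
	return (current & SKETCH_DISCONNECTED) |
	       (pushed & ~SKETCH_DISCONNECTED);
}
```

In this model a push for a disconnected node is a no-op, and a push can never set the DISCONNECTED bit on a node that is currently connected.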
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 7f697b1938efb3972f03f25546bf807d5af9a26c)

commit 665b380d2490f312c7409a3c9d29572ad3664216
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 8 11:11:11 2021 +1000

    ctdb-daemon: Simplify ctdb_control_modflags()
    
    Now that there are separate disable/enable controls used by the ctdb
    tool, this control can ignore any flag updates for the current node.
    These only come from the recovery master, which depends on being able
    to fetch flags for all nodes.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit ae10a8a4b70e53ea3be6257d1f86f2d9a56aa62a)

commit f340dcbc675ec0efecaccf3a3258435dde85dd51
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Jan 17 19:04:34 2018 +1100

    ctdb-recoverd: Mark CTDB_SRVID_SET_NODE_FLAGS obsolete
    
    CTDB_SRVID_SET_NODE_FLAGS is no longer sent so drop monitor_handler()
    and replace with srvid_not_implemented().  Mark the SRVID obsolete in
    its comment.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 916c5ee131dc5c7f1d9c3540147d1f915c8302ad)

commit c8a9f9147c2215b14d9b666954948b592b646b12
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 8 11:32:20 2021 +1000

    ctdb-daemon: Don't bother sending CTDB_SRVID_SET_NODE_FLAGS
    
    The code that handles this message is
    ctdb_recoverd.c:monitor_handler().  Although it appears to do
    something potentially useful, it only logs the flags changes.  All
    changes made are to local structures - there are no actual
    side-effects.
    
    It used to trigger a takeover run when the DISABLED flag changed.
    This was dropped back in commit
    662f06de9fdce7b1bc1772a4fbe43de271564917.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit e75256767fffc6a7ac0b97e58737a39c63c8b187)

commit 17e0a052da07207ad063383fb1913794c12460a6
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 8 11:34:49 2021 +1000

    ctdb-daemon: Modernise remaining debug macro in this function
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 0132bd5a2233193256af434a37506f86ed62c075)

commit 05d2f5e41c7a3e426c1be7bbe45913ef21c77728
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 8 11:29:38 2021 +1000

    ctdb-daemon: Update logging for flag changes
    
    When flags change, promote the message to NOTICE level and switch the
    message to the style that is currently generated by
    ctdb-recoverd.c:monitor_handler().  This will allow monitor_handler()
    to go away in future.
    
    Drop logging when flags do not change.  The recovery master now logs
    when it pushes flags for a node, so the lack of a corresponding
    "changed flags" message here indicates that no update was required.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit b6d25d079e30919457cacbfbbfd670bf88295a9c)

commit e634ddde5e6518ecd9e5bcf36b210bb6f16e89a6
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 15:13:49 2021 +1000

    ctdb-daemon: Correct the condition for logging unchanged flags
    
    Don't trust the old flags from the recovery master.
    
    Surrounding code will change in future commits, including the use of
    old-style debug macros, so just make this change clear.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit eec44e286250a6ee7b5c42d85d632bdc300a409f)

commit 9f06ec8b108178ebd2c8d1e1fab9331383e30a52
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 14:37:19 2021 +1000

    ctdb-tools: Use disable and enable controls in tool
    
    Note that there is a change from broadcast to a directed control here.
    This is OK because the recovery master will push flags if any nodes
    disagree with the canonical flags fetched from a node.
    
    Static function ctdb_ctrl_modflags() is no longer used, so drop it.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 5914054698dab934fd4db5efb9d211b2fdc40bb9)

commit 772126bd68b1deb56c0b48e3c8b8530993cb866d
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 14:32:12 2021 +1000

    ctdb-client: Add client code for disable/enable controls
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 6fe6a54e7f32e650be6ab36041159081dbde5165)

commit 8ed5910b8471c61149ddbc37c0aef8837d8a7029
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 14:12:59 2021 +1000

    ctdb_daemon: Implement controls DISABLE_NODE/ENABLE_NODE
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 15a6489c288b3adb635a728cb2049621ab1a07f7)

commit b5f8913f359c24105e85c49fb0b1e476d0c2f353
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 14:02:28 2021 +1000

    ctdb-daemon: Start as disabled means PERMANENTLY_DISABLED
    
    DISABLED is UNHEALTHY | PERMANENTLY_DISABLED, which is not what is
    intended here.  Luckily, it doesn't do any harm because nodes are
    marked unhealthy at startup anyway.
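
The point about the composite mask can be checked with a small bitmask sketch. The relationship DISABLED = UNHEALTHY | PERMANENTLY_DISABLED comes from the message above; the numeric values and the `sketch_` names below are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Illustrative values only; see ctdb/protocol/protocol.h for the real definitions. */
#define SKETCH_UNHEALTHY            0x2u
#define SKETCH_PERMANENTLY_DISABLED 0x4u
#define SKETCH_DISABLED (SKETCH_UNHEALTHY | SKETCH_PERMANENTLY_DISABLED)

/* Old behaviour: "start as disabled" ORs in the composite mask. */
uint32_t sketch_start_old(uint32_t flags)
{
	return flags | SKETCH_DISABLED;
}

/* New behaviour: only the PERMANENTLY_DISABLED bit is set. */
uint32_t sketch_start_new(uint32_t flags)
{
	return flags | SKETCH_PERMANENTLY_DISABLED;
}
```

Starting from flags that already include UNHEALTHY, both functions return the same value, which is why the old code was harmless in practice. Starting from zero they differ, since the old version also sets UNHEALTHY.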
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 60c1ef146538d90f97b7823459f7548ca5fa6dd3)

commit c61b5e7b4890a96f3ea309017d9cbe8ce8e017fa
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 14:01:33 2021 +1000

    ctdb-daemon: Factor out a function to get node structure from PNN
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 1ac7bc7532b2fad791d0e53effa7c64cdc73c4eb)

commit 65d64194b6db3304a40585c8cb95f43e31c4222c
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Jul 28 10:27:42 2021 +1000

    ctdb-daemon: Add a helper variable
    
    Simplifies a subsequent change.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit e0a7b5a9e866452b1faaed86a105492fe7b237e2)

commit 675d68caabc59b5b47b744157173b4fc9476e32e
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 9 12:10:12 2021 +1000

    ctdb-protocol: Add marshalling for controls DISABLE_NODE/ENABLE_NODE
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 6845dca87e6ffc5e449fb78d23eb9c7a22698b80)

commit 84a285851d7fea7843667e67ef317995e6c54bc5
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 8 17:28:20 2021 +1000

    ctdb-protocol: Add new controls to disable and enable nodes
    
    These are CTDB_CONTROL_DISABLE_NODE and CTDB_CONTROL_ENABLE_NODE.
    
    For consistency these match CTDB_CONTROL_STOP_NODE and
    CTDB_CONTROL_CONTINUE_NODE.  It would be possible to add a single
    control but it would need to take data.
    
    The aim is to finally fix races in flag handling.  Previous fixes have
    improved the situation but they have only narrowed the race window.
    The problem is that the recovery daemon on the master node pushes
    flags to nodes the same way that disable and enable are implemented.
    So the following sequence is still racy:
    
    1. Node A is disabled
    2. Recovery master pulls flags from all nodes including A
    3. Node A is enabled
    4. Recovery master notices A is disabled and pushes a flag update to
       all nodes including node A
    5. Node A is erroneously marked disabled
    
    Node A can not tell if the MODIFY_FLAGS control is from a "ctdb
    disable" command or a flag update from the recovery master.
    
    The solution is to use a different mechanism for disable/enable and
    for a node to ignore MODIFY_FLAGS controls for its own flags.
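
The racy sequence in steps 1 to 5 can be replayed in a few lines of C. This is a simplified model, not CTDB code: `sketch_node`, `sketch_modify_flags` and `sketch_replay_race` are hypothetical names, the flag value is assumed, and the recovery master is reduced to a single stale variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative value only; the real flag is defined in ctdb/protocol/protocol.h. */
#define SKETCH_PERMANENTLY_DISABLED 0x4u

/* Hypothetical stand-in for node A's view of its own flags. */
struct sketch_node {
	uint32_t pnn;
	uint32_t flags;
};

/*
 * Model of a MODIFY_FLAGS push arriving at a node.  Only updates to
 * the node's own flags are modelled.  When ignore_own is true the
 * node refuses them, as the commit message proposes.
 */
void sketch_modify_flags(struct sketch_node *self, uint32_t target_pnn,
			 uint32_t new_flags, bool ignore_own)
{
	if (target_pnn != self->pnn) {
		return;
	}
	if (ignore_own) {
		return;
	}
	self->flags = new_flags;
}

/* Replay steps 1 to 5 and return node A's final flags. */
uint32_t sketch_replay_race(bool ignore_own)
{
	struct sketch_node a = { .pnn = 0, .flags = 0 };
	uint32_t pulled;

	a.flags |= SKETCH_PERMANENTLY_DISABLED;   /* 1. A is disabled */
	pulled = a.flags;                         /* 2. master pulls flags */
	a.flags &= ~SKETCH_PERMANENTLY_DISABLED;  /* 3. A is enabled */
	assert(pulled != a.flags);                /* master's copy is now stale */
	sketch_modify_flags(&a, a.pnn, pulled, ignore_own); /* 4. stale push */

	return a.flags;                           /* 5. outcome on node A */
}
```

With `ignore_own` set to false the model ends with node A disabled again, matching step 5. With it set to true the stale push is dropped and node A stays enabled.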
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 49dc5d8cd2d3767044ac69cbd25c8210d11cadf7)

commit c01d48d7a542bfb0b319fb18d0eea51b232ea62a
Author: Martin Schwenke <martin at meltin.net>
Date:   Sun Jul 11 22:17:08 2021 +1000

    ctdb-recoverd: Push flags for a node if any remote node disagrees
    
    This will usually happen if flags on the node in question change, so
    keeping the code simple and pushing to all nodes won't hurt.  When all
    nodes come up there might be differences in connected nodes, causing
    such "fix ups".  Receiving nodes will ignore no-op pushes.
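
A minimal sketch of the "any remote node disagrees" test might look like the following. The diff for update_flags() is truncated in this message, so this is only a model of the idea; `sketch_needs_push` and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the canonical flags for one node and each
 * remote node's view of that node's flags, report whether a push to
 * all nodes is warranted.
 */
bool sketch_needs_push(uint32_t canonical,
		       const uint32_t *remote_views,
		       size_t num_views)
{
	for (size_t i = 0; i < num_views; i++) {
		if (remote_views[i] != canonical) {
			/* One disagreement is enough to push to everyone. */
			return true;
		}
	}

	return false;
}
```

Pushing to every node on the first disagreement keeps the logic simple, and the message above notes that receiving nodes ignore pushes that change nothing.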
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 8305f6a7f132f03b0bbdb26692b7491fd3f6c24f)

commit 2cc4b917f78340f09e6d55efb0af97958c07fae3
Author: Martin Schwenke <martin at meltin.net>
Date:   Sun Jul 11 21:28:43 2021 +1000

    ctdb-recoverd: Update the local node map before pushing out flags
    
    The resulting code structure looks a little weird.  However, there is
    another condition that requires the flags to be pushed that will be
    inserted before the continue statement in a subsequent commit.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 620d07871420cdbfa055c1ace75ec1ac4c32721d)

commit f8fa33ac320a22dcac34f09bbea35af1aa804dfc
Author: Martin Schwenke <martin at meltin.net>
Date:   Sun Jul 11 20:40:10 2021 +1000

    ctdb-recoverd: Add a helper variable
    
    Improves readability and simplifies subsequent changes.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 82a075d4d734588a42fca7ebaf529892d1eba853)

-----------------------------------------------------------------------

Summary of changes:
 ctdb/client/client_control_sync.c          |  68 ++++++++++++++++
 ctdb/client/client_sync.h                  |  12 +++
 ctdb/include/ctdb_private.h                |   2 +
 ctdb/protocol/protocol.h                   |   4 +-
 ctdb/protocol/protocol_api.h               |   6 ++
 ctdb/protocol/protocol_client.c            |  36 +++++++++
 ctdb/protocol/protocol_control.c           |  12 +++
 ctdb/protocol/protocol_debug.c             |   2 +
 ctdb/server/ctdb_control.c                 |  42 ++++++++++
 ctdb/server/ctdb_daemon.c                  |  35 +++++++--
 ctdb/server/ctdb_monitor.c                 |  67 ++++++++--------
 ctdb/server/ctdb_recoverd.c                | 120 +++++++++++++++--------------
 ctdb/server/ctdb_server.c                  |   1 -
 ctdb/tests/UNIT/cunit/protocol_test_101.sh |   2 +-
 ctdb/tests/src/fake_ctdbd.c                |  54 +++++++++++++
 ctdb/tests/src/protocol_common_ctdb.c      |  24 ++++++
 ctdb/tests/src/protocol_ctdb_test.c        |   2 +-
 ctdb/tools/ctdb.c                          |  57 +++-----------
 18 files changed, 400 insertions(+), 146 deletions(-)


Changeset truncated at 500 lines:

diff --git a/ctdb/client/client_control_sync.c b/ctdb/client/client_control_sync.c
index b9a25ce2b2c..e9f97dd0f30 100644
--- a/ctdb/client/client_control_sync.c
+++ b/ctdb/client/client_control_sync.c
@@ -2660,3 +2660,71 @@ int ctdb_ctrl_tunnel_deregister(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
 
 	return 0;
 }
+
+int ctdb_ctrl_disable_node(TALLOC_CTX *mem_ctx,
+			   struct tevent_context *ev,
+			   struct ctdb_client_context *client,
+			   int destnode,
+			   struct timeval timeout)
+{
+	struct ctdb_req_control request;
+	struct ctdb_reply_control *reply;
+	int ret;
+
+	ctdb_req_control_disable_node(&request);
+	ret = ctdb_client_control(mem_ctx,
+				  ev,
+				  client,
+				  destnode,
+				  timeout,
+				  &request,
+				  &reply);
+	if (ret != 0) {
+		D_ERR("Control DISABLE_NODE failed to node %u, ret=%d\n",
+		      destnode,
+		      ret);
+		return ret;
+	}
+
+	ret = ctdb_reply_control_disable_node(reply);
+	if (ret != 0) {
+		D_ERR("Control DISABLE_NODE failed, ret=%d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+int ctdb_ctrl_enable_node(TALLOC_CTX *mem_ctx,
+			  struct tevent_context *ev,
+			  struct ctdb_client_context *client,
+			  int destnode,
+			  struct timeval timeout)
+{
+	struct ctdb_req_control request;
+	struct ctdb_reply_control *reply;
+	int ret;
+
+	ctdb_req_control_enable_node(&request);
+	ret = ctdb_client_control(mem_ctx,
+				  ev,
+				  client,
+				  destnode,
+				  timeout,
+				  &request,
+				  &reply);
+	if (ret != 0) {
+		D_ERR("Control ENABLE_NODE failed to node %u, ret=%d\n",
+		      destnode,
+		      ret);
+		return ret;
+	}
+
+	ret = ctdb_reply_control_enable_node(reply);
+	if (ret != 0) {
+		D_ERR("Control ENABLE_NODE failed, ret=%d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
diff --git a/ctdb/client/client_sync.h b/ctdb/client/client_sync.h
index dc8b67395e3..b8f5d905857 100644
--- a/ctdb/client/client_sync.h
+++ b/ctdb/client/client_sync.h
@@ -482,6 +482,18 @@ int ctdb_ctrl_tunnel_deregister(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
 				int destnode, struct timeval timeout,
 				uint64_t tunnel_id);
 
+int ctdb_ctrl_disable_node(TALLOC_CTX *mem_ctx,
+			   struct tevent_context *ev,
+			   struct ctdb_client_context *client,
+			   int destnode,
+			   struct timeval timeout);
+
+int ctdb_ctrl_enable_node(TALLOC_CTX *mem_ctx,
+			  struct tevent_context *ev,
+			  struct ctdb_client_context *client,
+			  int destnode,
+			  struct timeval timeout);
+
 /* from client/client_message_sync.c */
 
 int ctdb_message_recd_update_ip(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
diff --git a/ctdb/include/ctdb_private.h b/ctdb/include/ctdb_private.h
index 8eb6686f953..f5e647f08a5 100644
--- a/ctdb/include/ctdb_private.h
+++ b/ctdb/include/ctdb_private.h
@@ -565,6 +565,8 @@ int daemon_deregister_message_handler(struct ctdb_context *ctdb,
 void daemon_tunnel_handler(uint64_t tunnel_id, TDB_DATA data,
 			   void *private_data);
 
+struct ctdb_node *ctdb_find_node(struct ctdb_context *ctdb, uint32_t pnn);
+
 int ctdb_start_daemon(struct ctdb_context *ctdb,
 		      bool interactive,
 		      bool test_mode_enabled);
diff --git a/ctdb/protocol/protocol.h b/ctdb/protocol/protocol.h
index e4b76c6b986..5f788f6f2a8 100644
--- a/ctdb/protocol/protocol.h
+++ b/ctdb/protocol/protocol.h
@@ -137,7 +137,7 @@ struct ctdb_call {
 /* SRVID to inform clients that an IP address has been taken over */
 #define CTDB_SRVID_TAKE_IP 0xF301000000000000LL
 
-/* SRVID to inform recovery daemon of the node flags */
+/* SRVID to inform recovery daemon of the node flags - OBSOLETE */
 #define CTDB_SRVID_SET_NODE_FLAGS 0xF400000000000000LL
 
 /* SRVID to inform recovery daemon to update public ip assignment */
@@ -376,6 +376,8 @@ enum ctdb_controls {CTDB_CONTROL_PROCESS_EXISTS          = 0,
 		    CTDB_CONTROL_VACUUM_FETCH            = 154,
 		    CTDB_CONTROL_DB_VACUUM               = 155,
 		    CTDB_CONTROL_ECHO_DATA               = 156,
+		    CTDB_CONTROL_DISABLE_NODE            = 157,
+		    CTDB_CONTROL_ENABLE_NODE             = 158,
 };
 
 #define MAX_COUNT_BUCKETS 16
diff --git a/ctdb/protocol/protocol_api.h b/ctdb/protocol/protocol_api.h
index 7bbe33b22fe..499d9329c54 100644
--- a/ctdb/protocol/protocol_api.h
+++ b/ctdb/protocol/protocol_api.h
@@ -605,6 +605,12 @@ void ctdb_req_control_echo_data(struct ctdb_req_control *request,
 				struct ctdb_echo_data *echo_data);
 int ctdb_reply_control_echo_data(struct ctdb_reply_control *reply);
 
+void ctdb_req_control_disable_node(struct ctdb_req_control *request);
+int ctdb_reply_control_disable_node(struct ctdb_reply_control *reply);
+
+void ctdb_req_control_enable_node(struct ctdb_req_control *request);
+int ctdb_reply_control_enable_node(struct ctdb_reply_control *reply);
+
 /* From protocol/protocol_debug.c */
 
 void ctdb_packet_print(uint8_t *buf, size_t buflen, FILE *fp);
diff --git a/ctdb/protocol/protocol_client.c b/ctdb/protocol/protocol_client.c
index 6d850be86df..dcce83f02a1 100644
--- a/ctdb/protocol/protocol_client.c
+++ b/ctdb/protocol/protocol_client.c
@@ -2360,3 +2360,39 @@ int ctdb_reply_control_echo_data(struct ctdb_reply_control *reply)
 
 	return reply->status;
 }
+
+/* CTDB_CONTROL_DISABLE_NODE */
+
+void ctdb_req_control_disable_node(struct ctdb_req_control *request)
+{
+	request->opcode = CTDB_CONTROL_DISABLE_NODE;
+	request->pad = 0;
+	request->srvid = 0;
+	request->client_id = 0;
+	request->flags = 0;
+
+	request->rdata.opcode = CTDB_CONTROL_DISABLE_NODE;
+}
+
+int ctdb_reply_control_disable_node(struct ctdb_reply_control *reply)
+{
+	return ctdb_reply_control_generic(reply, CTDB_CONTROL_DISABLE_NODE);
+}
+
+/* CTDB_CONTROL_ENABLE_NODE */
+
+void ctdb_req_control_enable_node(struct ctdb_req_control *request)
+{
+	request->opcode = CTDB_CONTROL_ENABLE_NODE;
+	request->pad = 0;
+	request->srvid = 0;
+	request->client_id = 0;
+	request->flags = 0;
+
+	request->rdata.opcode = CTDB_CONTROL_ENABLE_NODE;
+}
+
+int ctdb_reply_control_enable_node(struct ctdb_reply_control *reply)
+{
+	return ctdb_reply_control_generic(reply, CTDB_CONTROL_ENABLE_NODE);
+}
diff --git a/ctdb/protocol/protocol_control.c b/ctdb/protocol/protocol_control.c
index fb6b0219ef7..f64a1a90e10 100644
--- a/ctdb/protocol/protocol_control.c
+++ b/ctdb/protocol/protocol_control.c
@@ -411,6 +411,12 @@ static size_t ctdb_req_control_data_len(struct ctdb_req_control_data *cd)
 	case CTDB_CONTROL_ECHO_DATA:
 		len = ctdb_echo_data_len(cd->data.echo_data);
 		break;
+
+	case CTDB_CONTROL_DISABLE_NODE:
+		break;
+
+	case CTDB_CONTROL_ENABLE_NODE:
+		break;
 	}
 
 	return len;
@@ -1385,6 +1391,12 @@ static size_t ctdb_reply_control_data_len(struct ctdb_reply_control_data *cd)
 	case CTDB_CONTROL_ECHO_DATA:
 		len = ctdb_echo_data_len(cd->data.echo_data);
 		break;
+
+	case CTDB_CONTROL_DISABLE_NODE:
+		break;
+
+	case CTDB_CONTROL_ENABLE_NODE:
+		break;
 	}
 
 	return len;
diff --git a/ctdb/protocol/protocol_debug.c b/ctdb/protocol/protocol_debug.c
index 694285515e1..d94cb548d68 100644
--- a/ctdb/protocol/protocol_debug.c
+++ b/ctdb/protocol/protocol_debug.c
@@ -243,6 +243,8 @@ static void ctdb_opcode_print(uint32_t opcode, FILE *fp)
 		{ CTDB_CONTROL_VACUUM_FETCH, "VACUUM_FETCH" },
 		{ CTDB_CONTROL_DB_VACUUM, "DB_VACUUM" },
 		{ CTDB_CONTROL_ECHO_DATA, "ECHO_DATA" },
+		{ CTDB_CONTROL_DISABLE_NODE, "DISABLE_NODE" },
+		{ CTDB_CONTROL_ENABLE_NODE, "ENABLE_NODE" },
 		{ MAP_END, "" },
 	};
 
diff --git a/ctdb/server/ctdb_control.c b/ctdb/server/ctdb_control.c
index 206ea149693..131ebd43afc 100644
--- a/ctdb/server/ctdb_control.c
+++ b/ctdb/server/ctdb_control.c
@@ -173,6 +173,40 @@ done:
 	TALLOC_FREE(state);
 }
 
+static int ctdb_control_disable_node(struct ctdb_context *ctdb)
+{
+	struct ctdb_node *node;
+
+	node = ctdb_find_node(ctdb, CTDB_CURRENT_NODE);
+	if (node == NULL) {
+		/* Can't happen */
+		DBG_ERR("Unable to find current node\n");
+		return -1;
+	}
+
+	D_ERR("Disable node\n");
+	node->flags |= NODE_FLAGS_PERMANENTLY_DISABLED;
+
+	return 0;
+}
+
+static int ctdb_control_enable_node(struct ctdb_context *ctdb)
+{
+	struct ctdb_node *node;
+
+	node = ctdb_find_node(ctdb, CTDB_CURRENT_NODE);
+	if (node == NULL) {
+		/* Can't happen */
+		DBG_ERR("Unable to find current node\n");
+		return -1;
+	}
+
+	D_ERR("Enable node\n");
+	node->flags &= ~NODE_FLAGS_PERMANENTLY_DISABLED;
+
+	return 0;
+}
+
 /*
   process a control request
  */
@@ -827,6 +861,14 @@ static int32_t ctdb_control_dispatch(struct ctdb_context *ctdb,
 		return ctdb_control_echo_data(ctdb, c, indata, async_reply);
 	}
 
+	case CTDB_CONTROL_DISABLE_NODE:
+		CHECK_CONTROL_DATA_SIZE(0);
+		return ctdb_control_disable_node(ctdb);
+
+	case CTDB_CONTROL_ENABLE_NODE:
+		CHECK_CONTROL_DATA_SIZE(0);
+		return ctdb_control_enable_node(ctdb);
+
 	default:
 		DEBUG(DEBUG_CRIT,(__location__ " Unknown CTDB control opcode %u\n", opcode));
 		return -1;
diff --git a/ctdb/server/ctdb_daemon.c b/ctdb/server/ctdb_daemon.c
index 57f80235e7c..0896ba08f90 100644
--- a/ctdb/server/ctdb_daemon.c
+++ b/ctdb/server/ctdb_daemon.c
@@ -1235,28 +1235,51 @@ failed:
 	return -1;
 }
 
-static void initialise_node_flags (struct ctdb_context *ctdb)
+struct ctdb_node *ctdb_find_node(struct ctdb_context *ctdb, uint32_t pnn)
 {
+	struct ctdb_node *node = NULL;
 	unsigned int i;
 
+	if (pnn == CTDB_CURRENT_NODE) {
+		pnn = ctdb->pnn;
+	}
+
 	/* Always found: PNN correctly set just before this is called */
 	for (i = 0; i < ctdb->num_nodes; i++) {
-		if (ctdb->pnn == ctdb->nodes[i]->pnn) {
-			break;
+		node = ctdb->nodes[i];
+		if (pnn == node->pnn) {
+			return node;
 		}
 	}
 
-	ctdb->nodes[i]->flags &= ~NODE_FLAGS_DISCONNECTED;
+	return NULL;
+}
+
+static void initialise_node_flags (struct ctdb_context *ctdb)
+{
+	struct ctdb_node *node = NULL;
+
+	node = ctdb_find_node(ctdb, CTDB_CURRENT_NODE);
+	/*
+	 * PNN correctly set just before this is called so always
+	 * found but keep static analysers happy...
+	 */
+	if (node == NULL) {
+		DBG_ERR("Unable to find current node\n");
+		return;
+	}
+
+	node->flags &= ~NODE_FLAGS_DISCONNECTED;
 
 	/* do we start out in DISABLED mode? */
 	if (ctdb->start_as_disabled != 0) {
 		D_ERR("This node is configured to start in DISABLED state\n");
-		ctdb->nodes[i]->flags |= NODE_FLAGS_DISABLED;
+		node->flags |= NODE_FLAGS_PERMANENTLY_DISABLED;
 	}
 	/* do we start out in STOPPED mode? */
 	if (ctdb->start_as_stopped != 0) {
 		D_ERR("This node is configured to start in STOPPED state\n");
-		ctdb->nodes[i]->flags |= NODE_FLAGS_STOPPED;
+		node->flags |= NODE_FLAGS_STOPPED;
 	}
 }
 
diff --git a/ctdb/server/ctdb_monitor.c b/ctdb/server/ctdb_monitor.c
index 5c694bde969..ab58ec485fe 100644
--- a/ctdb/server/ctdb_monitor.c
+++ b/ctdb/server/ctdb_monitor.c
@@ -455,52 +455,55 @@ int32_t ctdb_control_modflags(struct ctdb_context *ctdb, TDB_DATA indata)
 	struct ctdb_node *node;
 	uint32_t old_flags;
 
-	if (c->pnn >= ctdb->num_nodes) {
-		DEBUG(DEBUG_ERR,(__location__ " Node %d is invalid, num_nodes :%d\n", c->pnn, ctdb->num_nodes));
-		return -1;
+	/*
+	 * Don't let other nodes override the current node's flags.
+	 * The recovery master fetches flags from this node so there's
+	 * no need to push them back.  Doing so is racy.
+	 */
+	if (c->pnn == ctdb->pnn) {
+		DBG_DEBUG("Ignoring flag changes for current node\n");
+		return 0;
 	}
 
-	node         = ctdb->nodes[c->pnn];
-	old_flags    = node->flags;
-	if (c->pnn != ctdb->pnn) {
-		c->old_flags  = node->flags;
+	node = ctdb_find_node(ctdb, c->pnn);
+	if (node == NULL) {
+		DBG_ERR("Node %u is invalid\n", c->pnn);
+		return -1;
 	}
-	node->flags   = c->new_flags & ~NODE_FLAGS_DISCONNECTED;
-	node->flags  |= (c->old_flags & NODE_FLAGS_DISCONNECTED);
 
-	/* we don't let other nodes modify our STOPPED status */
-	if (c->pnn == ctdb->pnn) {
-		node->flags &= ~NODE_FLAGS_STOPPED;
-		if (old_flags & NODE_FLAGS_STOPPED) {
-			node->flags |= NODE_FLAGS_STOPPED;
-		}
+	if (node->flags & NODE_FLAGS_DISCONNECTED) {
+		DBG_DEBUG("Ignoring flag changes for disconnected node\n");
+		return 0;
 	}
 
-	/* we don't let other nodes modify our BANNED status */
-	if (c->pnn == ctdb->pnn) {
-		node->flags &= ~NODE_FLAGS_BANNED;
-		if (old_flags & NODE_FLAGS_BANNED) {
-			node->flags |= NODE_FLAGS_BANNED;
-		}
-	}
+	/*
+	 * Remember the old flags.  We don't care what some other node
+	 * thought the old flags were - that's irrelevant.
+	 */
+	old_flags = node->flags;
 
-	if (node->flags == c->old_flags) {
-		DEBUG(DEBUG_INFO, ("Control modflags on node %u - Unchanged - flags 0x%x\n", c->pnn, node->flags));
+	/*
+	 * This node tracks nodes it is connected to, so don't let
+	 * another node override this
+	 */
+	node->flags =
+		(old_flags & NODE_FLAGS_DISCONNECTED) |
+		(c->new_flags & ~NODE_FLAGS_DISCONNECTED);
+
+	if (node->flags == old_flags) {
 		return 0;
 	}
 
-	DEBUG(DEBUG_INFO, ("Control modflags on node %u - flags now 0x%x\n", c->pnn, node->flags));
+	D_NOTICE("Node %u has changed flags - 0x%x -> 0x%x\n",
+		 c->pnn,
+		 old_flags,
+		 node->flags);
 
 	if (node->flags == 0 && ctdb->runstate <= CTDB_RUNSTATE_STARTUP) {
-		DEBUG(DEBUG_ERR, (__location__ " Node %u became healthy - force recovery for startup\n",
-				  c->pnn));
+		DBG_ERR("Node %u became healthy - force recovery for startup\n",
+			c->pnn);
 		ctdb->recovery_mode = CTDB_RECOVERY_ACTIVE;
 	}
 
-	/* tell the recovery daemon something has changed */
-	c->new_flags = node->flags;
-	ctdb_daemon_send_message(ctdb, ctdb->pnn,
-				 CTDB_SRVID_SET_NODE_FLAGS, indata);
-
 	return 0;
 }
diff --git a/ctdb/server/ctdb_recoverd.c b/ctdb/server/ctdb_recoverd.c
index 4ba8729b50e..dfa6d0d089b 100644
--- a/ctdb/server/ctdb_recoverd.c
+++ b/ctdb/server/ctdb_recoverd.c
@@ -553,40 +553,73 @@ static int update_flags(struct ctdb_recoverd *rec,
 	for (j=0; j<nodemap->num; j++) {
 		struct ctdb_node_map_old *remote_nodemap=NULL;
 		uint32_t local_flags = nodemap->nodes[j].flags;
+		uint32_t remote_pnn = nodemap->nodes[j].pnn;
 		uint32_t remote_flags;
+		unsigned int i;
 		int ret;
 
 		if (local_flags & NODE_FLAGS_DISCONNECTED) {
 			continue;
 		}
-		if (nodemap->nodes[j].pnn == ctdb->pnn) {
-			continue;
+		if (remote_pnn == ctdb->pnn) {
+			/*
+			 * No remote nodemap for this node since this
+			 * is the local nodemap.  However, still need
+			 * to check this against the remote nodes and
+			 * push it if they are out-of-date.
+			 */
+			goto compare_remotes;
 		}
 
 		remote_nodemap = remote_nodemaps[j];
 		remote_flags = remote_nodemap->nodes[j].flags;
 
 		if (local_flags != remote_flags) {
-			ret = update_flags_on_all_nodes(rec,
-							nodemap->nodes[j].pnn,
-							remote_flags);
-			if (ret != 0) {
-				DBG_ERR(
-				    "Unable to update flags on remote nodes\n");
-				talloc_free(mem_ctx);
-				return -1;
-			}
-
 			/*
 			 * Update the local copy of the flags in the
 			 * recovery daemon.
 			 */
 			D_NOTICE("Remote node %u had flags 0x%x, "
 				 "local had 0x%x - updating local\n",


-- 
Samba Shared Repository


