[SCM] Samba Shared Repository - branch master updated
Martin Schwenke
martins at samba.org
Mon Jan 17 11:17:02 UTC 2022
The branch, master has been updated
via da2e1047f1f WHATSNEW: Document CTDB leader and cluster lock changes
via f7de2132bb9 ctdb-doc: Remove documentation for recovery process
via a940ad93706 ctdb-doc: Update example configuration migration script
via 01313ea243e ctdb-tests: Improve test coverage for leader role yield and elections
via 5d317781498 ctdb-tests: Support commenting out local daemons configuration options
via 34d2ca0ae64 ctdb-config: Add configuration option [cluster] leader timeout
via 1dfb266038f ctdb-config: [legacy] recmaster capability -> [cluster] leader capability
via f5a39058f07 ctdb-config: [cluster] recovery lock -> [cluster] cluster lock
via d752a92e115 ctdb-doc: Update documentation for leader and cluster lock
via 73555e8248a ctdb-recoverd: Use race for cluster lock as election when lock is enabled
via 938d64c8ff3 ctdb-protocol: Mark {GET,SET}_RECMASTER controls obsolete
via 03ae158cffc ctdb-protocol: Drop marshalling for {GET,SET}_RECMASTER controls
via a76374070d3 ctdb-daemon: Drop implementation of {GET,SET}_RECMASTER controls
via 193b624d26a ctdb-protocol: Drop protocol client functions for recmaster controls
via cda673ff6dc ctdb-client: Drop unused recmaster functions
via 16efbca0036 ctdb-daemon: Drop unused old client recmaster functions
via c68267b2a60 ctdb-recoverd: Drop calls to ctdb_ctrl_setrecmaster()
via 58d7fcdf7c9 ctdb-recoverd: Drop recovery master verification
via f02e0974857 ctdb-tools: recovery master -> leader
via e60581d5b5e ctdb-tools: Use leader broadcast in get_leader()
via 92fb68e9b8a ctdb-tools: Factor out get_leader()
via 17ba15ccd88 ctdb-tools: Handle leader broadcasts in ctdb tool
via ec90f36cc61 ctdb-tools: Print "UNKNOWN" when leader PNN is unknown
via 01a8d1a4a40 ctdb-client: Factor out function ctdb_client_wait_func_timeout()
via 403db5b5288 ctdb-tests: Factor out getting leader and waiting for leader change
via 4786982cc80 ctdb-tests: Add leader broadcasts to fake_ctdbd
via 756dfdfed9f ctdb-tests: Implement srvid_handler for dispatching messages
via 958746f947d ctdb-recoverd: Simplify some stopped/banned checks to inactive checks
via 358c59f51ab ctdb-recoverd: No longer take cluster lock during recovery
via 36ffaaa691c ctdb-recoverd: Add and use function cluster_lock_enabled()
via 5ee664ee17f ctdb-recoverd: Terminology change: recovery lock -> cluster lock
via 0f2250f4f9f ctdb-recoverd: Take cluster lock when election completes
via 011e880002b ctdb-recoverd: Factor out function cluster_lock_take()
via 037abf86206 ctdb-tests: Avoid a race
via ef7e3265f76 ctdb-tests: Setup cluster with expected arguments
via b029ca4d513 ctdb-recoverd: Drop leader validation
via 7e53fab0a36 ctdb-recoverd: Drop special case for elected-before-connected
via ef4b8c13c07 ctdb-recoverd: Handle leader broadcast timeout
via 5c7f6da0f0e ctdb-recoverd: Send leader broadcasts
via 789a75abfa2 ctdb-recoverd: Process leader broadcasts
via 3d3767a259b ctdb-protocol: Add CTDB_SRVID_LEADER
via c2cfd9c21aa ctdb-recoverd: Add an explicit flag for election in progress
via ac5a3ca063f ctdb-recoverd: Only start election if node can be leader
via 7baadfe27ed ctdb-recoverd: Add and use function this_node_can_be_leader()
via 94b546c268e ctdb-recoverd: Logging/comments: recovery master -> leader
via dd79e9bd14d ctdb-recoverd: Rename recmaster field to leader
via 2ee6763c7d9 ctdb-recoverd: Use rec->pnn everywhere
via 4af3b10a378 ctdb-recoverd: Change argument to srvid_disable_and_reply()
via b7c138ca99a ctdb-recoverd: Simplify arguments to ctdb_ban_node()
via a5e0ddac626 ctdb-recoverd: Simplify arguments to verify_local_ip_allocation()
via 67b51916408 ctdb-recoverd: Simplify arguments to do_recovery()
via 57882beb16a ctdb-recoverd: Simplify arguments to some election functions
via 9dbe7cc85e4 ctdb-recoverd: Add PNN to recovery daemon context
via ff0140e4700 ctdb-recoverd: Use this_node_is_leader() in an extra context
via c8721d01c65 ctdb-recoverd: Factor out and use function this_node_is_leader()
from 57a32cebdd8 ctdb-recoverd: Pass SIGHUP to running helper
https://git.samba.org/?p=samba.git;a=shortlog;h=master
- Log -----------------------------------------------------------------
commit da2e1047f1fc9f0ac98490c79c21c427b47274d5
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jan 14 13:39:34 2022 +1100
WHATSNEW: Document CTDB leader and cluster lock changes
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
Autobuild-User(master): Martin Schwenke <martins at samba.org>
Autobuild-Date(master): Mon Jan 17 11:16:14 UTC 2022 on sn-devel-184
commit f7de2132bb999780331e5b005946ba5b494063c1
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jan 10 13:41:31 2022 +1100
ctdb-doc: Remove documentation for recovery process
This is many years out of date and recent changes make it worse. It
is unlikely that anyone has the time to fix this in the near future,
so remove it because it is misleading.
Database recovery steps are well documented in comments in the
recovery helper. Cluster monitoring documentation can be re-added
when things stop changing.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit a940ad9370687c97d1ccb0f934842b69c1d44c76
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jan 17 09:16:17 2022 +1100
ctdb-doc: Update example configuration migration script
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 01313ea243e4d52ea558ca4c53b6f4a1f07341e7
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jan 14 23:09:38 2022 +1100
ctdb-tests: Improve test coverage for leader role yield and elections
Rename test, clean up node selection. Duplicate for banning and
removing leader capability cases. Repeat all 3 tests without cluster
lock.
All of the standard election triggers are now tested, with and without
cluster lock. Due to test cluster configuration limitations, the
tests without cluster lock are skipped on a real cluster.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 5d317781498a69c94b47ce47b60438e6cb520f96
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jan 14 13:59:25 2022 +1100
ctdb-tests: Support commenting out local daemons configuration options
Can be used to disable default options, such as cluster lock.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 34d2ca0ae6471c8d742b22aa4c57012232a2a832
Author: Martin Schwenke <martin at meltin.net>
Date: Sat Jan 15 13:02:02 2022 +1100
ctdb-config: Add configuration option [cluster] leader timeout
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 1dfb266038f6fdf971bb0ffe0726f778b986371d
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jan 10 14:15:25 2022 +1100
ctdb-config: [legacy] recmaster capability -> [cluster] leader capability
Rename this configuration item and move it into the [cluster]
configuration section.
Update documentation to match.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit f5a39058f0743f5607df91cb698a2b15618e1360
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jan 10 19:18:14 2022 +1100
ctdb-config: [cluster] recovery lock -> [cluster] cluster lock
Retain "recovery lock" and mark as deprecated for backward
compatibility.
Some documentation is still inconsistent.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
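The backward-compatible lookup described in this commit could be sketched roughly as follows. This is an illustration only, not the CTDB implementation: struct cluster_conf_sketch and effective_cluster_lock() are hypothetical names, and the rule that the new option wins when both are set is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the two [cluster] options; NULL means "not set". */
struct cluster_conf_sketch {
	const char *cluster_lock;	/* new option: "cluster lock" */
	const char *recovery_lock;	/* deprecated option: "recovery lock" */
};

/*
 * Return the lock setting to use, or NULL if no lock is configured.
 * Assumption: the new option takes precedence if both are set.
 * *used_deprecated is set to true only when the value came from the
 * deprecated option, so a caller could log a deprecation warning.
 */
static const char *effective_cluster_lock(
	const struct cluster_conf_sketch *conf, bool *used_deprecated)
{
	assert(conf != NULL && used_deprecated != NULL);

	*used_deprecated = false;
	if (conf->cluster_lock != NULL) {
		return conf->cluster_lock;
	}
	if (conf->recovery_lock != NULL) {
		*used_deprecated = true;
		return conf->recovery_lock;
	}
	return NULL;
}
```

A caller would check the returned pointer for NULL to decide whether a lock is enabled at all, and warn once if used_deprecated comes back true.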
commit d752a92e1153fa355b0cbaa1f482fdc0d88e42f5
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jan 10 14:18:32 2022 +1100
ctdb-doc: Update documentation for leader and cluster lock
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 73555e8248aff683b6cb3a02262a66ab52f2c665
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Mar 18 15:14:39 2020 +1100
ctdb-recoverd: Use race for cluster lock as election when lock is enabled
If the cluster is partitioned then nodes in one partition can not take
the lock anyway, so election is pointless. It just introduces
unnecessary corner cases.
Instead just race for the lock.
When a node notices a lack of leader and notifies other nodes of an
election via an unknown leader broadcast, the cluster lock election is
hooked into this broadcast.
The test needs to be updated because losing the cluster lock can now
result in a leadership change.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
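The idea of using the lock race as the election can be sketched as below. This is a simplified model under stated assumptions: struct lock_sketch is a hypothetical in-memory stand-in for the cluster lock (the real lock is taken via a helper against shared storage), and race_for_leadership() is an invented name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NO_HOLDER (-1)

/* Hypothetical stand-in for the cluster lock: at most one holder. */
struct lock_sketch {
	int holder_pnn;	/* SKETCH_NO_HOLDER when the lock is free */
};

/* Try to take the lock; succeeds only if nobody currently holds it. */
static bool lock_sketch_try_take(struct lock_sketch *lock, int pnn)
{
	assert(lock != NULL);

	if (lock->holder_pnn != SKETCH_NO_HOLDER) {
		return false;
	}
	lock->holder_pnn = pnn;
	return true;
}

/*
 * Election by lock race: a node that is able to be leader simply tries
 * to take the lock.  Whoever takes it is leader; everyone else loses.
 * Returns true if this node became leader.
 */
static bool race_for_leadership(struct lock_sketch *lock, int pnn,
				bool can_be_leader)
{
	if (!can_be_leader) {
		return false;
	}
	return lock_sketch_try_take(lock, pnn);
}
```

In this model a node in a partition that cannot reach the lock never wins, which is why a separate election round adds nothing when the lock is enabled.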
commit 938d64c8ff3d1776c2d5959714c4c11eba7278c4
Author: Martin Schwenke <martin at meltin.net>
Date: Wed May 6 00:19:38 2020 +1000
ctdb-protocol: Mark {GET,SET}_RECMASTER controls obsolete
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 03ae158cffc3812f82365c65f8333768539f854d
Author: Martin Schwenke <martin at meltin.net>
Date: Wed May 6 00:10:22 2020 +1000
ctdb-protocol: Drop marshalling for {GET,SET}_RECMASTER controls
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit a76374070d38e2dc86067ce413bb26b8e554c0b2
Author: Martin Schwenke <martin at meltin.net>
Date: Wed May 6 00:01:05 2020 +1000
ctdb-daemon: Drop implementation of {GET,SET}_RECMASTER controls
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 193b624d26acffaa39a5fc393268f152b5809f99
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 5 23:58:38 2020 +1000
ctdb-protocol: Drop protocol client functions for recmaster controls
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit cda673ff6dc6e33e947022305859f004197a803a
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 5 23:56:10 2020 +1000
ctdb-client: Drop unused recmaster functions
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 16efbca0036ee444aecfa0a992ff733bb182b2c7
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 5 23:52:05 2020 +1000
ctdb-daemon: Drop unused old client recmaster functions
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit c68267b2a60559755835c4d56b5ba7c766155489
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 5 23:26:41 2020 +1000
ctdb-recoverd: Drop calls to ctdb_ctrl_setrecmaster()
Nothing fetches this value anymore.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 58d7fcdf7c9568a3a4b9d8e5db8b68f073409ab1
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 5 23:25:34 2020 +1000
ctdb-recoverd: Drop recovery master verification
This doesn't make sense if leader broadcasts are used.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit f02e097485722badf27523c706adb99f21342f56
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jan 10 13:22:19 2022 +1100
ctdb-tools: recovery master -> leader
The following command names are changed:
  recmaster -> leader
  setrecmasterrole -> setleaderrole
Command output changed for the following commands:
  status
  getcapabilities
Documentation and tests are updated to reflect these changes.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
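Wrapper code that still passes the old command names to the ctdb tool might translate them with a small table like the one below. This is a hypothetical helper, not part of CTDB; only the two old/new name pairs are taken from the commit message above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Map an old ctdb tool command name to its new name.  Names that were
 * not renamed are returned unchanged.
 */
static const char *new_command_name(const char *cmd)
{
	static const struct {
		const char *old_name;
		const char *new_name;
	} renames[] = {
		{ "recmaster", "leader" },
		{ "setrecmasterrole", "setleaderrole" },
	};
	size_t i;

	assert(cmd != NULL);

	for (i = 0; i < sizeof(renames) / sizeof(renames[0]); i++) {
		if (strcmp(cmd, renames[i].old_name) == 0) {
			return renames[i].new_name;
		}
	}
	return cmd;
}
```

Note that this only covers the command names; anything that parses the output of status or getcapabilities would need separate attention, since that output has changed too.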
commit e60581d5b5ecbac2b4bae49fbf60e071372fc2d3
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Mar 19 17:14:10 2020 +1100
ctdb-tools: Use leader broadcast in get_leader()
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 92fb68e9b8a5481d9dd5c9033c98e204035509fe
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Mar 19 17:30:24 2020 +1100
ctdb-tools: Factor out get_leader()
This seems pointless but it localises a subsequent change and also
starts a terminology change in the tool code.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 17ba15ccd88367dca82b0c4c8e4ff3f859896d87
Author: Martin Schwenke <martin at meltin.net>
Date: Mon May 4 17:56:22 2020 +1000
ctdb-tools: Handle leader broadcasts in ctdb tool
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit ec90f36cc6185fc6ed13164fb13ec3630aff68ad
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Mar 19 10:46:25 2020 +1100
ctdb-tools: Print "UNKNOWN" when leader PNN is unknown
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 01a8d1a4a400a3bacbe334ef0f379c03d64633d5
Author: Martin Schwenke <martin at meltin.net>
Date: Mon May 4 19:01:09 2020 +1000
ctdb-client: Factor out function ctdb_client_wait_func_timeout()
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 403db5b52882c91f35ae189bcf8f01f8180c7b50
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jan 14 21:47:52 2022 +1100
ctdb-tests: Factor out getting leader and waiting for leader change
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 4786982cc80f4ec0c23673a144ac179fa60bde78
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 5 23:02:03 2020 +1000
ctdb-tests: Add leader broadcasts to fake_ctdbd
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 756dfdfed9fe7d6acf2cf894d9918c8ac489571e
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue May 5 16:53:39 2020 +1000
ctdb-tests: Implement srvid_handler for dispatching messages
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Reviewed-by: Martin Schwenke <martin at meltin.net>
commit 958746f947dcd499b0fe9afee21e436912739284
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Mar 17 17:10:20 2020 +1100
ctdb-recoverd: Simplify some stopped/banned checks to inactive checks
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 358c59f51ab39175ffe72afdfc4c2e0ed23b5929
Author: Martin Schwenke <martin at meltin.net>
Date: Mon May 4 17:45:51 2020 +1000
ctdb-recoverd: No longer take cluster lock during recovery
Confirm instead that it is already held.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 36ffaaa691c63896b7b92628b147b7a564421311
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Dec 10 11:43:10 2021 +1100
ctdb-recoverd: Add and use function cluster_lock_enabled()
Now all references to ctdb->recovery_lock are encapsulated in the
cluster lock code.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 5ee664ee17fa4d2fbdea2be3f4c0b1fd8f8971b1
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Dec 10 11:29:06 2021 +1100
ctdb-recoverd: Terminology change: recovery lock -> cluster lock
No functional changes, just name changes for clarity.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 0f2250f4f9f4efbf73e887538969c395c57e57be
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Sep 20 14:13:58 2018 +1000
ctdb-recoverd: Take cluster lock when election completes
It is no longer just a recovery lock but is always held by the cluster
leader.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 011e880002b8d2bc783f96e8ea5713322fcc2a93
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Sep 20 12:30:58 2018 +1000
ctdb-recoverd: Factor out function cluster_lock_take()
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 037abf862069694acd849760175be9943a6fcd3e
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Mar 17 17:58:02 2020 +1100
ctdb-tests: Avoid a race
See the comment in the code for details.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit ef7e3265f76fbfdacdd9f17f3ddfca79ce823b60
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Dec 7 17:00:36 2021 +1100
ctdb-tests: Setup cluster with expected arguments
ctdb_test_init() doesn't actually pass arguments to local_daemons.sh.
This needs to be done using ctdb_nodes_start_custom().
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit b029ca4d513163c4b0146c2a303130ae2a2581b4
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Dec 17 12:54:23 2021 +1100
ctdb-recoverd: Drop leader validation
The introduction of the leader broadcast timeout provides an
alternative to the current leader validation. Using the leader
broadcast may not be as fast but it is more correct.
When the leader node is stopped or banned, the only way of triggering
an election is currently to fetch the leader's node map to check
whether it is still active. This is because the leader will no
longer push the node map to other nodes. However, having all nodes
fetch the node map from an inactive leader may be unreliable.
Most of the other cases are also handled more reliably by the leader
broadcast timeout.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 7e53fab0a364426a03932974727c386e750716be
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jan 6 14:47:45 2022 +1100
ctdb-recoverd: Drop special case for elected-before-connected
This no longer occurs at startup due to the leader broadcast timeout.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit ef4b8c13c0762fc5072627ee0211b3bf506f2d73
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Dec 17 14:42:47 2021 +1100
ctdb-recoverd: Handle leader broadcast timeout
If no leader broadcasts have been received from the leader for more
than 5s then trigger an election.
Apart from being sane behaviour, this avoids elected-before-connected
bugs at startup, where a node elects itself leader before it is
connected to other nodes.
When a node processes a leader broadcast timeout it sends an unknown
leader broadcast to all nodes. That causes cancellation of the leader
broadcast timeout across the cluster. This is particularly important at
startup, since nodes may be started in a staggered fashion. Without
this cluster-wide cancellation, a node might notice the lack of
leader, win an election and complete a recovery before other nodes
notice the lack of leader. When the leader broadcast timeout finally
occurs on the other nodes then they'll put the cluster back into an
unnecessary recovery.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
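The timeout logic described in this commit can be modelled as follows. This is a sketch under assumptions rather than the recovery daemon's code: the struct and function names are hypothetical, time is represented as plain seconds, and the real daemon uses timer events rather than polling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node state for watching leader broadcasts. */
struct leader_watch_sketch {
	bool armed;		/* is the timeout currently active? */
	long last_broadcast;	/* time (s) of last leader broadcast */
	long timeout;		/* seconds without broadcasts allowed */
};

/* A broadcast from a known leader (re)starts the timeout. */
static void leader_broadcast_received(struct leader_watch_sketch *w,
				      long now)
{
	assert(w != NULL);
	w->armed = true;
	w->last_broadcast = now;
}

/*
 * An "unknown leader" broadcast means some node has already noticed the
 * lack of leader, so cancel the local timeout instead of letting it
 * fire later and trigger a second, unnecessary recovery.
 */
static void unknown_leader_broadcast_received(struct leader_watch_sketch *w)
{
	assert(w != NULL);
	w->armed = false;
}

/* True if no broadcast has arrived for more than the timeout. */
static bool leader_broadcast_timed_out(const struct leader_watch_sketch *w,
				       long now)
{
	assert(w != NULL);
	return w->armed && (now - w->last_broadcast) > w->timeout;
}
```

With a timeout of 5, a node whose last broadcast arrived at t=100 would trigger an election at t=106 in this model, unless an unknown leader broadcast cancelled the timeout first.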
commit 5c7f6da0f0e6c92ae4cd338b92f475bb4a8e2cc9
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Mar 16 16:16:44 2020 +1100
ctdb-recoverd: Send leader broadcasts
These are triggered on a 1 second timer, but are only sent if the node
is the current leader and there is no election underway.
If this node can not be the leader then ensure it releases the
recovery lock.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
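The decisions made on each timer tick, as described above, amount to something like the following. This is a hedged sketch: struct node_state_sketch, struct tick_actions and leader_tick() are hypothetical names, and the real code acts directly rather than returning a set of flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state consulted on each 1 second tick. */
struct node_state_sketch {
	bool is_leader;
	bool election_in_progress;
	bool can_be_leader;
	bool holds_lock;
};

/* What the tick handler should do, expressed as flags for testing. */
struct tick_actions {
	bool send_broadcast;
	bool release_lock;
};

static struct tick_actions leader_tick(const struct node_state_sketch *n)
{
	struct tick_actions a = { false, false };

	assert(n != NULL);

	/* A node that can not be leader must not keep holding the lock. */
	if (!n->can_be_leader && n->holds_lock) {
		a.release_lock = true;
	}

	/* Only the current leader broadcasts, and not during an election. */
	if (n->is_leader && !n->election_in_progress) {
		a.send_broadcast = true;
	}

	return a;
}
```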
commit 789a75abfa2af0af39616c69575882e5db2b6f07
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Mar 16 16:07:26 2020 +1100
ctdb-recoverd: Process leader broadcasts
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 3d3767a259b29674882c102fe629cff1eb1a702c
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Mar 16 16:05:29 2020 +1100
ctdb-protocol: Add CTDB_SRVID_LEADER
CTDB_SRVID_LEADER will be regularly broadcast to all connected nodes
by the leader.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit c2cfd9c21aae6045b4ebf3ba330cbf2b9631490e
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Mar 18 20:27:10 2020 +1100
ctdb-recoverd: Add an explicit flag for election in progress
An alternate election method will be added that doesn't use the
election timeout, so this provides a common way for recognising when
an election is in progress.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit ac5a3ca063fd7435557a65866fda5fa1e0012394
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jan 7 11:27:06 2022 +1100
ctdb-recoverd: Only start election if node can be leader
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 7baadfe27eda40560753fb4a61e053ea357fd2d2
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Dec 14 10:57:03 2021 +1100
ctdb-recoverd: Add and use function this_node_can_be_leader()
This makes the code self-documenting.
In ctdb_election_data() there is a slight behaviour change. An
inactive node will now try to lose an election. This case should not happen
because:
* An inactive node can't win an election round and then send a reply.
* Any inactive node should never start an election. There are
currently places where this happens and they will be fixed later.
There is an instance where this could be used in
validate_recovery_master() but this involves a more serious logic
change. Overhaul this function later.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
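A predicate of this kind might look roughly like the sketch below. The flag and capability constants are invented for illustration and do not reflect the values in the CTDB headers. The sketch assumes that "inactive" means stopped or banned and that eligibility also requires the leader capability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, not the ones defined by CTDB. */
#define SKETCH_FLAG_BANNED	0x1u
#define SKETCH_FLAG_STOPPED	0x2u
#define SKETCH_FLAG_INACTIVE	(SKETCH_FLAG_BANNED | SKETCH_FLAG_STOPPED)
#define SKETCH_CAP_LEADER	0x1u

struct node_sketch {
	uint32_t flags;
	uint32_t capabilities;
};

/* A node can be leader only if it is active and has the capability. */
static bool node_can_be_leader(const struct node_sketch *n)
{
	assert(n != NULL);

	return (n->flags & SKETCH_FLAG_INACTIVE) == 0 &&
	       (n->capabilities & SKETCH_CAP_LEADER) != 0;
}
```

Putting both conditions behind one name is what makes call sites self-documenting: election code can ask the question once instead of repeating separate stopped, banned and capability checks.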
commit 94b546c268ee5fb4505c6febe4bce05f1d75e7cd
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Dec 8 11:07:25 2021 +1100
ctdb-recoverd: Logging/comments: recovery master -> leader
There are some remaining instances in this file but they will be
removed in subsequent commits.
Modernise debug macros as appropriate.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit dd79e9bd14dd61fc60dfaac5c9065d465336714c
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 14 15:22:33 2020 +1000
ctdb-recoverd: Rename recmaster field to leader
Recovery master is being renamed to leader. This follows clustering
best practice (e.g. RAFT).
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 2ee6763c7d9a8e347c0a98f918ad39f62222df31
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Dec 8 20:25:46 2021 +1100
ctdb-recoverd: Use rec->pnn everywhere
This is currently referenced in a number of inconsistent
ways, including:
* pnn
* rec->ctdb->pnn
* ctdb->pnn
* ctdb_get_pnn(ctdb)
* ctdb_get_pnn(rec->ctdb)
The first of these always requires some thought about the context - is
this the node PNN or some other PNN (e.g. argument to function)?
rec->pnn is now always used when referring to the recovery daemon's
PNN.
Doing this also reduces reliance on struct ctdb_context internals.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 4af3b10a378ea614f926c23570ec91334e2c6408
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Dec 8 21:28:05 2021 +1100
ctdb-recoverd: Change argument to srvid_disable_and_reply()
Reduce dependency on struct ctdb_context internals, enable a
subsequent change.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit b7c138ca99a4a839b9c30e59dff40fd2b95e13ec
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Dec 10 10:31:56 2021 +1100
ctdb-recoverd: Simplify arguments to ctdb_ban_node()
ban_time argument is always ctdb->tunable.recovery_ban_period, so
build this in and make the calling code more readable.
ctdb_ban_node() already logs how long a node is banned for, so don't
repeatedly log this.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit a5e0ddac626bc90c859949c977657cdf1fa110ac
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Dec 13 09:51:36 2021 +1100
ctdb-recoverd: Simplify arguments to verify_local_ip_allocation()
All other arguments are available via rec, so simplify.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 67b51916408831f13ca05a6c395f01824288fe8d
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jan 16 16:20:05 2018 +1100
ctdb-recoverd: Simplify arguments to do_recovery()
pnn and nodemap are both available via the rec context, so simplify.
vnnmap is unused.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 57882beb16a89d5e4081d0645549891a04ab5fb0
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Dec 8 19:27:01 2021 +1100
ctdb-recoverd: Simplify arguments to some election functions
The pnn and nodemap arguments to force_election() and
send_election_request() are always effectively rec->pnn and
rec->nodemap, so simplify.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 9dbe7cc85e41ce4f9163d8298ba9fb20052db894
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Dec 9 10:33:17 2021 +1100
ctdb-recoverd: Add PNN to recovery daemon context
This is currently referenced in a number of inconsistent
ways, including:
* pnn
* rec->ctdb->pnn
* ctdb->pnn
* ctdb_get_pnn(ctdb)
* ctdb_get_pnn(rec->ctdb)
The first of these always requires some thought about the context - is
this the node PNN or some other PNN (e.g. argument to function)?
The intention is to always use rec->pnn when referring to the recovery
daemon's PNN.
Doing this also reduces reliance on struct ctdb_context internals.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
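The pattern of caching the PNN in the recovery daemon context can be illustrated as below. The structures are heavily simplified, hypothetical stand-ins for struct ctdb_context and the recovery daemon context, and the helper names are invented. The leader field is included only to show how a check like this_node_is_leader() then reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the main daemon context. */
struct ctdb_sketch {
	uint32_t pnn;
};

/* Simplified stand-in for the recovery daemon context. */
struct recoverd_sketch {
	struct ctdb_sketch *ctdb;
	uint32_t pnn;		/* this node's PNN, cached at init */
	uint32_t leader;	/* PNN of the current leader */
};

/* Copy the PNN once so later code never reaches into ctdb internals. */
static void recoverd_sketch_init(struct recoverd_sketch *rec,
				 struct ctdb_sketch *ctdb)
{
	assert(rec != NULL && ctdb != NULL);

	rec->ctdb = ctdb;
	rec->pnn = ctdb->pnn;
	rec->leader = ctdb->pnn;	/* arbitrary initial value for the sketch */
}

/* Unambiguous: rec->pnn always means "this node". */
static bool sketch_this_node_is_leader(const struct recoverd_sketch *rec)
{
	assert(rec != NULL);
	return rec->leader == rec->pnn;
}
```

A bare pnn function argument can then be reserved for "some other node", which removes the ambiguity the commit message describes.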
commit ff0140e470016a7a2b5365c06f4d912e7a7c8af8
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Dec 9 11:47:54 2021 +1100
ctdb-recoverd: Use this_node_is_leader() in an extra context
This is arguably clearer.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit c8721d01c6547f33f51b8e26b3e1f4370ec1ecc6
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Dec 8 19:37:39 2021 +1100
ctdb-recoverd: Factor out and use function this_node_is_leader()
Make the code self-documenting.
This preempts an upcoming change to terminology but doing it now saves
a lot of churn.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
-----------------------------------------------------------------------
Summary of changes:
WHATSNEW.txt | 58 ++
ctdb/client/client.h | 22 +
ctdb/client/client_connect.c | 30 +-
ctdb/client/client_control_sync.c | 58 --
ctdb/client/client_sync.h | 10 -
ctdb/cluster/cluster_conf.c | 49 +-
ctdb/cluster/cluster_conf.h | 3 +
ctdb/config/ctdb.conf | 10 +-
ctdb/doc/cluster_mutex_helper.txt | 6 +-
ctdb/doc/ctdb-etcd.7.xml | 4 +-
ctdb/doc/ctdb.1.xml | 44 +-
ctdb/doc/ctdb.7.xml | 99 ++-
ctdb/doc/ctdb.conf.5.xml | 69 +-
ctdb/doc/ctdb_mutex_ceph_rados_helper.7.xml | 6 +-
ctdb/doc/examples/config_migrate.sh | 4 +-
ctdb/doc/examples/ctdb.conf | 2 +-
ctdb/doc/recovery-process.txt | 436 ----------
ctdb/include/ctdb_client.h | 22 -
ctdb/include/ctdb_private.h | 3 -
ctdb/protocol/protocol.h | 9 +-
ctdb/protocol/protocol_api.h | 8 -
ctdb/protocol/protocol_client.c | 46 -
ctdb/protocol/protocol_control.c | 27 -
ctdb/protocol/protocol_message.c | 12 +
ctdb/server/ctdb_client.c | 64 --
ctdb/server/ctdb_config.c | 16 +-
ctdb/server/ctdb_config.h | 4 +-
ctdb/server/ctdb_control.c | 7 +-
ctdb/server/ctdb_recover.c | 25 -
ctdb/server/ctdb_recoverd.c | 947 +++++++++++----------
ctdb/server/ctdbd.c | 12 +-
ctdb/server/legacy_conf.c | 5 -
ctdb/server/legacy_conf.h | 1 -
.../INTEGRATION/database/recovery.001.volatile.sh | 62 +-
.../INTEGRATION/database/recovery.002.large.sh | 8 +-
.../simple/cluster.001.stop_leader_yield.sh | 26 +
.../simple/cluster.002.ban_leader_yield.sh | 26 +
.../simple/cluster.002.recmaster_yield.sh | 29 -
.../simple/cluster.003.capability_leader_yield.sh | 24 +
.../cluster.006.stop_leader_yield_no_lock.sh | 30 +
.../simple/cluster.007.ban_leader_yield_no_lock.sh | 30 +
.../cluster.008.capability_leader_yield_no_lock.sh | 28 +
.../simple/cluster.015.reclock_remove_lock.sh | 26 +-
.../simple/cluster.016.reclock_move_lock_dir.sh | 18 +-
ctdb/tests/UNIT/cunit/config_test_001.sh | 4 +-
ctdb/tests/UNIT/cunit/config_test_004.sh | 72 +-
ctdb/tests/UNIT/cunit/config_test_006.sh | 5 -
ctdb/tests/UNIT/cunit/protocol_test_101.sh | 1 +
ctdb/tests/UNIT/tool/ctdb.getcapabilities.001.sh | 2 +-
ctdb/tests/UNIT/tool/ctdb.getcapabilities.002.sh | 2 +-
ctdb/tests/UNIT/tool/ctdb.getcapabilities.004.sh | 6 +-
.../{ctdb.recmaster.001.sh => ctdb.leader.001.sh} | 0
.../{ctdb.recmaster.002.sh => ctdb.leader.002.sh} | 0
ctdb/tests/UNIT/tool/ctdb.status.001.sh | 2 +-
ctdb/tests/UNIT/tool/ctdb.status.002.sh | 2 +-
ctdb/tests/local_daemons.sh | 46 +-
ctdb/tests/scripts/integration.bash | 44 +
ctdb/tests/src/fake_ctdbd.c | 138 ++-
ctdb/tests/src/protocol_common_ctdb.c | 28 +-
ctdb/tests/src/protocol_ctdb_compat_test.c | 1 +
ctdb/tests/src/protocol_ctdb_test.c | 1 +
ctdb/tools/ctdb.c | 191 ++++-
ctdb/utils/ceph/ctdb_mutex_ceph_rados_helper.c | 2 +-
ctdb/utils/ceph/test_ceph_rados_reclock.sh | 4 +-
ctdb/utils/etcd/ctdb_etcd_lock | 4 +-
65 files changed, 1494 insertions(+), 1486 deletions(-)
delete mode 100644 ctdb/doc/recovery-process.txt
create mode 100755 ctdb/tests/INTEGRATION/simple/cluster.001.stop_leader_yield.sh
create mode 100755 ctdb/tests/INTEGRATION/simple/cluster.002.ban_leader_yield.sh
delete mode 100755 ctdb/tests/INTEGRATION/simple/cluster.002.recmaster_yield.sh
create mode 100755 ctdb/tests/INTEGRATION/simple/cluster.003.capability_leader_yield.sh
create mode 100755 ctdb/tests/INTEGRATION/simple/cluster.006.stop_leader_yield_no_lock.sh
create mode 100755 ctdb/tests/INTEGRATION/simple/cluster.007.ban_leader_yield_no_lock.sh
create mode 100755 ctdb/tests/INTEGRATION/simple/cluster.008.capability_leader_yield_no_lock.sh
rename ctdb/tests/UNIT/tool/{ctdb.recmaster.001.sh => ctdb.leader.001.sh} (100%)
rename ctdb/tests/UNIT/tool/{ctdb.recmaster.002.sh => ctdb.leader.002.sh} (100%)
Changeset truncated at 500 lines:
diff --git a/WHATSNEW.txt b/WHATSNEW.txt
index c82fa5079ce..a65439c43da 100644
--- a/WHATSNEW.txt
+++ b/WHATSNEW.txt
@@ -74,6 +74,64 @@ listen on port 53. Starting with this version it is possible to configure the
port using host:port notation. See smb.conf for more details. Existing setups
are not affected, as the default port is 53.
+CTDB changes
+------------
+
+* The "recovery master" role has been renamed "leader"
+
+ Documentation and logs now refer to "leader".
+
+ The following ctdb tool command names have changed:
+
+ recmaster -> leader
+ setrecmasterrole -> setleaderrole
+
+ Command output has changed for the following commands:
+
+ status
+ getcapabilities
+
+ The "[legacy] -> recmaster capability" configuration option has been
+ renamed and moved to the cluster section, so this is now:
+
+ [cluster] -> leader capability
+
+* The "recovery lock" has been renamed "cluster lock"
+
+ Documentation and logs now refer to "cluster lock".
+
+ The "[cluster] -> recovery lock" configuration option has been
+ deprecated and will be removed in a future version. Please use
+ "[cluster] -> cluster lock" instead.
+
+ If the cluster lock is enabled then traditional elections are not
+ done and leader elections use a race for the cluster lock. This
+ avoids various conditions where a node is elected leader but can not
+ take the cluster lock. Such conditions included:
+
+ - At startup, a node elects itself leader of its own cluster before
+ connecting to other nodes
+
+ - Cluster filesystem failover is slow
+
+ The abbreviation "reclock" is still used in many places, because a
+ better abbreviation eludes us (i.e. "clock" is obviously bad) and
+ changing all instances would require a lot of churn. If the
+ abbreviation "reclock" for "cluster lock" is confusing, please
+ consider mentally prefixing it with "really excellent".
+
+* CTDB now uses leader broadcasts and an associated timeout to
+ determine if an election is required
+
+ The leader broadcast timeout can be configured via new configuration
+ option
+
+ [cluster] -> leader timeout
+
+ This specifies the number of seconds without leader broadcasts
+ before a node calls an election. The default is 5.
+
+
REMOVED FEATURES
================
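
A minimal sketch of the leader timeout rule described in the WHATSNEW hunk above, assuming a simple polling model. The names leader_monitor, leader_monitor_init, leader_monitor_broadcast and leader_monitor_election_needed are hypothetical and are not CTDB identifiers. Only the default of 5 seconds and the rejection of values <= 0 come from this changeset; treating the boundary as inclusive is an assumption.

```c
#include <stdbool.h>
#include <time.h>

#define LEADER_TIMEOUT_DEFAULT 5 /* seconds; default stated in WHATSNEW */

/* Hypothetical per-node state for tracking leader broadcasts. */
struct leader_monitor {
	int timeout;           /* seconds without a broadcast before an election */
	time_t last_broadcast; /* time the last leader broadcast was seen */
};

/* Reject non-positive timeouts, as validate_leader_timeout() does. */
static bool leader_monitor_init(struct leader_monitor *m,
				int timeout,
				time_t now)
{
	if (timeout <= 0) {
		return false;
	}

	m->timeout = timeout;
	m->last_broadcast = now;
	return true;
}

/* Record that a leader broadcast arrived at time "now". */
static void leader_monitor_broadcast(struct leader_monitor *m, time_t now)
{
	m->last_broadcast = now;
}

/* True once "timeout" seconds have passed with no broadcast (assumed >=). */
static bool leader_monitor_election_needed(const struct leader_monitor *m,
					   time_t now)
{
	return (now - m->last_broadcast) >= m->timeout;
}
```

Each received broadcast pushes the deadline back, so an election is only called after a full timeout period of silence from the leader.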
diff --git a/ctdb/client/client.h b/ctdb/client/client.h
index 88ee5768d76..5f174035e28 100644
--- a/ctdb/client/client.h
+++ b/ctdb/client/client.h
@@ -170,6 +170,28 @@ uint32_t ctdb_client_pnn(struct ctdb_client_context *client);
*/
void ctdb_client_wait(struct tevent_context *ev, bool *done);
+/**
+ * @brief Client event loop waiting for function to return true with timeout
+ *
+ * This can be used to wait for asynchronous computations to complete.
+ * When this function is called, it will run the tevent event loop and wait
+ * until the done function returns true or the timeout occurs.
+ *
+ * This function will return when either
+ * - done function returns true, or
+ * - timeout has occurred.
+ *
+ * @param[in] ev Tevent context
+ * @param[in] done_func Function that indicates when to stop waiting
+ * @param[in] private_data Passed to done function
+ * @param[in] timeout How long to wait
+ * @return 0 on success, ETIMEDOUT on timeout, and errno on failure
+ */
+int ctdb_client_wait_func_timeout(struct tevent_context *ev,
+ bool (*done_func)(void *private_data),
+ void *private_data,
+ struct timeval timeout);
+
/**
* @brief Client event loop waiting for a flag with timeout
*
diff --git a/ctdb/client/client_connect.c b/ctdb/client/client_connect.c
index 0977d717608..a942871b1d2 100644
--- a/ctdb/client/client_connect.c
+++ b/ctdb/client/client_connect.c
@@ -336,8 +336,10 @@ static void ctdb_client_wait_timeout_handler(struct tevent_context *ev,
*timed_out = true;
}
-int ctdb_client_wait_timeout(struct tevent_context *ev, bool *done,
- struct timeval timeout)
+int ctdb_client_wait_func_timeout(struct tevent_context *ev,
+ bool (*done_func)(void *private_data),
+ void *private_data,
+ struct timeval timeout)
{
TALLOC_CTX *mem_ctx;
struct tevent_timer *timer;
@@ -356,7 +358,7 @@ int ctdb_client_wait_timeout(struct tevent_context *ev, bool *done,
return ENOMEM;
}
- while (! (*done) && ! timed_out) {
+ while (! (done_func(private_data)) && ! timed_out) {
tevent_loop_once(ev);
}
@@ -369,6 +371,28 @@ int ctdb_client_wait_timeout(struct tevent_context *ev, bool *done,
return 0;
}
+static bool client_wait_done(void *private_data)
+{
+ bool *done = (bool *)private_data;
+
+ return *done;
+}
+
+int ctdb_client_wait_timeout(struct tevent_context *ev,
+ bool *done,
+ struct timeval timeout)
+
+{
+ int ret;
+
+ ret = ctdb_client_wait_func_timeout(ev,
+ client_wait_done,
+ done,
+ timeout);
+
+ return ret;
+}
+
struct ctdb_recovery_wait_state {
struct tevent_context *ev;
struct ctdb_client_context *client;
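
The refactoring in the hunks above replaces "wait on a bool flag" with "wait on a predicate", keeping the old interface as a thin adapter. The following is a rough, self-contained sketch of that pattern and does not use tevent. Every name in it (step_fn, wait_func_steps, wait_flag_steps, leader_watch, sim) is hypothetical. A step budget stands in for the struct timeval timeout, and a step callback stands in for tevent_loop_once().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for tevent_loop_once(): processes one "event". */
typedef void (*step_fn)(void *step_data);

/*
 * Run steps until done_func(private_data) returns true or max_steps
 * steps have run.  max_steps stands in for the timeout.
 */
static int wait_func_steps(step_fn step, void *step_data,
			   bool (*done_func)(void *private_data),
			   void *private_data,
			   unsigned int max_steps)
{
	unsigned int i;

	for (i = 0; i < max_steps && !done_func(private_data); i++) {
		step(step_data);
	}

	return done_func(private_data) ? 0 : ETIMEDOUT;
}

/* Adapter: the old "wait for a bool flag" interface as a predicate. */
static bool flag_is_set(void *private_data)
{
	bool *done = (bool *)private_data;

	return *done;
}

static int wait_flag_steps(step_fn step, void *step_data,
			   bool *done, unsigned int max_steps)
{
	return wait_func_steps(step, step_data, flag_is_set, done, max_steps);
}

/* A predicate over state rather than a single flag: has the leader changed? */
struct leader_watch {
	uint32_t old_leader;
	uint32_t current_leader;
};

static bool leader_changed(void *private_data)
{
	struct leader_watch *w = private_data;

	return w->current_leader != w->old_leader;
}

/* Simulated event source: the leader changes after a number of steps. */
struct sim {
	unsigned int steps_until_change;
	uint32_t new_leader;
	struct leader_watch *watch;
};

static void sim_step(void *step_data)
{
	struct sim *s = step_data;

	if (s->steps_until_change > 0) {
		s->steps_until_change--;
	}
	if (s->steps_until_change == 0) {
		s->watch->current_leader = s->new_leader;
	}
}
```

With a simulated leader change after 3 steps and a budget of 10 steps, wait_func_steps(sim_step, &s, leader_changed, &w, 10) returns 0. If the change needs more steps than the budget allows, it returns ETIMEDOUT.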
diff --git a/ctdb/client/client_control_sync.c b/ctdb/client/client_control_sync.c
index 1459dc09b46..c786fc7dbca 100644
--- a/ctdb/client/client_control_sync.c
+++ b/ctdb/client/client_control_sync.c
@@ -615,64 +615,6 @@ int ctdb_ctrl_get_pid(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
return 0;
}
-int ctdb_ctrl_get_recmaster(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
- struct ctdb_client_context *client,
- int destnode, struct timeval timeout,
- uint32_t *recmaster)
-{
- struct ctdb_req_control request;
- struct ctdb_reply_control *reply;
- int ret;
-
- ctdb_req_control_get_recmaster(&request);
- ret = ctdb_client_control(mem_ctx, ev, client, destnode, timeout,
- &request, &reply);
- if (ret != 0) {
- DEBUG(DEBUG_ERR,
- ("Control GET_RECMASTER failed to node %u, ret=%d\n",
- destnode, ret));
- return ret;
- }
-
- ret = ctdb_reply_control_get_recmaster(reply, recmaster);
- if (ret != 0) {
- DEBUG(DEBUG_ERR,
- ("Control GET_RECMASTER failed, ret=%d\n", ret));
- return ret;
- }
-
- return 0;
-}
-
-int ctdb_ctrl_set_recmaster(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
- struct ctdb_client_context *client,
- int destnode, struct timeval timeout,
- uint32_t recmaster)
-{
- struct ctdb_req_control request;
- struct ctdb_reply_control *reply;
- int ret;
-
- ctdb_req_control_set_recmaster(&request, recmaster);
- ret = ctdb_client_control(mem_ctx, ev, client, destnode, timeout,
- &request, &reply);
- if (ret != 0) {
- DEBUG(DEBUG_ERR,
- ("Control SET_RECMASTER failed to node %u, ret=%d\n",
- destnode, ret));
- return ret;
- }
-
- ret = ctdb_reply_control_set_recmaster(reply);
- if (ret != 0) {
- DEBUG(DEBUG_ERR,
- ("Control SET_RECMASTER failed, ret=%d\n", ret));
- return ret;
- }
-
- return 0;
-}
-
int ctdb_ctrl_freeze(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
struct ctdb_client_context *client,
int destnode, struct timeval timeout,
diff --git a/ctdb/client/client_sync.h b/ctdb/client/client_sync.h
index b8f5d905857..5b0ff42e95d 100644
--- a/ctdb/client/client_sync.h
+++ b/ctdb/client/client_sync.h
@@ -124,16 +124,6 @@ int ctdb_ctrl_get_pid(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
int destnode, struct timeval timeout,
pid_t *pid);
-int ctdb_ctrl_get_recmaster(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
- struct ctdb_client_context *client,
- int destnode, struct timeval timeout,
- uint32_t *recmaster);
-
-int ctdb_ctrl_set_recmaster(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
- struct ctdb_client_context *client,
- int destnode, struct timeval timeout,
- uint32_t recmaster);
-
int ctdb_ctrl_freeze(TALLOC_CTX *mem_ctx, struct tevent_context *ev,
struct ctdb_client_context *client,
int destnode, struct timeval timeout,
diff --git a/ctdb/cluster/cluster_conf.c b/ctdb/cluster/cluster_conf.c
index be79d5942a8..bdd64ba112f 100644
--- a/ctdb/cluster/cluster_conf.c
+++ b/ctdb/cluster/cluster_conf.c
@@ -113,6 +113,38 @@ good:
mode);
}
+static bool validate_recovery_lock(const char *key,
+ const char *old_reclock,
+ const char *new_reclock,
+ enum conf_update_mode mode)
+{
+ bool status;
+
+ if (new_reclock != NULL) {
+ D_WARNING("Configuration option [%s] -> %s is deprecated\n",
+ CLUSTER_CONF_SECTION,
+ key);
+ }
+
+ status = check_static_string_change(key, old_reclock, new_reclock, mode);
+
+ return status;
+}
+
+static bool validate_leader_timeout(const char *key,
+ int old_timeout,
+ int new_timeout,
+ enum conf_update_mode mode)
+{
+ if (new_timeout <= 0) {
+ D_ERR("Invalid value for [cluster] -> leader timeout = %d\n",
+ new_timeout);
+ return false;
+ }
+
+ return true;
+}
+
void cluster_conf_init(struct conf_context *conf)
{
conf_define_section(conf, CLUSTER_CONF_SECTION, NULL);
@@ -129,7 +161,22 @@ void cluster_conf_init(struct conf_context *conf)
validate_node_address);
conf_define_string(conf,
CLUSTER_CONF_SECTION,
- CLUSTER_CONF_RECOVERY_LOCK,
+ CLUSTER_CONF_CLUSTER_LOCK,
NULL,
check_static_string_change);
+ conf_define_string(conf,
+ CLUSTER_CONF_SECTION,
+ CLUSTER_CONF_RECOVERY_LOCK,
+ NULL,
+ validate_recovery_lock);
+ conf_define_integer(conf,
+ CLUSTER_CONF_SECTION,
+ CLUSTER_CONF_LEADER_TIMEOUT,
+ 5,
+ validate_leader_timeout);
+ conf_define_boolean(conf,
+ CLUSTER_CONF_SECTION,
+ CLUSTER_CONF_LEADER_CAPABILITY,
+ true,
+ NULL);
}
diff --git a/ctdb/cluster/cluster_conf.h b/ctdb/cluster/cluster_conf.h
index 6b797ef1085..38c378fd571 100644
--- a/ctdb/cluster/cluster_conf.h
+++ b/ctdb/cluster/cluster_conf.h
@@ -26,7 +26,10 @@
#define CLUSTER_CONF_TRANSPORT "transport"
#define CLUSTER_CONF_NODE_ADDRESS "node address"
+#define CLUSTER_CONF_CLUSTER_LOCK "cluster lock"
#define CLUSTER_CONF_RECOVERY_LOCK "recovery lock"
+#define CLUSTER_CONF_LEADER_TIMEOUT "leader timeout"
+#define CLUSTER_CONF_LEADER_CAPABILITY "leader capability"
void cluster_conf_init(struct conf_context *conf);
diff --git a/ctdb/config/ctdb.conf b/ctdb/config/ctdb.conf
index 5440600a435..8e1b3760973 100644
--- a/ctdb/config/ctdb.conf
+++ b/ctdb/config/ctdb.conf
@@ -11,12 +11,12 @@
# log level = NOTICE
[cluster]
- # Shared recovery lock file to avoid split brain. Daemon
- # default is no recovery lock. Do NOT run CTDB without a
- # recovery lock file unless you know exactly what you are
+ # Shared cluster lock file to avoid split brain. Daemon
+ # default is no cluster lock. Do NOT run CTDB without a
+ # cluster lock file unless you know exactly what you are
# doing.
#
- # Please see the RECOVERY LOCK section in ctdb(7) for more
+ # Please see the CLUSTER LOCK section in ctdb(7) for more
# details.
#
- # recovery lock = !/bin/false RECOVERY LOCK NOT CONFIGURED
+ # cluster lock = !/bin/false CLUSTER LOCK NOT CONFIGURED
diff --git a/ctdb/doc/cluster_mutex_helper.txt b/ctdb/doc/cluster_mutex_helper.txt
index 20c8eb2b51d..4ee018ffc94 100644
--- a/ctdb/doc/cluster_mutex_helper.txt
+++ b/ctdb/doc/cluster_mutex_helper.txt
@@ -5,11 +5,11 @@ CTDB uses cluster-wide mutexes to protect against a "split brain",
which could occur if the cluster becomes partitioned due to network
failure or similar.
-CTDB uses a cluster-wide mutex for its "recovery lock", which is used
+CTDB uses a cluster-wide mutex for its "cluster lock", which is used
to ensure that only one database recovery can happen at a time. For
-an overview of recovery lock configuration see the RECOVERY LOCK
+an overview of cluster lock configuration see the CLUSTER LOCK
section in ctdb(7). CTDB tries to ensure correct operation of the
-recovery lock by attempting to take the recovery lock when CTDB knows
+cluster lock by attempting to take the cluster lock when CTDB knows
that it should already be held.
By default, CTDB uses a supplied mutex helper that uses a fcntl(2)
diff --git a/ctdb/doc/ctdb-etcd.7.xml b/ctdb/doc/ctdb-etcd.7.xml
index 5d7a0e05366..f84989f854f 100644
--- a/ctdb/doc/ctdb-etcd.7.xml
+++ b/ctdb/doc/ctdb-etcd.7.xml
@@ -60,7 +60,7 @@
<para>
ctdb_etcd_lock is intended to be run as a mutex helper for CTDB. It
will try to connect to an existing etcd cluster and grab a lock in that
- cluster to function as CTDB's recovery lock. Please see
+ cluster to function as CTDB's cluster lock. Please see
<emphasis>ctdb/doc/cluster_mutex_helper.txt</emphasis> for details on
the mutex helper API. To use this, include the following line in
the <literal>[cluster]</literal> section of
@@ -68,7 +68,7 @@
<manvolnum>5</manvolnum></citerefentry>:
</para>
<screen format="linespecific">
-recovery lock = !/usr/local/usr/libexec/ctdb/ctdb_etcd_lock
+cluster lock = !/usr/local/usr/libexec/ctdb/ctdb_etcd_lock
</screen>
<para>
You can also pass "-v", "-vv", or "-vvv" to include verbose output in
diff --git a/ctdb/doc/ctdb.1.xml b/ctdb/doc/ctdb.1.xml
index e0e05d8e542..6f9a1764ee4 100644
--- a/ctdb/doc/ctdb.1.xml
+++ b/ctdb/doc/ctdb.1.xml
@@ -299,10 +299,10 @@
RECOVERY - The cluster databases have all been frozen, pausing all services while the cluster awaits a recovery process to complete. A recovery process should finish within seconds. If a cluster is stuck in the RECOVERY state this would indicate a cluster malfunction which needs to be investigated.
</para>
<para>
- Once the recovery master detects an inconsistency, for example a node
+ Once the leader detects an inconsistency, for example a node
becomes disconnected/connected, the recovery daemon will trigger a
cluster recovery process, where all databases are remerged across the
- cluster. When this process starts, the recovery master will first
+ cluster. When this process starts, the leader will first
"freeze" all databases to prevent applications such as samba from
accessing the databases and it will also mark the recovery mode as
RECOVERY.
@@ -316,13 +316,16 @@
</para>
</refsect3>
<refsect3>
- <title>Recovery master</title>
+ <title>Leader</title>
<para>
- This is the cluster node that is currently designated as the recovery master. This node is responsible of monitoring the consistency of the cluster and to perform the actual recovery process when reqired.
+ This is the cluster node that is currently designated as the
+ leader. This node is responsible for monitoring the
+ consistency of the cluster and for performing the actual
+ recovery process when required.
</para>
<para>
- Only one node at a time can be the designated recovery master. Which
- node is designated the recovery master is decided by an election
+ Only one node at a time can be the designated leader. Which
+ node is designated the leader is decided by an election
process in the recovery daemons running on each node.
</para>
</refsect3>
@@ -343,7 +346,7 @@ hash:1 lmaster:1
hash:2 lmaster:2
hash:3 lmaster:3
Recovery mode:NORMAL (0)
-Recovery master:0
+Leader:0
</screen>
</refsect3>
</refsect2>
@@ -397,9 +400,9 @@ pnn:1 10.0.0.31 OK
</refsect2>
<refsect2>
- <title>recmaster</title>
+ <title>leader</title>
<para>
- This command shows the pnn of the node which is currently the recmaster.
+ This command shows the pnn of the node which is currently the leader.
</para>
<para>
@@ -939,7 +942,7 @@ pnn:3 10.0.0.14 OK
Example output:
</para>
<screen>
-RECMASTER: YES
+LEADER: YES
LMASTER: YES
</screen>
@@ -1217,13 +1220,20 @@ DB Statistics: locking.tdb
</refsect2>
<refsect2>
- <title>setrecmasterrole on|off</title>
+ <title>setleaderrole on|off</title>
<para>
- This command is used to enable/disable the RECMASTER capability for a node at runtime. This capability determines whether or not a node can be used as an RECMASTER for the cluster. A node that does not have the RECMASTER capability can not win a recmaster election. A node that already is the recmaster for the cluster when the capability is stripped off the node will remain the recmaster until the next cluster election.
+ This command is used to enable/disable the LEADER capability
+ for a node at runtime. This capability determines whether or
+ not a node can be elected leader of the cluster. A node that
+ does not have the LEADER capability can not be elected
+ leader. If the current leader has this capability removed then
+ an election will occur.
</para>
<para>
- Nodes will by default have this capability, but it can be stripped off nodes by the setting in the sysconfig file or by using this command.
+ Nodes have this capability enabled by default, but it can be
+ removed via the <command>cluster:leader capability</command>
+ configuration setting or by using this command.
</para>
<para>
See also "ctdb getcapabilities"
@@ -1740,7 +1750,13 @@ HEALTH: NO-HEALTHY-NODES - ERROR - Backup of corrupted TDB in '/usr/local/var/li
<refsect2>
<title>ipreallocate, sync</title>
<para>
- This command will force the recovery master to perform a full ip reallocation process and redistribute all ip addresses. This is useful to "reset" the allocations back to its default state if they have been changed using the "moveip" command. While a "recover" will also perform this reallocation, a recovery is much more hevyweight since it will also rebuild all the databases.
+ This command will force the leader to perform a full ip
+ reallocation process and redistribute all ip addresses. This
+ is useful to "reset" the allocations back to its default state
+ if they have been changed using the "moveip" command. While a
+ "recover" will also perform this reallocation, a recovery is
+ much more heavyweight since it will also rebuild all the
+ databases.
</para>
</refsect2>
diff --git a/ctdb/doc/ctdb.7.xml b/ctdb/doc/ctdb.7.xml
--
Samba Shared Repository