CTDB and Recovery lock
martin at meltin.net
Tue Nov 17 21:04:58 UTC 2015
On Tue, 17 Nov 2015 11:14:44 -0800, Partha Sarathi
<parthasarathi.bl at gmail.com> wrote:
> I was going through the recovery process details given in
> Recovery-process.txt and got a few questions on the points listed below.
> RECOVERY MASTER CLUSTER MONITORING
> 17, Verify that the filehandle to the recovery lock file is valid. If it is
> not, this may mean a split brain and is a critical error.
> Try a new recovery and restart monitoring from 1.
> "recovery master doesn't have the recovery lock"
> 18, Verify that GPFS allows us to read from the recovery lock file. If not
> there is a critical GPFS issue and we may have a split brain.
> Try forcing a new recovery and restart monitoring from 1.
> "failed read from recovery_lock_fd - %s"
> CLUSTER RECOVERY
> 3, Verify that the recovery daemon can lock the recovery lock file. At
> this stage this node should be the recovery master.
> If this operation fails it means we have a split brain and have to
> abort recovery.
> "ctdb_recovery_lock: Unable to open %s - (%s)"
> "ctdb_recovery_lock: Failed to get recovery lock on '%s'"
> "Unable to get recovery lock - aborting recovery"
> "ctdb_recovery_lock: Got recovery lock on '%s'"
I'll reorder them a bit to answer common things together.
> 1) Does the recovery master always hold the exclusive lock on the
> recovery_lock_file, or does it only take out an exclusive lock upon a
> recovery initiated by the local recoverd of the recmaster?
At the moment the lock is a combination master+recovery lock.
It is taken by a new recovery master at the beginning of its first
recovery and is then held for as long as that node remains master.
> 3) When does the recmaster release the exclusive lock on the recovery
> lock file?
It is released if the node loses an election.
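To make the mechanics concrete, here is a minimal sketch of taking an
exclusive lock on a recovery lock file with fcntl(2) byte-range
locking, which is only a valid cluster-wide mutex when the cluster
filesystem implements coherent POSIX locking. The function name and
details are illustrative, not CTDB's actual code:

  /* Minimal sketch: take an exclusive lock on the recovery lock
   * file with fcntl(2).  Illustrative only -- not CTDB's code. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  static int take_recovery_lock(const char *path)
  {
          struct flock fl = {
                  .l_type   = F_WRLCK,    /* exclusive (write) lock */
                  .l_whence = SEEK_SET,
                  .l_start  = 0,
                  .l_len    = 0,          /* 0 = lock the whole file */
          };
          int fd;

          fd = open(path, O_RDWR | O_CREAT, 0600);
          if (fd == -1) {
                  perror("open");
                  return -1;
          }

          /* Non-blocking attempt: if another node already holds
           * the lock, F_SETLK fails with EACCES or EAGAIN. */
          if (fcntl(fd, F_SETLK, &fl) == -1) {
                  perror("fcntl(F_SETLK)");
                  close(fd);
                  return -1;
          }

          return fd;  /* keep this fd open to keep holding the lock */
  }

  int main(int argc, char **argv)
  {
          int fd = take_recovery_lock(argc > 1 ? argv[1]
                                               : "recovery.lock");
          if (fd == -1) {
                  return 1;
          }
          pause();    /* hold the lock until killed */
          return 0;
  }

Releasing the lock is just closing the descriptor (or the process
exiting), which is why a dead master's lock goes away with it.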
In future we might use 2 locks with clearer semantics. One would be
the master lock - it would be taken when an election is won and would
be released when an election is lost. The other would be the recovery
lock - it would be taken at the beginning of recovery and released at
the end. This involves untangling the election and recovery process,
which I have started doing in recent weeks.
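As a rough illustration of those semantics, here is a toy sketch of
how the two locks would track the election and recovery life cycles.
Every name is hypothetical and none of this is CTDB code; the real
locks would live in the cluster filesystem:

  /* Toy sketch of the proposed two-lock split. */
  #include <stdio.h>

  static int master_lock_held;    /* election win -> take, loss -> drop */
  static int recovery_lock_held;  /* recovery start -> take, end -> drop */

  static void election_won(void)  { master_lock_held = 1; }
  static void election_lost(void) { master_lock_held = 0; }

  static void recovery_begin(void)
  {
          /* Only the master lock holder may drive a recovery. */
          if (!master_lock_held) {
                  fprintf(stderr, "not master, refusing to recover\n");
                  return;
          }
          recovery_lock_held = 1;
  }

  static void recovery_end(void) { recovery_lock_held = 0; }

  int main(void)
  {
          election_won();
          recovery_begin();
          recovery_end();
          election_lost();
          return (master_lock_held || recovery_lock_held) ? 1 : 0;
  }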
> 2) After verifying that the recovery lock file is valid, why are we
> reading from the lock file? I didn't understand the intention behind it.
We don't do this anymore and we need to update the documentation. If I
remember correctly, the node number of the recovery master node was
written into the lock file so that it could be verified. Checking this
often caused problems when the cluster filesystem performed badly.
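For the curious, the removed scheme would have looked roughly like the
following sketch, reconstructed from the description above and from
the "failed read from recovery_lock_fd" message quoted earlier. It is
not the original code:

  /* Reconstruction of the removed scheme: the master wrote its
   * node number (PNN) into the lock file; other code read it
   * back to verify.  Not the original CTDB code. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  static int write_pnn(int fd, unsigned int pnn)
  {
          char buf[32];
          int len = snprintf(buf, sizeof(buf), "%u\n", pnn);

          if (lseek(fd, 0, SEEK_SET) == -1 ||
              write(fd, buf, len) != len) {
                  return -1;
          }
          return 0;
  }

  static int verify_pnn(int fd, unsigned int expected)
  {
          char buf[32] = "";

          /* A failure here is what produced the
           * "failed read from recovery_lock_fd" message. */
          if (lseek(fd, 0, SEEK_SET) == -1 ||
              read(fd, buf, sizeof(buf) - 1) == -1) {
                  return -1;
          }
          return ((unsigned int)atoi(buf) == expected) ? 0 : -1;
  }

  int main(void)
  {
          int fd = open("recovery.lock", O_RDWR | O_CREAT, 0600);

          if (fd == -1 || write_pnn(fd, 2) == -1 ||
              verify_pnn(fd, 2) == -1) {
                  perror("recovery.lock");
                  return 1;
          }
          return 0;
  }

Every such check was a read or write through the cluster filesystem,
so a slow filesystem turned a sanity check into a source of failures.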
> 4) If the recmaster holding an exclusive lock on the recovery lock
> file is disconnected for some reason without updating the
> DISCONNECTED/UNHEALTHY flags, how do the other nodes get to know its
> status and start a forced election for the recmaster role?
Nodes notice when they receive no packets from another node and mark
it as disconnected. Keepalives are sent whenever no other packets are
flowing between nodes, so a live, connected node is never silent.
Each recovery daemon checks the status of the current master
approximately once a second. If the current master is inactive
(disconnected, stopped, banned) then an election will be called.
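A toy sketch of that check loop, with invented flag names and a
stubbed helper standing in for CTDB's real node-state tracking (the
stub pretends the master has vanished so the loop terminates):

  #include <stdio.h>
  #include <unistd.h>

  #define NODE_DISCONNECTED 0x01
  #define NODE_BANNED       0x02
  #define NODE_STOPPED      0x04
  #define NODE_INACTIVE (NODE_DISCONNECTED | NODE_BANNED | NODE_STOPPED)

  /* Stub: in reality this state is kept current by the packet
   * traffic and keepalives described above. */
  static unsigned int get_node_flags(int pnn)
  {
          (void)pnn;
          return NODE_DISCONNECTED;
  }

  static void force_election(void)
  {
          printf("recmaster inactive: forcing an election\n");
  }

  int main(void)
  {
          int recmaster_pnn = 0;

          for (;;) {
                  if (get_node_flags(recmaster_pnn) & NODE_INACTIVE) {
                          force_election();
                          break;  /* a new master will be elected */
                  }
                  sleep(1);       /* check roughly once a second */
          }
          return 0;
  }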
I'm not sure if you're concerned about the following, but...
If CTDB's private network becomes partitioned, leaving a single node
unable to communicate with the others, and that single node is holding
the recovery lock, then there is no sane way of repairing the
situation. The remaining nodes will hold elections, and when each
winner is unable to take the lock it will ban itself... until all
nodes in the larger partition are banned.
This is a deficiency of the exclusive lock approach compared with a
consensus-based approach.
> Also, we don't have a clustered filesystem with POSIX locking
> support, but we do have a Consensus/Conspiracy service available in
> the cluster. Is it possible to make use of that infrastructure to
> mimic the recovery lock file mechanism?
Not today. Hopefully soon. As I untangle some of this I will make the
"exclusive" lock code call out to a configurable helper program.
peace & happiness,