RAFT and CTDB
realrichardsharpe at gmail.com
Thu Nov 20 16:55:39 MST 2014
On Thu, Nov 20, 2014 at 3:41 PM, Martin Schwenke <martin at meltin.net> wrote:
> On Thu, 20 Nov 2014 15:24:39 -0800, Richard Sharpe
> <realrichardsharpe at gmail.com> wrote:
>> Hmmm, so the essential abstraction here is that any node that is no
>> longer a member of the cluster (because it can't get a lock on that
>> file) cannot try to run recovery. I.e., in ctdb_recovery_lock we try to
>> open the recovery lock file and then take out a lock on it.
>> The first should/will fail if we are no longer a member of the cluster
>> and the second will fail if the cluster properly supports fcntl locks
>> but another recovery daemon has already locked the file ...
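For anyone following along, the open-then-lock sequence described above can be sketched roughly as below. This is only an illustration in Python, not the actual ctdb_recovery_lock code: the function name try_take_recovery_lock and the choice to create the file if it is missing are my assumptions. fcntl.lockf() is used because it wraps the fcntl() locking calls.

```python
import fcntl
import os


def try_take_recovery_lock(path):
    """Return an open fd holding the lock, or None if the lock was not obtained.

    Hypothetical helper for illustration; not CTDB's implementation.
    """
    try:
        # Step 1: open the lock file on the cluster filesystem. This is
        # expected to fail if the node can no longer reach that filesystem.
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    except OSError:
        return None
    try:
        # Step 2: non-blocking exclusive fcntl() lock over the whole file.
        # This fails if another process already holds the lock.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    # The caller must keep the fd open: closing it releases the lock.
    return fd
```

A None return only says that this process did not get the lock. It does not by itself say why, so the two failure cases would need to be told apart if the caller cares.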
> No, only the recovery master can hold the recovery lock. Other nodes
> would not be able to take the lock but they are still cluster members.
Isn't that what I said? When I said cluster above I was referring to a
> Cluster membership is defined by being connected to the node that is
> currently the recovery master. That is, nodes that the recovery master
> knows about (i.e. connected) and are active (i.e. not stopped or
> banned) will take part in recovery.
OK, that is a wrinkle I had not thought of. What if they have lost
connection to the GPFS cluster but are still talking to the recovery
> If a node becomes disconnected then it will try to become the recovery
> master of its own cluster. If it can take the recovery lock then it is
> allowed to do that.
> So the recovery lock simply helps to stop a split brain where there are
> multiple clusters operating independently. Each would have
> a different cluster database so would have inconsistent ideas of, for
> example, locking.tdb... and this can obviously lead to file data
> peace & happiness,
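To check my understanding of the split-brain point, here is a toy model of it. It is a sketch under stated assumptions, not how CTDB elects a recovery master: RecoveryLock is an in-memory stand-in for the lock file on the shared filesystem, elect_masters is a made-up name, and picking the lowest node name as the candidate is arbitrary.

```python
class RecoveryLock:
    """In-memory stand-in for the shared lock file (hypothetical)."""

    def __init__(self):
        self.holder = None

    def try_take(self, node):
        # Succeeds only if the lock is free or already held by this node.
        if self.holder in (None, node):
            self.holder = node
            return True
        return False


def elect_masters(partitions, lock):
    """Return the nodes allowed to act as recovery master.

    Each partition is a list of node names that can still talk to each
    other. One candidate per partition tries the shared lock, and only a
    candidate that gets it may run recovery.
    """
    masters = []
    for nodes in partitions:
        candidate = min(nodes)  # arbitrary pick for the sketch
        if lock.try_take(candidate):
            masters.append(candidate)
    return masters
```

With a single shared lock, a network split into [["n0", "n1"], ["n2"]] leaves only one partition with a recovery master. If each partition had its own lock, both would get one, which is the multiple-clusters case the recovery lock is meant to prevent.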