RAFT and CTDB
steve at steve-ss.com
Sun Nov 23 12:50:07 MST 2014
On 21/11/14 03:08, Chan Min Wai wrote:
> Dear Martin,
> Since we have touched on the lock:
> I have some experience with a setup where the recovery lock is defined.
> I pointed the lock at the shared OCFS2 cluster filesystem.
> CTDB would not start and kept asking for the lock,
> which is something I'm not sure about.
> I followed this guide.
> The difference is that my OCFS2 filesystem is shared storage between the two nodes, and thus there is no DRBD.
> Does the lock really work in this scenario?
> Thank you.
> PS: sorry to cut in like this.
> Min Wai, Chan
>> Martin Schwenke <martin at meltin.net> wrote on 21 Nov 2014 at 8:04 AM:
>> On Thu, 20 Nov 2014 15:55:39 -0800, Richard Sharpe
>> <realrichardsharpe at gmail.com> wrote:
>>>> On Thu, Nov 20, 2014 at 3:41 PM, Martin Schwenke <martin at meltin.net> wrote:
>>>> On Thu, 20 Nov 2014 15:24:39 -0800, Richard Sharpe
>>>> <realrichardsharpe at gmail.com> wrote:
>>>>> Hmmm, so the essential abstraction here is that any node that is no
>>>>> longer a member of the cluster (because it can't get a lock on that
>>>>> file) cannot try to run recovery. I.e., in ctdb_recovery_lock we try to
>>>>> open the recovery lock file and then take out a lock on it.
>>>>> The first should/will fail if we are no longer a member of the cluster
>>>>> and the second will fail if the cluster properly supports fcntl locks
>>>>> but another recovery daemon has already locked the file ...
>>>> No, only the recovery master can hold the recovery lock. Other nodes
>>>> would not be able to take the lock but they are still cluster members.
>>> Isn't that what I said? When I said cluster above I was referring to a
>>> GPFS cluster.
>> CTDB has its own independent notion of cluster membership and I thought
>> you were referring to that. I didn't notice you mentioning GPFS. :-)
>>>> Cluster membership is defined by being connected to the node that is
>>>> currently the recovery master. That is, nodes that the recovery master
>>>> knows about (i.e. connected) and are active (i.e. not stopped or
>>>> banned) will take part in recovery.
>>> OK, that is a wrinkle I had not thought of. What if they have lost
>>> connection to the GPFS cluster but are still talking to the recovery
>>> master?
>> Then you would hope that they can't take the recovery lock. ;-)
>> If a node in a break-away cluster (i.e. lost CTDB connection with
>> main cluster - perhaps just 1 node) wins an election then it will try to
>> become recovery master. When it tries to take the recovery lock and
>> fails it will ban itself. Rinse and repeat for other nodes in the
>> break-away cluster.
>> So, provided nodes in a break-away cluster can't take the recovery lock
>> then they will all get banned and can do no harm.
>> If such nodes can still take the recovery lock after being expelled
>> from the GPFS cluster then you should probably have the appropriate GPFS
>> callback shut down CTDB. Depending on the CTDB configuration, this will
>> probably take down Samba and other services, preventing any issues.
>> peace & happiness,
@Chan: Please see the thread: 'Re: posix locking on OCFS2'
In it, we've been asked for more information to help solve the lock problem :)
You should be able to supply:
- precise versions of software used (file system, ctdb, ...)
- exact description of what fails
- configuration (ctdb, file system, ...)
- logs (ctdb, syslog/file system ...)