RAFT and CTDB

Chan Min Wai dcmwai at gmail.com
Thu Nov 20 19:08:53 MST 2014


Dear Martin,

Since we have touched on the lock:
I have some experience with it, specifically with where the lock is defined.

I pointed the lock at the shared ocfs2 cluster.

CTDB would not start and kept asking for the lock.

I'm not sure why that is.

I followed this guide:
http://linuxcostablanca.blogspot.com/2014/07/samba4-cluster-for-ad-drbd-ocfs2-ctdb.html?m=1

The difference is that my ocfs2 is on storage shared between the 2 nodes, so there is no DRBD.

Does the lock really work in this scenario?

Thank you. 

PS: sorry to cut in like this.

Regards,
Min Wai, Chan



> On 21 Nov 2014, at 8:04 AM, Martin Schwenke <martin at meltin.net> wrote:
> 
> On Thu, 20 Nov 2014 15:55:39 -0800, Richard Sharpe
> <realrichardsharpe at gmail.com> wrote:
> 
>>> On Thu, Nov 20, 2014 at 3:41 PM, Martin Schwenke <martin at meltin.net> wrote:
>>> On Thu, 20 Nov 2014 15:24:39 -0800, Richard Sharpe
>>> <realrichardsharpe at gmail.com> wrote:
>>> 
>>>> Hmmm, so the essential abstraction here is that any node that is no
>>>> longer a member of the cluster (because it can't get a lock on that
>>>> file) cannot try to run recovery. Ie, in ctdb_recovery_lock we try to
>>>> open the recovery lock file and then take out a lock on it.
>>>> 
>>>> The first should/will fail if we are no longer a member of the cluster
>>>> and the second will fail if the cluster properly supports fcntl locks
>>>> but another recovery daemon has already locked the file ...
>>> 
>>> No, only the recovery master can hold the recovery lock.  Other nodes
>>> would not be able to take the lock but they are still cluster members.
>> 
>> Isn't that what I said? When I said cluster above I was referring to a
>> GPFS cluster.
> 
> CTDB has its own independent notion of cluster membership and I thought
> you were referring to that.  I didn't notice you mentioning GPFS.  :-)
> 
>>> Cluster membership is defined by being connected to the node that is
>>> currently the recovery master.  That is, nodes that the recovery master
>>> knows about (i.e. connected) and are active (i.e. not stopped or
>>> banned) will take part in recovery.
>> 
>> OK, that is a wrinkle I had not thought of. What if they have lost
>> connection to the GPFS cluster but are still talking to the recovery
>> master?
> 
> Then you would hope that they can't take the recovery lock.  ;-)
> 
> If a node in a break-away cluster (i.e. lost CTDB connection with
> main cluster - perhaps just 1 node) wins an election then it will try to
> become recovery master.  When it tries to take the recovery lock and
> fails it will ban itself.  Rinse and repeat for other nodes in the
> break-away cluster.
> 
> So, provided nodes in a break-away cluster can't take the recovery lock
> then they will all get banned and can do no harm.
> 
> If such nodes can still take the recovery lock after being expelled
> from the GPFS cluster then you should probably have the appropriate GPFS
> callback shut down CTDB.  Depending on the CTDB configuration, this will
> probably take down Samba and other services, preventing any issues.
> 
> peace & happiness,
> martin
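
The recovery-lock step described in the quoted text (open the lock file,
then take an fcntl lock on it) can be checked by hand on a shared
filesystem such as ocfs2. Below is a minimal Python sketch of that check.
It is not CTDB's actual code: the function name try_take_lock and any path
passed to it are made up for illustration, and it assumes a POSIX system
where Python's fcntl module is available.

```python
import errno
import fcntl
import os


def try_take_lock(path):
    """Open `path` and try a non-blocking exclusive fcntl lock on it.

    Returns the open file descriptor on success; the lock is held until
    that descriptor is closed or the process exits. Returns None if
    another process already holds a conflicting lock.
    (Hypothetical helper, for illustration only.)
    """
    # Step 1: open the lock file. On a node that can no longer reach the
    # shared filesystem, this is where failure would be expected.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Step 2: take an exclusive lock without blocking. lockf() is
        # implemented with fcntl() record locks.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            # Someone else holds the lock.
            return None
        raise
    return fd
```

One way to use it: call try_take_lock() on a file on the shared ocfs2
mount from node A and leave that process running, then call it on the
same file from node B. If B gets None, fcntl locks appear to be coherent
across the two nodes. If both nodes get a file descriptor, the filesystem
is not providing cluster-wide locks, and the lock cannot stop two nodes
from both believing they are recovery master.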


More information about the samba-technical mailing list