RAFT and CTDB

steve steve at steve-ss.com
Sun Nov 23 12:50:07 MST 2014


On 21/11/14 03:08, Chan Min Wai wrote:
> Dear Martin,
> 
> Since we have touched on the lock:
> I have some experience with it in a setup where the lock is defined.
> 
> I pointed the lock at the shared OCFS2 cluster.
> 
> CTDB will not start and keeps on asking for the lock.
> 
> I'm not sure why.
> 
> I followed this guide:
> http://linuxcostablanca.blogspot.com/2014/07/samba4-cluster-for-ad-drbd-ocfs2-ctdb.html?m=1
> 
> The difference is that my OCFS2 is shared storage between the 2 nodes, so there is no DRBD.
> 
> Does the lock really work in this scenario?
> 
> Thank you.
> 
> PS: Sorry to cut in like this.
> 
> Regards,
> Min Wai, Chan
> 
> 
> 
>> On 21 Nov 2014, at 8:04 AM, Martin Schwenke <martin at meltin.net> wrote:
>>
>> On Thu, 20 Nov 2014 15:55:39 -0800, Richard Sharpe
>> <realrichardsharpe at gmail.com> wrote:
>>
>>>> On Thu, Nov 20, 2014 at 3:41 PM, Martin Schwenke <martin at meltin.net> wrote:
>>>> On Thu, 20 Nov 2014 15:24:39 -0800, Richard Sharpe
>>>> <realrichardsharpe at gmail.com> wrote:
>>>>
>>>>> Hmmm, so the essential abstraction here is that any node that is no
>>>>> longer a member of the cluster (because it can't get a lock on that
>>>>> file) cannot try to run recovery. I.e., in ctdb_recovery_lock we try to
>>>>> open the recovery lock file and then take out a lock on it.
>>>>>
>>>>> The first should/will fail if we are no longer a member of the cluster
>>>>> and the second will fail if the cluster properly supports fcntl locks
>>>>> but another recovery daemon has already locked the file ...
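A minimal sketch of the two steps described above (open the recovery lock file, then try to lock it), written in Python rather than CTDB's C. It assumes a POSIX system and a file system that honours fcntl() locks across all nodes; the helper name try_take_recovery_lock and the lock path are hypothetical, and this is not CTDB's actual ctdb_recovery_lock code. Because fcntl() locks are per process, the demo forks a second process to show contention.

```python
import errno
import fcntl
import os
import tempfile


def try_take_recovery_lock(path):
    """Hypothetical helper: try to take an exclusive fcntl() lock on path.

    Returns an open file descriptor that holds the lock, or None if another
    process already holds it. Other errors (e.g. the file system is not
    reachable from this node) propagate as OSError.
    """
    # Step 1: open (creating if needed) the recovery lock file.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Step 2: non-blocking exclusive lock. fcntl.lockf() uses fcntl()
        # record locks; LOCK_NB makes it fail instead of waiting.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None  # someone else holds the lock
        raise
    return fd  # keep this fd open for as long as the lock should be held


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "reclock")  # hypothetical path
    holder = try_take_recovery_lock(lock_path)
    print("first process got the lock:", holder is not None)

    # fcntl() locks are per process and are not inherited across fork(),
    # so the child contends for the lock like another daemon would.
    pid = os.fork()
    if pid == 0:
        refused = try_take_recovery_lock(lock_path) is None
        os._exit(0 if refused else 1)
    _, status = os.waitpid(pid, 0)
    print("second process was refused:", os.WEXITSTATUS(status) == 0)
```

On a cluster file system, the same non-blocking call made from a second node should also be refused while the first node holds the lock; if it is not, fcntl() locking on that file system is not coherent across nodes and the recovery lock cannot protect anything.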
>>>>
>>>> No, only the recovery master can hold the recovery lock.  Other nodes
>>>> would not be able to take the lock but they are still cluster members.
>>>
>>> Isn't that what I said? When I said cluster above I was referring to a
>>> GPFS cluster.
>>
>> CTDB has its own independent notion of cluster membership and I thought
>> you were referring to that.  I didn't notice you mentioning GPFS.  :-)
>>
>>>> Cluster membership is defined by being connected to the node that is
>>>> currently the recovery master.  That is, nodes that the recovery master
>>>> knows about (i.e. connected) and are active (i.e. not stopped or
>>>> banned) will take part in recovery.
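A simplified model of the membership rule stated above. The names (NodeState, cluster_members, recovery_participants) are hypothetical and do not correspond to CTDB's internal structures: members are the nodes connected to the current recovery master, and recovery participants are the members that are neither stopped nor banned.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical, simplified view of a node as seen by the recovery master."""
    pnn: int                # node number
    connected: bool         # connected to the current recovery master?
    stopped: bool = False
    banned: bool = False


def cluster_members(nodes):
    """Nodes connected to the current recovery master count as members."""
    return [n.pnn for n in nodes if n.connected]


def recovery_participants(nodes):
    """Members that are active (neither stopped nor banned) take part in recovery."""
    return [n.pnn for n in nodes if n.connected and not (n.stopped or n.banned)]


if __name__ == "__main__":
    nodes = [
        NodeState(0, connected=True),
        NodeState(1, connected=True, stopped=True),
        NodeState(2, connected=False),
        NodeState(3, connected=True, banned=True),
    ]
    print(cluster_members(nodes))        # [0, 1, 3]
    print(recovery_participants(nodes))  # [0]
```

Nodes 1 and 3 remain cluster members even though they sit out recovery, which matches the point that failing to take the lock does not by itself end membership.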
>>>
>>> OK, that is a wrinkle I had not thought of. What if they have lost
>>> connection to the GPFS cluster but are still talking to the recovery
>>> master?
>>
>> Then you would hope that they can't take the recovery lock.  ;-)
>>
>> If a node in a break-away cluster (i.e. lost CTDB connection with
>> main cluster - perhaps just 1 node) wins an election then it will try to
>> become recovery master.  When it tries to take the recovery lock and
>> fails it will ban itself.  Rinse and repeat for other nodes in the
>> break-away cluster.
>>
>> So, provided nodes in a break-away cluster can't take the recovery lock
>> then they will all get banned and can do no harm.
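The election and self-ban loop for a break-away cluster can be sketched as a toy model. Everything here is an assumption for illustration: resolve_breakaway and can_take_lock are hypothetical, and the election is modelled as "lowest unbanned node number wins", which is not how CTDB actually elects a recovery master.

```python
def resolve_breakaway(partition, can_take_lock):
    """Toy model of a break-away cluster electing a recovery master.

    partition: node numbers that lost their CTDB connection to the main cluster.
    can_take_lock: hypothetical callback, pnn -> bool, saying whether that node
        could take the recovery lock.
    Returns (recovery_master_or_None, banned_nodes).
    """
    banned = []
    candidates = sorted(partition)
    while candidates:
        # Assumption: the lowest unbanned node number wins each election.
        winner = candidates.pop(0)
        if can_take_lock(winner):
            return winner, banned  # it becomes recovery master
        banned.append(winner)      # could not take the lock: it bans itself
    return None, banned            # every node in the partition is banned


if __name__ == "__main__":
    # Main cluster still holds the lock, so no break-away node can take it.
    print(resolve_breakaway([3, 2], lambda pnn: False))  # (None, [2, 3])
    # Shared file system still grants the lock to a break-away node: unsafe.
    print(resolve_breakaway([3, 2], lambda pnn: True))   # (2, [])
```

The second case, where a break-away node does become recovery master, is exactly the situation the GPFS callback suggestion in the next paragraph is meant to rule out.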
>>
>> If such nodes can still take the recovery lock after being expelled
>> from the GPFS cluster then you should probably have the appropriate GPFS
>> callback shut down CTDB.  Depending on the CTDB configuration, this will
>> probably take down Samba and other services, preventing any issues.
>>
>> peace & happiness,
>> martin

@Chan: Please see the thread: 'Re: posix locking on OCFS2'
We are being asked for information to help solve the lock problem :)
You will most likely be able to supply:

- precise versions of software used (file system, ctdb, ...)
- exact description of what fails
- configuration (ctdb, file system, ...)
- logs (ctdb, syslog/file system ...)

Cheers,
Steve



More information about the samba-technical mailing list