NLM and CTDB recovery master node failure

Volker Lendecke Volker.Lendecke at SerNet.DE
Thu Oct 29 03:21:27 MDT 2009

On Thu, Oct 29, 2009 at 10:41:01AM +0200, Sergey Kleyman wrote:
> I'm trying to implement clustered Samba on my cluster file system by
> using Samba+CTDB (version 3.4.2). I noticed on the CTDB wiki page
> the following sentence:
> "To become a recovery master, a node must be able to acquire an
> exclusive lock on that file."
> So I am wondering how CTDB deals with recovery master failure. What
> happens if the node that the CTDB recovery master is running on has a
> hardware failure and doesn't come up for a very long time (or even
> never)? The NLM server of the underlying clustered file system will
> hold the lock until the client comes back up, which might never
> happen, so the remaining nodes will not be able to select a new leader
> because none of them will be able to acquire an exclusive lock. Am I
> missing something?

So you're saying that a node takes a lock, the node dies and
until that node comes back up, nobody will be able to take
that lock? Our assumption so far is that shared fcntl locks
behave like local fcntl locks: If a process that holds a
lock dies, then the lock is released. It should not matter
for what reason that process dies. A node being killed is a
particularly nasty death for a process, but the lock must
nevertheless be released.
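
To make the local case concrete, here is a minimal sketch in Python of
the behaviour assumed above. It is only an illustration, not CTDB code:
the names try_lock and demo are made up for this example, and it runs
against a temporary file on a local file system. Whether a cluster file
system gives the same result when a whole node dies is exactly the open
question in this thread.

```python
import fcntl
import os
import signal
import tempfile


def try_lock(fd):
    """Try to take an exclusive fcntl lock on fd without blocking."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        # EAGAIN/EACCES: another process holds a conflicting lock.
        return False


def demo():
    """Return (got_lock_while_holder_alive, got_lock_after_holder_killed)."""
    fd, path = tempfile.mkstemp()
    ready_r, ready_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: stands in for the process holding the lock.
        try:
            os.close(ready_r)
            child_fd = os.open(path, os.O_RDWR)
            fcntl.lockf(child_fd, fcntl.LOCK_EX)
            os.write(ready_w, b"x")   # tell the parent the lock is held
            signal.pause()            # hold the lock until killed
        finally:
            os._exit(1)

    # Parent: stands in for another process that wants the same lock.
    os.close(ready_w)
    os.read(ready_r, 1)               # wait until the child holds the lock
    os.close(ready_r)

    while_alive = try_lock(fd)        # expected to fail: child holds the lock

    os.kill(pid, signal.SIGKILL)      # the "nasty death"
    os.waitpid(pid, 0)

    after_death = try_lock(fd)        # expected to succeed: lock was released

    os.close(fd)
    os.unlink(path)
    return while_alive, after_death


if __name__ == "__main__":
    print(demo())
```

On a local POSIX file system demo() should return (False, True): the
second process cannot take the lock while the holder is alive, and can
take it as soon as the holder has been killed.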

You *can* run ctdb without that shared lock. But the shared
lock was there for a reason: We need to make sure that we
have the same view of cluster membership as the cluster fs
below has.

You should look at

ctdb setvar VerifyRecoveryLock 0

to work without a recovery lock. But be aware that this is
NOT recommended.

More information about the samba-technical mailing list