NLM and CTDB recovery master node failure

Thu Oct 29 08:11:01 MDT 2009

> -----Original Message-----
> From: Volker Lendecke [mailto:Volker.Lendecke at SerNet.DE]
> Sent: Thursday, October 29, 2009 11:21
> To: Sergey Kleyman
> Cc: samba-technical at lists.samba.org
> Subject: Re: NLM and CTDB recovery master node failure
> 
> On Thu, Oct 29, 2009 at 10:41:01AM +0200, Sergey Kleyman wrote:
> > I'm trying to implement clustered Samba on my cluster file system by
> > using Samba+CTDB (version 3.4.2). I noticed on CTDB wiki page
> > (http://wiki.samba.org/index.php/CTDB_Project) the following
> > sentence:
> >
> > "To become a recovery master, a node must be able to acquire an
> > exclusive lock on that file."
> >
> > So I am wondering how CTDB deals with recovery master failure. What
> > happens if the node, CTDB recovery master is running on, has hardware
> > failure and doesn't come up for a very long time (or even never)? NLM
> > server of the underlying clustered file system will hold the lock
> > until the client comes back up which might never happen so remaining
> > nodes will not be able to select a new leader because none of them
> > will be able to acquire an exclusive lock. Am I missing something?
> 
> So you're saying that a node takes a lock, the node dies and until that
> node comes back up, nobody will be able to take that lock? Our
> assumption so far is that shared fcntl locks behave like local fcntl
> locks: If a process that holds a lock dies, then the lock is released.
> It should not matter for what reason that process dies. A node being
> killed is a particularly nasty death for a process, but the lock must
> nevertheless be released.
> 
> You *can* run ctdb without that shared lock. But the shared lock was
> there for a reason: We need to make sure that we have the same view of
> cluster membership as the cluster fs below has.
> 
> You should look at
> 
> ctdb setvar VerifyRecoveryLock 0
> 
> to work without a recovery lock. But be aware that this is NOT
> recommended.
> 
> Volker

Thanks for the reply, but allow me to disagree with the assumption that
"shared fcntl locks behave like local fcntl locks".

According to the section "Client Failure and Restart" of
http://www.opengroup.org/onlinepubs/009629799/chap9.htm#tagcjh_10:

"... the client NSM issues an SM_NOTIFY RPC to the NSM on the named
host. In this example it will issue an SM_NOTIFY to the server NSM,
including the client name and the new client state... The callback
procedure in the server NLM notes that the client state has changed and
releases all locks held on behalf of the client."

So the NLM server releases locks only when notified by the client (in our
case, the NLM client in the Linux kernel), and that obviously happens only
when the node that was holding the lock comes back up. The problem is that
the NLM server cannot distinguish between a failed client and a client
that holds a lock for a very long time. There is no proactive heartbeat
like the one CTDB has. The document even says so explicitly (section
"NSM Protocol"):

"... The NSM does not actively "probe" hosts it has been asked to
monitor; instead it waits for the monitored host to notify it that the
monitored host's status has changed (that is, crashed and rebooted)."
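
To make the point concrete, here is a toy model of the behaviour described
in the two quotes above. It is only a sketch of my reading of the spec, not
real NLM/NSM code: the class ToyLockServer, its methods and the lock name
"reclock" are all made up for illustration, and the state numbers are
arbitrary.

```python
class ToyLockServer:
    """Toy stand-in for an NLM server with NSM-style notifications.

    Hypothetical model for illustration only; not a real NLM API.
    """

    def __init__(self):
        self.locks = {}         # file name -> client holding the lock
        self.client_state = {}  # client -> last state number seen

    def lock(self, client, state, name):
        """Grant an exclusive lock if the file is free or already ours."""
        holder = self.locks.setdefault(name, client)
        if holder == client:
            self.client_state[client] = state
            return True
        return False

    def sm_notify(self, client, new_state):
        """Client reports a new state: drop every lock held on its behalf."""
        if self.client_state.get(client) != new_state:
            self.locks = {n: c for n, c in self.locks.items() if c != client}
            self.client_state[client] = new_state


def demo():
    server = ToyLockServer()
    results = []
    results.append(server.lock("node1", 1, "reclock"))  # node1 takes the lock
    # node1 dies here. The server is never told, so nothing changes.
    results.append(server.lock("node2", 1, "reclock"))  # refused
    # Only a reboot of node1 makes its NSM send SM_NOTIFY with a new state.
    server.sm_notify("node1", 3)
    results.append(server.lock("node2", 1, "reclock"))  # granted
    return results


if __name__ == "__main__":
    print(demo())
```

In this model node2 is refused for as long as node1 stays down, because the
server has no way to tell a dead client from one that is simply holding
the lock for a long time.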

This is not the case for the local kernel, which can easily distinguish
between a process that died (and so should have all its locks
automatically released) and a process that is still running and holding a
lock. Please correct me if I'm wrong.
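
For comparison, the local case can be checked with a small script. This is
a minimal sketch that assumes a POSIX system and a local (not NFS-mounted)
temporary directory; the helper names try_lock and demo are mine. A child
process takes an exclusive fcntl lock and is then killed with SIGKILL, and
the parent tries to take the same lock before and after.

```python
import fcntl
import os
import signal
import tempfile


def try_lock(fd):
    """Try to take an exclusive, non-blocking fcntl lock; report success."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False


def demo():
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: take the lock, tell the parent, then wait to be killed.
        os.close(ready_r)
        fd = os.open(path, os.O_RDWR)
        fcntl.lockf(fd, fcntl.LOCK_EX)
        os.write(ready_w, b"x")
        signal.pause()
        os._exit(0)
    # Parent: wait until the child really holds the lock.
    os.close(ready_w)
    os.read(ready_r, 1)
    os.close(ready_r)
    fd = os.open(path, os.O_RDWR)
    got_while_alive = try_lock(fd)   # child alive and holding the lock
    os.kill(pid, signal.SIGKILL)     # abrupt death of the lock holder
    os.waitpid(pid, 0)
    got_after_death = try_lock(fd)   # kernel has dropped the child's lock
    os.close(fd)
    os.unlink(path)
    return got_while_alive, got_after_death


if __name__ == "__main__":
    print(demo())
```

On a local file system I would expect the first attempt to fail and the
second to succeed, since the kernel releases the lock when the holder dies.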

As for your advice about running CTDB without a recovery lock: I would
obviously prefer to use the recommended configuration, so I wonder what
functionality will suffer from this choice?

Thanks,
Sergey