NLM and CTDB recovery master node failure

Thu Oct 29 09:34:14 MDT 2009

On Thu, Oct 29, 2009 at 04:11:01PM +0200, Sergey Kleyman wrote:
> Thanks for the reply, but allow me to disagree about "shared fcntl locks
> behave like local fcntl locks".
> 
> According to this
> http://www.opengroup.org/onlinepubs/009629799/chap9.htm#tagcjh_10
> "Client Failure and Restart"
> 
> "... the client NSM issues an SM_NOTIFY RPC to the NSM on the named
> host. In this example it will issue an SM_NOTIFY to the server NSM,
> including the client name and the new client state... The callback
> procedure in the server NLM notes that the client state has changed and
> releases all locks held on behalf of the client."
> 
> So the NLM server releases locks only when notified by the client (in
> our case the NLM client in the Linux kernel), but obviously this happens
> only when the node that was holding the lock comes back up. The problem
> is that the NLM server cannot distinguish between a failed client and a
> client that holds a lock for a very long time. There is no proactive
> heartbeat like the one CTDB has. The document even says so explicitly
> (section "NSM Protocol").

Ok, this is your implementation choice. The behaviour we
expect is different. We view the cluster not as a group of
NFS clients whose servers have to adhere to that standard
behaviour. In fact, in Samba we definitely do not support
re-exporting NFS imports, problems with locking being the
main reason for this.

Please use a different cluster file system that does not
exhibit this behaviour or run without the central
reclockfile.

Volker
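
For readers following the thread, the local fcntl behaviour that is
expected of the cluster file system can be illustrated with a small
sketch. This is not CTDB code: the temporary file merely stands in for
the recovery lock file, and the names try_lock and demo are made up for
illustration. A child process takes an exclusive fcntl lock and is then
killed without unlocking. On a local POSIX file system the kernel drops
the lock when the holder dies, so a second process can take it right
away. In the NLM setup described in the quoted text, a lock held by a
failed node would instead stay in place until that node restarts and
sends SM_NOTIFY.

```python
import fcntl
import os
import signal
import tempfile


def try_lock(fd):
    """Try a non-blocking exclusive fcntl lock; return True on success."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        # EACCES or EAGAIN: another process holds a conflicting lock.
        return False


def demo():
    """Return (locked_while_holder_alive, locked_after_holder_died)."""
    # Hypothetical stand-in for the shared lock file (assumption).
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)

    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: acts as the lock holder that later "fails".
        os.close(ready_r)
        fd = os.open(path, os.O_RDWR)
        fcntl.lockf(fd, fcntl.LOCK_EX)
        os.write(ready_w, b"x")   # tell the parent the lock is held
        while True:
            signal.pause()        # hold the lock until killed

    # Parent: acts as the node that wants to take over the lock.
    os.close(ready_w)
    os.read(ready_r, 1)           # wait until the child holds the lock
    os.close(ready_r)

    fd = os.open(path, os.O_RDWR)
    while_alive = try_lock(fd)    # expected to fail: holder is alive

    os.kill(pid, signal.SIGKILL)  # simulate abrupt failure, no unlock
    os.waitpid(pid, 0)
    after_death = try_lock(fd)    # local semantics: lock was released

    os.close(fd)
    os.unlink(path)
    return while_alive, after_death


if __name__ == "__main__":
    print(demo())
```

The sketch only shows the single-host case. Whether a given cluster
file system gives the same result when the holder is on another node
that has failed is exactly the point under discussion in this thread.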