NLM and CTDB recovery master node failure

ronnie sahlberg ronniesahlberg at gmail.com
Thu Oct 29 16:46:36 MDT 2009


On Fri, Oct 30, 2009 at 6:20 AM, Sergey Kleyman
<Sergey.Kleyman at exanet.com> wrote:
>> -----Original Message-----
>> From: Volker Lendecke [mailto:Volker.Lendecke at SerNet.DE]
>> Sent: Thursday, October 29, 2009 17:48
>> To: Sergey Kleyman
>> Cc: samba-technical at lists.samba.org
>> Subject: Re: NLM and CTDB recovery master node failure
>>
>> On Thu, Oct 29, 2009 at 04:34:14PM +0100, Volker Lendecke wrote:
>> > Please use a different cluster file system that does not exhibit this
>> > behaviour or run without the central reclockfile.
>>
>> Ok, I've got a question: Can we achieve the same result we use the
>> fcntl lock on the reclockfile for with another API on your system?
>>
>> We need to very quickly determine correct cluster membership of all
>> ctdb nodes: If nobody can get the reclock lock, then we're broken. If
>> more than one can get it, we've got a split brain. How can we get that
>> info reliably out of your cluster fs without using the fcntl lock?
>>
>> Volker
>
> We have our own internal APIs, implemented on top of the Spread Toolkit
> (http://www.spread.org/), but our goal is to make as few changes to
> Samba as possible, so changing the election code to use our API is not
> the optimal solution. I guess it'll be easier to adhere to Samba's
> assumptions about NLM and provide automatic lock clean-up in case of
> node failure. Are you sure that GPFS and/or GFS have this capability?

Yes. The cluster filesystem needs to recover locks and open files very
promptly anyway: if an I/O is blocked for 40 seconds or more, you are
very likely to cause the redirector to time out, with data corruption
as a result.


>
> As a side note: if I understand you correctly CTDB is assumed to be
> running on the same machines as underlying file system. I was under the
> impression that it's possible to run file system on machines A and B,
> while Samba+CTDB will run on different machines C and D that will see
> clustered file system through NFS mounts in which case C and D are just
> NLM clients to the file system.

Do not re-export NFS. Bad things happen, which is why knfsd, for
example, refuses to re-export NFS shares.
Also, do not use NFS for locking or to store the reclock file.
NFS file locking in v2/v3 is very unreliable and will break things.


Instead, if you do need split-brain protection but cannot use
open()/fcntl() on a reclock file because of your cluster filesystem's
semantics, you have two options. You can run without a reclockfile,
but that opens up the possibility of split brain, so it is probably
sub-optimal.

Or you can replace the recovery lock with a different mechanism that
uses some other type of shared resource as the arbitrator. That should
be reasonably easy.
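
For reference, what the reclock file gives us is roughly the following:
a non-blocking exclusive fcntl lock that at most one process in the
cluster should be able to hold at a time. This is only a rough sketch
in Python, not the CTDB code (which is C); the function name
try_take_reclock and the way the path is passed in are made up for
illustration.

```python
import fcntl
import os


def try_take_reclock(path):
    """Try to take an exclusive, non-blocking fcntl lock on `path`.

    Returns the open file descriptor on success (keep it open for as
    long as the lock should be held), or None if another process
    already holds the lock.

    Hypothetical helper for illustration; not taken from CTDB.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes this fail immediately instead of blocking.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Someone else holds the lock.
        os.close(fd)
        return None
    return fd
```

Note that POSIX fcntl locks are owned by the process, so a second
attempt from the same process does not conflict with the first one.
The check is only meaningful between different processes or nodes, and
only if the filesystem keeps the lock coherent across the cluster.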

Most of what you need is to replace ctdb_recovery_lock() with an
alternative function that uses something else.
Perhaps have a dedicated shared SCSI device and use persistent
reservations? That would be useful.
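
To give an idea of the shape of such a replacement, here is a rough
sketch in Python of an arbitration interface that hides the shared
resource from the caller. All names here (RecoveryArbitrator,
InMemoryArbitrator, may_act_as_recovery_master) are invented for this
example and do not exist in CTDB. The in-memory class is only a
single-process stand-in; a real backend would talk to whatever shared
resource you pick.

```python
from abc import ABC, abstractmethod


class RecoveryArbitrator(ABC):
    """Hypothetical interface; not part of CTDB."""

    @abstractmethod
    def try_acquire(self) -> bool:
        """Return True only if this node now holds the shared resource."""

    @abstractmethod
    def release(self) -> None:
        """Give the shared resource back."""


class InMemoryArbitrator(RecoveryArbitrator):
    """Stand-in for experiments: one token per process, no real cluster."""

    _owner = None  # class-level, shared by all instances in this process

    def __init__(self, node_id):
        self.node_id = node_id

    def try_acquire(self):
        cls = InMemoryArbitrator
        if cls._owner in (None, self.node_id):
            cls._owner = self.node_id
            return True
        return False

    def release(self):
        cls = InMemoryArbitrator
        if cls._owner == self.node_id:
            cls._owner = None


def may_act_as_recovery_master(arbitrator):
    """Only the node that wins arbitration may go on to run recovery."""
    return arbitrator.try_acquire()
```

The point of the split is that the election code only ever asks
"did I win?" and does not care whether the answer comes from a file
lock, a SCSI reservation or something else.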


(Just don't use NFS. NFS file locking is broken by design, so this
will cause more problems than it is worth.)


>
> One more point I wanted to ask about: if an smbd daemon dies for some
> reason (abnormal exit - panic, etc.), what happens to the CIFS locks it
> was holding? Are those locks automatically cleaned up?
>
> Thanks, Sergey
>

