[ceph-users] CTDB Cluster Samba on Cephfs

Thu Mar 28 19:22:01 MDT 2013

On Thu, Mar 28, 2013 at 6:09 PM, Sage Weil <sage at inktank.com> wrote:
> On Thu, 28 Mar 2013, ronnie sahlberg wrote:
>> Disable the recovery lock file from ctdb completely.
>> And disable fcntl locking from samba.
>>
>> To be blunt, unless your cluster filesystem is called GPFS,
>> locking is probably completely broken and should be avoided.
>
> Ha!
>
>> On Thu, Mar 28, 2013 at 8:46 AM, Marco Aroldi <marco.aroldi at gmail.com> wrote:
>> > Thanks for the answer,
>> >
>> > I haven't yet looked at the samba.git clone, sorry. I will.
>> >
>> > Just a quick report on my test environment:
>> > * cephfs mounted with kernel driver re-exported from 2 samba nodes
>> > * If "node B" goes down, everything works like a charm: "node A" does
>> > the IP takeover and brings up "node B"'s IP
>> > * Instead, if "node A" goes down, "node B" can't take the rlock file
>> > and gives this error:
>> >
>> > ctdb_recovery_lock: Failed to get recovery lock on
>> > '/mnt/ceph/samba-cluster/rlock'
>> > Unable to get recovery lock - aborting recovery and ban ourself for 300 seconds
>> >
>> > * So, for 5 minutes, neither "node A" nor "node B" is active. After
>> > that, the cluster recovers correctly.
>> > It seems that one of the 2 nodes "owns" the rlock file and doesn't
>> > want to "release" it
>
> Cephfs aims to give you coherent access between nodes.  The cost of that
> is that if another client goes down and it holds some lease/lock, you have
> to wait for it to time out.  That is supposed to happen after 60 seconds;
> it sounds like you've hit a bug here.  The flock/fcntl locks aren't
> super-well tested in the failure scenarios.
>
> Even assuming it were working, though, I'm not sure that you want to wait
> the 60 seconds either for the CTDBs to take over for each other.

You do not want to wait 60 seconds. That is approaching territory where
CIFS clients will start causing file corruption and data loss because
they drop their writeback caches.

You probably want to guarantee that fcntl() locking starts working
again after ~20 seconds or so, to leave some headroom.
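
One rough way to check that ~20 second target is to poll for an fcntl()
lock from the surviving node while the other node is taken down, and see
how long it takes before the lock can be acquired. The Python sketch
below is only an illustration: the function name, the polling interval
and the scratch file path are assumptions of mine, not part of ctdb or
samba.

```python
import errno
import fcntl
import os
import time


def time_to_acquire_lock(path, deadline_s=20.0, poll_s=0.5):
    """Poll for an exclusive fcntl() lock on `path`.

    Returns the number of seconds waited before the lock was obtained,
    or None if it could not be obtained within `deadline_s` seconds.
    All names and defaults here are hypothetical, for illustration only.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    start = time.monotonic()
    try:
        while True:
            try:
                # Non-blocking attempt; raises OSError if another
                # process (possibly on another node) holds the lock.
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as exc:
                if exc.errno not in (errno.EACCES, errno.EAGAIN):
                    raise  # a real error, not just "lock is held"
                if time.monotonic() - start >= deadline_s:
                    return None
                time.sleep(poll_s)
                continue
            waited = time.monotonic() - start
            fcntl.lockf(fd, fcntl.LOCK_UN)
            return waited
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical scratch file; do not point this at the real rlock file.
    result = time_to_acquire_lock("/tmp/fcntl-probe.lock")
    print("deadline missed" if result is None else f"locked after {result:.1f}s")
```

For a real test you would point it at a scratch file on the cluster
filesystem, have the other node take the lock on that file, and then
kill that node. A return value of None means the deadline was missed.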

Microsoft themselves state 25 seconds as the absolute deadline that
they require you to guarantee before they will qualify storage.
That is, among other things, to accommodate and leave some headroom for
some really nasty data loss issues that will happen if storage cannot
recover quickly enough.

CIFS is hard realtime. And you will pay dearly for missing the deadline.

regards
ronnie sahlberg