[ceph-users] CTDB Cluster Samba on Cephfs

ronnie sahlberg ronniesahlberg at gmail.com
Thu Mar 28 19:27:30 MDT 2013


The ctdb package comes with a tool, "ping_pong", that is used to test
and exercise fcntl() locking.

I think a good test is to run this tool while randomly power-cycling
nodes in your fs cluster, making sure that
1, fcntl() locking remains coherent and correct
2, locking always recovers within 20 seconds after a single-node power cycle


That is probably a good test for CIFS serving.
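
As a rough illustration of what ping_pong exercises, here is a minimal
sketch of the same idea (a sketch only, not the actual ping_pong
source): each node loops taking and dropping an exclusive fcntl() lock
on one byte of a shared file.  If two nodes are ever inside the
critical section at the same time, lock coherence on the cluster
filesystem is broken.

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      struct flock fl;
      int fd;

      if (argc != 2) {
          fprintf(stderr, "usage: %s <file on cluster fs>\n", argv[0]);
          return 1;
      }
      fd = open(argv[1], O_RDWR | O_CREAT, 0644);
      if (fd < 0) {
          perror("open");
          return 1;
      }
      for (;;) {
          fl.l_type   = F_WRLCK;    /* exclusive byte-range lock */
          fl.l_whence = SEEK_SET;
          fl.l_start  = 0;          /* lock byte 0 only */
          fl.l_len    = 1;
          if (fcntl(fd, F_SETLKW, &fl) < 0) {  /* block until granted */
              perror("fcntl(F_SETLKW)");
              return 1;
          }
          /* critical section: only one node at a time should get here */
          fl.l_type = F_UNLCK;
          if (fcntl(fd, F_SETLK, &fl) < 0) {
              perror("fcntl(F_UNLCK)");
              return 1;
          }
      }
  }

The real tool is run against the same file from several nodes at once
and also measures lock rates, which makes regressions easy to spot.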


On Thu, Mar 28, 2013 at 6:22 PM, ronnie sahlberg
<ronniesahlberg at gmail.com> wrote:
> On Thu, Mar 28, 2013 at 6:09 PM, Sage Weil <sage at inktank.com> wrote:
>> On Thu, 28 Mar 2013, ronnie sahlberg wrote:
>>> Disable the recovery lock file in ctdb completely,
>>> and disable fcntl locking in samba.
>>>
>>> To be blunt, unless your cluster filesystem is called GPFS,
>>> locking is probably completely broken and should be avoided.
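>>>
>>> A sketch of what that looks like with the options of this era (names
>>> may differ across ctdb/samba versions, so treat this as illustrative):
>>>
>>>   # /etc/sysconfig/ctdb: leave the recovery lock unset
>>>   # CTDB_RECOVERY_LOCK="/some/cluster/fs/rlock"
>>>
>>>   # smb.conf: do not map Windows byte-range locks to fcntl() locks
>>>   [global]
>>>       posix locking = no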
>>
>> Ha!
>>
>>> On Thu, Mar 28, 2013 at 8:46 AM, Marco Aroldi <marco.aroldi at gmail.com> wrote:
>>> > Thanks for the answer,
>>> >
>>> > I haven't yet looked at the samba.git clone, sorry. I will.
>>> >
>>> > Just a quick report on my test environment:
>>> > * cephfs mounted with the kernel driver, re-exported from 2 samba nodes
>>> > * If "node B" goes down, everything works like a charm: "node A" does
>>> > the IP takeover and brings up "node B"'s IP
>>> > * Instead, if "node A" goes down, "node B" can't take over the rlock
>>> > file and gives this error:
>>> >
>>> > ctdb_recovery_lock: Failed to get recovery lock on
>>> > '/mnt/ceph/samba-cluster/rlock'
>>> > Unable to get recovery lock - aborting recovery and ban ourself for 300 seconds
>>> >
>>> > * So, for 5 minutes, neither "node A" nor "node B" is active. After
>>> > that, the cluster recovers correctly.
>>> > It seems that one of the 2 nodes "owns" the rlock file and doesn't
>>> > want to "release" it
>>
>> Cephfs aims to give you coherent access between nodes.  The cost of that
>> is that if another client goes down while it holds some lease/lock, you
>> have to wait for it to time out.  That is supposed to happen after 60
>> seconds; it sounds like you've hit a bug here.  The flock/fcntl locks
>> aren't super-well tested in the failure scenarios.
>>
>> Even assuming it were working, though, I'm not sure that you want to wait
>> the 60 seconds for the CTDB nodes to take over for each other.
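>>
>> That 60 seconds matches the default mds session timeout.  A sketch of
>> shortening it in ceph.conf (illustrative, not a verified fix):
>>
>>   [mds]
>>       mds session timeout = 20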
>
> You do not want to wait 60 seconds. That is approaching territory where
> CIFS clients will start causing file corruption and data loss because
> they drop their writeback caches.
>
> You probably want to aim to guarantee that fcntl() locking starts
> working again within ~20 seconds, to leave some headroom.
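>
> (The 300-second ban in the log above is ctdb's RecoveryBanPeriod
> tunable; it can be lowered, e.g.
>
>   ctdb setvar RecoveryBanPeriod 60
>
> with an illustrative value, though the real fix is making lock
> recovery itself fast enough.)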
>
>
> Microsoft themselves state 25 seconds as the absolute deadline they
> require you to guarantee before they will qualify storage.
> That is, among other things, to accommodate and leave some headroom
> for some really nasty data loss issues that will
> happen if storage cannot recover quickly enough.
>
>
> CIFS is hard real-time, and you will pay dearly for missing the deadline.
>
>
> regards
> ronnie sahlberg

