[ceph-users] CTDB Cluster Samba on Cephfs

Fri Mar 29 10:31:35 MDT 2013

Still trying with no success:

Sage and Ronnie:
I've tried the ping_pong tool, even with "locking=no" in my smb.conf
(no differences)

# ping_pong /mnt/ceph/samba-cluster/test 3
I have about 180 locks/second
If I start the same command from the other node, the tools stops
completely. 0 locks/second

Sage, when I start the CTDB service, the mds log says every second:
2013-03-29 16:49:34.442437 7f33fe6f3700  0 mds.0.server
handle_client_file_setlock: start: 0, length: 0, client: 5475, pid:
14795, type: 4

2013-03-29 16:49:35.440856 7f33fe6f3700  0 mds.0.server
handle_client_file_setlock: start: 0, length: 0, client: 5475, pid:
14799, type: 4

Exactly as you see it: with a blank line in between
When i start the ping_pong command i have these lines at the same rate
reported by the script (180 lines/second):

2013-03-29 17:07:50.277003 7f33fe6f3700  0 mds.0.server
handle_client_file_setlock: start: 2, length: 1, client: 5481, pid:
11011, type: 2

2013-03-29 17:07:50.281279 7f33fe6f3700  0 mds.0.server
handle_client_file_setlock: start: 1, length: 1, client: 5481, pid:
11011, type: 4

2013-03-29 17:07:50.286643 7f33fe6f3700  0 mds.0.server
handle_client_file_setlock: start: 0, length: 1, client: 5481, pid:
11011, type: 2

Finally, I've tried to lower the ctdb's RecoverBanPeriod but the
clients was unable to recover for 5 minutes (again!)
So, I've found the mds logging this:
2013-03-29 16:55:23.354854 7f33fc4ed700  0 log [INF] : closing stale
session client.5475 192.168.130.11:0/580042840 after 300.159862

I hope to find a solution.
I am at your disposal to further investigation

--
Marco Aroldi

2013/3/29 ronnie sahlberg <ronniesahlberg at gmail.com>:
> The ctdb package comes with a tool "ping pong" that is used to test
> and exercise fcntl() locking.
>
> I think a good test is using this tool and then randomly powercycling
> nodes in your fs cluster
> making sure that
> 1, fcntl() locking is still coherent and correct
> 2, always recover within 20 seconds for a single node power cycle
>
>
> That is probably a good test for CIFS serving.
>
>
> On Thu, Mar 28, 2013 at 6:22 PM, ronnie sahlberg
> <ronniesahlberg at gmail.com> wrote:
>> On Thu, Mar 28, 2013 at 6:09 PM, Sage Weil <sage at inktank.com> wrote:
>>> On Thu, 28 Mar 2013, ronnie sahlberg wrote:
>>>> Disable the recovery lock file from ctdb completely.
>>>> And disable fcntl locking from samba.
>>>>
>>>> To be blunt, unless your cluster filesystem is called GPFS,
>>>> locking is probably completely broken and should be avoided.
>>>
>>> Ha!
>>>
>>>> On Thu, Mar 28, 2013 at 8:46 AM, Marco Aroldi <marco.aroldi at gmail.com> wrote:
>>>> > Thanks for the answer,
>>>> >
>>>> > I haven't yet looked at the samba.git clone, sorry. I will.
>>>> >
>>>> > Just a quick report on my test environment:
>>>> > * cephfs mounted with kernel driver re-exported from 2 samba nodes
>>>> > * If "node B" goes down, everything works like a charm: "node A" does
>>>> > ip takeover and bring up the "node B"'s ip
>>>> > * Instead, if "node A" goes down, "node B" can't take the rlock file
>>>> > and gives this error:
>>>> >
>>>> > ctdb_recovery_lock: Failed to get recovery lock on
>>>> > '/mnt/ceph/samba-cluster/rlock'
>>>> > Unable to get recovery lock - aborting recovery and ban ourself for 300 seconds
>>>> >
>>>> > * So, for 5 minutes, neither "node A" nor "node B" are active. After
>>>> > that, the cluster recover correctly.
>>>> > It seems that one of the 2 nodes "owns" and don't want to "release"
>>>> > the rlock file
>>>
>>> Cephfs aims to give you coherent access between nodes.  The cost of that
>>> is that if another client goes down and it holds some lease/lock, you have
>>> to wait for it to time out.  That is supposed to happen after 60 seconds,
>>> it sounds like you've hit a bug here.  The flock/fnctl locks aren't
>>> super-well tested in the failure scenarios.
>>>
>>> Even assuming it were working, though, I'm not sure that you want to wait
>>> the 60 seconds either for the CTDB's to take over for each other.
>>
>> You do not want to wait 60 seconds. That is approaching territory where
>> CIFS clients will start causing file corruption and dataloss due to
>> them dropping writeback caches.
>>
>> You probably want to aim to try to guarantee that fcntl() locking
>> start working again after
>> ~20 seconds or so to have some headroom.
>>
>>
>> Microsoft themself state 25seconds as the absolute deadline they
>> require you guarantee before they will qualify storage.
>> That is among other things to accomodate and have some headroom for
>> some really nasty dataloss issues that will
>> happen if storage can not recover quickly enough.
>>
>>
>> CIFS is hard realtime. And you will pay dearly for missing the deadline.
>>
>>
>> regards
>> ronnie sahlberg