[ceph-users] CTDB Cluster Samba on Cephfs

Sage Weil sage at inktank.com
Thu Mar 28 19:09:29 MDT 2013


On Thu, 28 Mar 2013, ronnie sahlberg wrote:
> Disable the recovery lock file from ctdb completely.
> And disable fcntl locking from samba.
> 
> To be blunt, unless your cluster filesystem is called GPFS,
> locking is probably completely broken and should be avoided.

Ha!
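
For what it's worth, "disabling" there comes down to two knobs; a minimal
sketch, assuming the usual config locations (adjust paths for your distro):

  # /etc/sysconfig/ctdb (or /etc/default/ctdb):
  # leave CTDB_RECOVERY_LOCK unset/commented out to run with no
  # recovery lock file at all -- note this also gives up the
  # split-brain protection it provides
  #CTDB_RECOVERY_LOCK="/mnt/ceph/samba-cluster/rlock"

  # smb.conf, [global] section:
  clustering = yes
  # don't map client byte-range locks onto fcntl locks on the
  # cluster filesystem
  posix locking = no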

> On Thu, Mar 28, 2013 at 8:46 AM, Marco Aroldi <marco.aroldi at gmail.com> wrote:
> > Thanks for the answer,
> >
> > I haven't yet looked at the samba.git clone, sorry. I will.
> >
> > Just a quick report on my test environment:
> > * cephfs mounted with kernel driver re-exported from 2 samba nodes
> > * If "node B" goes down, everything works like a charm: "node A" does
> > ip takeover and bring up the "node B"'s ip
> > * Instead, if "node A" goes down, "node B" can't take the rlock file
> > and gives this error:
> >
> > ctdb_recovery_lock: Failed to get recovery lock on
> > '/mnt/ceph/samba-cluster/rlock'
> > Unable to get recovery lock - aborting recovery and ban ourself for 300 seconds
> >
> > * So, for 5 minutes, neither "node A" nor "node B" is active. After
> > that, the cluster recovers correctly.
> > It seems that one of the 2 nodes "owns" the rlock file and doesn't
> > want to "release" it

Cephfs aims to give you coherent access between nodes.  The cost of that
is that if another client goes down while holding a lease or lock, you
have to wait for it to time out.  That is supposed to happen after 60
seconds; it sounds like you've hit a bug here.  The flock/fcntl locks
aren't super-well tested in the failure scenarios.
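
An easy way to check whether you're hitting the normal lease path or
something worse: take the lock from one node, then hard-kill that node. A
minimal sketch in Python, assuming the CephFS mount and rlock path from
Marco's setup:

  import fcntl, time

  # Run this on node A, then power the node off (no clean unmount).
  # On node B, the same lockf() call should block only until the MDS
  # times out node A's session (nominally ~60s), not for 300s.
  fd = open('/mnt/ceph/samba-cluster/rlock', 'w')
  fcntl.lockf(fd, fcntl.LOCK_EX)   # exclusive POSIX (fcntl) lock
  print('lock held; pull the plug on this node now')
  time.sleep(3600)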

Even assuming it were working, though, I'm not sure that you want to wait
the 60 seconds either for the CTDB nodes to take over for each other.  (I
wonder what the fencing timeout is for GPFS by default?)  It may be that 
giving CTDB a backend using RADOS or some other shared storage that is 
more flexible with respect to timeouts than POSIX would be more 
appropriate.  In the past we've had conversations with people who wanted 
to do this, but they got pulled off onto something else.
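
To sketch the idea: librados has advisory object locks with an expiry, so
a RADOS-backed recovery lock could simply time out when its holder dies,
rather than waiting on POSIX lock recovery. A rough illustration using the
Python bindings (the pool and object names are made up):

  import rados

  cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
  cluster.connect()
  ioctx = cluster.open_ioctx('ctdb')   # illustrative pool name
  try:
      # if the holder dies, the lock expires after 'duration'
      # seconds and another node can take it -- no waiting on
      # POSIX lock/lease recovery
      ioctx.lock_exclusive('reclock', 'recovery', 'node-a', duration=10)
      print('we are the recovery master')
  except rados.ObjectBusy:
      print('another node holds the recovery lock')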

sage


> >
> > I'm at a standstill
> > Any hint is appreciated
> >
> > --
> > Marco Aroldi
> >
> >
> > 2013/3/28 Sage Weil <sage at inktank.com>:
> >> On Wed, 27 Mar 2013, Matthieu Patou wrote:
> >>> On 03/27/2013 10:41 AM, Marco Aroldi wrote:
> >>> > Hi list,
> >>> > I'm trying to create a active/active Samba cluster on top of Cephfs
> >>> > I would ask if Ceph fully supports CTDB at this time.
> >>>
> >>> If I'm not wrong, Ceph (even CephFS) does not support exporting a block
> >>> device or mounting the same FS more than once, whereas CTDB explicitly
> >>> requires that you have a distributed filesystem where the same filesystem
> >>> is mounted across all the nodes.
> >>
> >>
> >> As for CTDB: we haven't looked at this specifically in the context of
> >> Samba.  Generally speaking, anything you can do with NFS or another shared
> >> file system you can do with CephFS.  IIRC last time I discussed this with
> >> the Samba guys, there is more we could do here to make CTDB much more
> >> efficient (by backing it with RADOS, for example), but we haven't looked
> >> at any of this carefully.  We would love to see clustered Samba working
> >> well on CephFS, though!  If you haven't already, please look at our
> >> samba.git clone on GitHub, which has patches gluing libcephfs directly
> >> into Samba's VFS, allowing you to directly reexport CephFS via Samba
> >> without a local mountpoint in the middle.
> >>
> >> sage
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users at lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
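
For anyone trying that samba.git clone: a share re-exported through
libcephfs would be wired up in smb.conf along these lines (a sketch; the
module and option names here are assumptions, so check the patches for the
exact spelling):

  [cephshare]
      # path is relative to the CephFS root, not a local mount
      path = /
      # hand all I/O to libcephfs instead of a kernel mount
      vfs objects = ceph
      ceph:config_file = /etc/ceph/ceph.conf
      ceph:user_id = samba
      # there's no local filesystem under the share, so
      # kernel-level share modes can't work
      kernel share modes = no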

