ctdb relock file issues with glusterfs

Wed Oct 10 09:55:52 MDT 2012

Thanks Michael,

The way you explained ping_pong (going from "1"
to "2") isn't explained as well on the wiki, so I'll test and most likely
verify that it does not increment.

Cheers - Gil

On Wed, Oct 10, 2012 at 4:03 AM, Michael Adam <obnox at samba.org> wrote:

> Hi folks,
>
> as indicated elsewhere already, before even trying to start and
> debug ctdb, you should make sure that your cluster setup provides
> correct posix fcntl byte range locks, by using the ping_pong
> tool shipped with the ctdb package:
>
> https://wiki.samba.org/index.php/Ping_pong
>
> It is important to verify that the locks really reach "the other
> node", i.e. there is real lock contention.
>
> This can in particular be tested with the -rw option to
> ping_pong: If you run "ping_pong -rw /path/to/file 3" on
> one node and then "ping_pong -rw /path/to/file 3" on a second
> node, you should see the "data increment" notice (going from "1"
> to "2"), indicating that you now have two processes operating
> on the same file. If this stays constant (at 1) then your gluster
> setup does not provide sufficient fcntl byte range lock support.
>
> Another way to verify this without "-rw" is to use a file that is
> one too small:  run "ping_pong /path/to/file 2" on one node and
> then the same command on a second node. These should block and
> not print positive lock rates. If instead both happily print positive
> lock rates then your locks don't reach the other node and you
> need to fix your setup...
>
> Cheers - Michael
>
> On 2012-10-09 at 22:21 +0000, Morten Bøhmer wrote:
> > Can confirm that I am experiencing the exact same issue.
> >
> > Would love to be able to solve this .....
> >
> >
> > Morten
> > ________________________________________
> > From: samba-technical-bounces at lists.samba.org [samba-technical-bounces at lists.samba.org] on behalf of patrick medina [pgmedinajr at gmail.com]
> > Sent: Wednesday, October 10, 2012 12:10 AM
> > To: samba-technical at lists.samba.org
> > Subject: Re: ctdb relock file issues with glusterfs
> >
> > Afternoon/Morning Samba folks,
> >
> > I finally made some progress this afternoon, let me explain what I found.
> >
> > 1.  When I created the lock file, I had set it to chmod 777 (rwxrwxrwx).
> > Thinking about permissions, I recreated the lock file with rw-r--r--.
> > After doing this I am now able to bring one node to healthy at a time, but
> > the other node will stay unhealthy.  I am able to juggle healthy nodes by
> > shutting the ctdb service down, and the 2nd node will become healthy.
> >
> > The log files on the unhealthy nodes complain that the recovery lock file
> > is not locked:
> >
> > 2012/10/09 14:55:40.335328 [set_recmode:16493]: ctdb_recovery_lock: Got recovery lock on '/mnt/gluster/ctdb/lock'
> > 2012/10/09 14:55:40.335448 [set_recmode:16493]: ERROR: recovery lock file /mnt/gluster/ctdb/lock not locked when recovering!
> >
> >
> > 2.  I created a new mount point on one of the nodes, so each node has a
> > unique mount to gluster.  Depending on which node starts first, the
> > unhealthy node complains about the other's recovery lock location.  How can
> > this be if each node has its own config file to go off of?
> >
> > Node1:  CTDB_RECOVERY_LOCK="/mnt/fuse/ctdb/lock"
> > ctdb_recovery_lock: Unable to open /mnt/gluster/ctdb/lock - (No such file or directory)
> >
> >
> > Node2:  CTDB_RECOVERY_LOCK="/mnt/gluster/ctdb/lock"
> > ctdb_recovery_lock: Unable to open /mnt/fuse/ctdb/lock - (No such file or directory)
> >
> > Thanks again, I'm not sure where to troubleshoot next.
> >
> > Regards,
> > Gilbert
> >
> >
> >
> > On Tue, Oct 9, 2012 at 5:20 AM, Martin Schwenke <martin at meltin.net> wrote:
> >
> > > On Tue, 9 Oct 2012 16:32:12 +1100, Amitay Isaacs <amitay at gmail.com>
> > > wrote:
> > >
> > > > On Tue, Oct 9, 2012 at 1:55 PM, patrick medina <pgmedinajr at gmail.com> wrote:
> > > > > Howdy samba folks,
> > > > >
> > > > > I've been running into a lot of issues lately with ctdb's re-lock file and
> > > > > glusterfs as the shared storage.  When I started, I could get one or the
> > > > > other node to become healthy, but at least one would complain it could not
> > > > > lock the re-lock file.  Now I'm at the point where neither node will become
> > > > > healthy and both stay in a recovery loop.  Just to be sure it was the re-lock
> > > > > file, I commented it out in the config and both nodes became healthy.
> > >
> > > > What version of CTDB are you using? Can you attach the log file where
> > > > you notice CTDB is continuously going in recovery? It would be useful
> > > > to get log files from all the nodes.
> > >
> > > Michael Adam and I took a look at this on the weekend.  Gilbert sent me
> > > some logs and this was happening:
> > >
> > >   ctdb_recovery_lock: Got recovery lock on '/mnt/gluster/ctdb/lock'
> > >
> > > That seems to indicate that locking isn't working as expected...
> > >
> > > peace & happiness,
> > > martin
>
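
The thread above turns on whether the shared filesystem really enforces
POSIX fcntl byte-range locks between competing processes. As a rough
illustration of what "real lock contention" means, here is a minimal
Python sketch. It is not ping_pong and not part of ctdb; the function
name lock_is_contended is made up for this example. It only exercises two
processes on a single host, so it cannot replace the cross-node ping_pong
test described in the thread: a filesystem can pass this check and still
fail to propagate locks to other nodes.

```python
import fcntl
import os


def lock_is_contended(path, offset=0, length=1):
    """Hypothetical helper: hold an exclusive byte-range lock on `path`
    and report whether a second process is refused the same range."""
    with open(path, "r+b") as fh:
        # Parent takes an exclusive, non-blocking POSIX byte-range lock.
        fcntl.lockf(fh, fcntl.LOCK_EX | fcntl.LOCK_NB, length, offset, os.SEEK_SET)

        pid = os.fork()
        if pid == 0:
            # Child: fcntl locks are owned per process and are not
            # inherited across fork, so the child is a real competitor.
            status = 2  # 2 = unexpected error (e.g. open failed)
            try:
                with open(path, "r+b") as child_fh:
                    try:
                        fcntl.lockf(child_fh, fcntl.LOCK_EX | fcntl.LOCK_NB,
                                    length, offset, os.SEEK_SET)
                        status = 0  # lock granted: no contention seen
                    except OSError:
                        status = 1  # lock refused: contention works
            finally:
                os._exit(status)

        # Parent still holds its lock while waiting for the child's verdict.
        _, wait_status = os.waitpid(pid, 0)
        code = os.WEXITSTATUS(wait_status)
        if code == 2:
            raise RuntimeError("child could not test the lock on %r" % path)
        return code == 1
```

Calling lock_is_contended() on an existing, writable file on the shared
mount should return True if locks are enforced locally. A False result
corresponds to the situation in the logs above, where a second locker is
granted a lock that someone else already holds.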