Re: ctdb relock file issues with glusterfs

Mon Oct 15 09:59:41 MDT 2012

Thank you.

For the heck of it I installed a couple of CentOS virtual servers and configured ctdb+glusterfs+xfs+samba. I got it working, but without relock.

Not sure how important it is, but I guess time will tell :)

Morten

From: patrick medina [mailto:pgmedinajr at gmail.com]
Sent: 15 October 2012 17:57
To: Morten Bøhmer
Cc: Michael Adam; samba-technical at lists.samba.org
Subject: Re: ctdb relock file issues with glusterfs

Morning Morten,

I have been out of the office since Thursday, but am back today and ready to knock this out.  I'll keep you posted on what I find later this afternoon.

Cheers

On Fri, Oct 12, 2012 at 7:34 AM, Morten Bøhmer <Morten.Bohmer at pilaro.no> wrote:
Hi Patrick

Any luck with your setup yet?

I am now seriously looking into trying some other clusterfs to make ctdb work.

Morten

From: patrick medina [mailto:pgmedinajr at gmail.com]
Sent: 10 October 2012 17:56
To: Michael Adam
Cc: Morten Bøhmer; samba-technical at lists.samba.org
Subject: Re: ctdb relock file issues with glusterfs

Thanks Michael,

The way you explained ping_pong (going from "1"
to "2") isn't explained as well on the wiki, so I'll test and most likely verify that it does not increment.

Cheers - Gil

On Wed, Oct 10, 2012 at 4:03 AM, Michael Adam <obnox at samba.org> wrote:
Hi folks,

as indicated elsewhere already, before even trying to start and
debug ctdb, you should make sure that your cluster setup provides
correct posix fcntl byte range locks, by using the ping_pong
tool shipped with the ctdb package:

https://wiki.samba.org/index.php/Ping_pong

It is important to verify that the locks really reach "the other
node", i.e. there is real lock contention.
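
As a rough illustration of what "real lock contention" means here, the following is a minimal Python sketch, not part of ctdb or ping_pong. It tries to take a non-blocking exclusive fcntl lock on byte 0 of a file and holds it for a while. The names try_byte_lock and probe and the 30 second hold time are made up for this example. If it is started on two nodes against the same file on the shared mount and both report that they got the lock, the locks are presumably not reaching the other node.

```python
import fcntl
import os
import sys
import time


def try_byte_lock(fd, offset):
    """Try to take an exclusive, non-blocking POSIX lock on a single byte.

    Returns True if the lock was granted, False if another process holds it.
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, offset, os.SEEK_SET)
        return True
    except OSError:
        return False


def probe(path, hold_seconds=30):
    """Lock byte 0 of `path` and hold it, or report that it is already locked."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if try_byte_lock(fd, 0):
            print("got the lock on byte 0, holding it for", hold_seconds, "seconds")
            time.sleep(hold_seconds)
            return True
        print("byte 0 is locked by someone else, so contention is visible")
        return False
    finally:
        # Closing the descriptor also drops the lock.
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) > 1:
    probe(sys.argv[1])
```

Run it with the path of a file on the shared mount as the only argument, first on one node and then, while the first is still holding the lock, on a second node. This is only a quick sanity check and does not replace the ping_pong tests.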

This can in particular be tested with the -rw option to
ping_pong: If you run "ping_pong -rw /path/to/file 3" on
one node and then "ping_pong -rw /path/to/file 3" on a second
node, you should see the "data increment" notice (going from "1"
to "2"), indicating that you now have two processes operating
on the same file. If this stays constant (at 1) then your gluster
setup does not provide sufficient fcntl byte range lock support.

Another way to verify this without "-rw" is using a file that is
one too small:  run "ping_pong /path/to/file 2" on one node and
then the same command on a second node. These should block and
not print positive lock rates. If instead both happily print positive
lock rates then your locks don't reach the other node and you
need to fix your setup...
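
To make the notion of a "lock rate" a bit more concrete, here is a small Python sketch loosely modeled on the idea behind ping_pong. It is not the tool's actual source, and the function names and the iteration bound are assumptions for illustration. A process walks over num_locks bytes of the file, always taking a blocking lock on the next byte before releasing the one it holds, and reports how many lock steps per second it managed.

```python
import fcntl
import os
import time


def lock_byte(fd, offset):
    """Take a blocking exclusive POSIX lock on one byte."""
    fcntl.lockf(fd, fcntl.LOCK_EX, 1, offset, os.SEEK_SET)


def unlock_byte(fd, offset):
    """Release the lock on one byte."""
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset, os.SEEK_SET)


def lock_rate(path, num_locks, iterations):
    """Walk over `num_locks` byte locks and return lock steps per second."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        start = time.monotonic()
        lock_byte(fd, 0)
        i = 0
        for _ in range(iterations):
            nxt = (i + 1) % num_locks
            lock_byte(fd, nxt)   # waits here if another process holds this byte
            unlock_byte(fd, i)
            i = nxt
        unlock_byte(fd, i)
        elapsed = time.monotonic() - start
        return iterations / elapsed if elapsed > 0 else float("inf")
    finally:
        os.close(fd)
```

A single process never has to wait, so it reports a high rate. Once a second process on another node works on the same file, each lock step may have to wait for the other side, which is why the rate should drop noticeably when locking is coherent across the cluster.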

Cheers - Michael

On 2012-10-09 at 22:21 +0000, Morten Bøhmer wrote:
> Can confirm that I am experiencing the exact same issue.
>
> Would love to be able to solve this .....
>
>
> Morten
> ________________________________________
> From: samba-technical-bounces at lists.samba.org [samba-technical-bounces at lists.samba.org] on behalf of patrick medina [pgmedinajr at gmail.com]
> Sent: Wednesday, October 10, 2012 12:10 AM
> To: samba-technical at lists.samba.org
> Subject: Re: ctdb relock file issues with glusterfs
>
> Afternoon/Morning Samba folks,
>
> I finally made some progress this afternoon, let me explain what I found.
>
> 1.  When I created the lock file, I had set it to chmod 777 (rwxrwxrwx).
> Thinking about permissions, I recreated the lock file with rw-r--r--.
> After doing this I am now able to bring one node to healthy at a time, but
> the other node will stay unhealthy.  I am able to juggle healthy nodes by
> shutting the ctdb service down, after which the 2nd node becomes healthy.
>
> The log files on the unhealthy nodes complain about the recovery lock file not
> being locked:
>
> 2012/10/09 14:55:40.335328 [set_recmode:16493]: ctdb_recovery_lock: Got recovery lock on '/mnt/gluster/ctdb/lock'
> 2012/10/09 14:55:40.335448 [set_recmode:16493]: ERROR: recovery lock file /mnt/gluster/ctdb/lock not locked when recovering!
>
>
> 2.  I created a new mount point on one of the nodes, so each node has a
> unique mount to gluster.  Depending on which node starts first, the
> unhealthy node complains about the other's recovery lock location.  How can
> this be if each node has its own config file to go off of?
>
> Node1:  CTDB_RECOVERY_LOCK="/mnt/fuse/ctdb/lock"
> ctdb_recovery_lock: Unable to open /mnt/gluster/ctdb/lock - (No such file or directory)
>
>
> Node2:  CTDB_RECOVERY_LOCK="/mnt/gluster/ctdb/lock"
> ctdb_recovery_lock: Unable to open /mnt/fuse/ctdb/lock - (No such file or directory)
>
> Thanks again, I'm not sure where to troubleshoot next.
>
> Regards,
> Gilbert
>
>
>
> On Tue, Oct 9, 2012 at 5:20 AM, Martin Schwenke <martin at meltin.net> wrote:
>
> > On Tue, 9 Oct 2012 16:32:12 +1100, Amitay Isaacs <amitay at gmail.com> wrote:
> >
> > > On Tue, Oct 9, 2012 at 1:55 PM, patrick medina <pgmedinajr at gmail.com> wrote:
> > > > Howdy samba folks,
> > > >
> > > > I've been running into a lot of issues lately with ctdb's re-lock file and
> > > > glusterfs as the shared storage.  When I started, I could get one or the
> > > > other node to become healthy, but at least one would complain it could not
> > > > lock the re-lock file.  Now I'm at the point where neither node will become
> > > > healthy and both stay in a recovery loop.  Just to be sure it was the re-lock
> > > > file, I commented it out in the config and both nodes became healthy.
> >
> > > What version of CTDB are you using? Can you attach the log file where
> > > you notice CTDB is continuously going in recovery? It would be useful
> > > to get log files from all the nodes.
> >
> > Michael Adam and I took a look at this on the weekend.  Gilbert sent me
> > some logs and this was happening:
> >
> >   ctdb_recovery_lock: Got recovery lock on '/mnt/gluster/ctdb/lock'
> >
> > That seems to indicate that locking isn't working as expected...
> >
> > peace & happiness,
> > martin