SV: ctdb relock file issues with glusterfs

patrick medina pgmedinajr at gmail.com
Wed Oct 17 16:12:29 MDT 2012


I ran into this issue as well. The Gluster folks said this is a bug that
will be corrected in the next release (3.3.1, I think). The only workaround
I found is to first stop ctdb on the node you're going to reboot, then
issue the reboot command.
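
A rough sketch of that workaround, assuming a SysV-style `service` command
(as on CentOS). `DRY_RUN` is a hypothetical safety switch, not part of ctdb;
it defaults to 1 so the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of the workaround: stop ctdb first, then reboot the node.
# DRY_RUN is a hypothetical safety switch; it defaults to 1 so the
# script only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

stop_ctdb_then_reboot() {
    # Reboot only if ctdb stopped successfully.
    run service ctdb stop && run reboot
}

stop_ctdb_then_reboot
```

Run it with DRY_RUN=0 as root once the printed commands look right.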

PG

On Wed, Oct 17, 2012 at 3:42 PM, Morten Bøhmer <Morten.Bohmer at pilaro.no> wrote:

> I have done some testing. When I reboot one of my nodes, the other takes
> over. But when the first comes back online and ctdb begins doing its magic
> to bring the node back up, both nodes go bad for a short moment.
>
> Is this by design?
>
> It does not seem like the share gets disconnected in this time period.
>
>
> Morten
> ________________________________________
> From: Christopher R. Hertel [crh at samba.org]
> Sent: Wednesday, October 17, 2012 11:27 PM
> To: patrick medina
> Cc: Morten Bøhmer; samba-technical at lists.samba.org
> Subject: Re: SV: ctdb relock file issues with glusterfs
>
> I am still doing some testing on this, so I appreciate your feedback and
> comments.  It just happened that I was working on the same problem at the
> same time.  :)
>
> Chris -)-----
>
> On 10/17/2012 04:01 PM, patrick medina wrote:
> > Yes, a very big thank you to Christopher and everyone else, that mount
> > command did the trick for me as well.
> >
> > At first this didn't work, but I moved the location of the lock file to the
> > root of the gluster share (/mnt/gluster/ctdb/lock to /mnt/gluster/lock).
> > Now we're all healthy and happy!
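
In configuration terms, the move amounts to something like the line below.
Where the line lives is an assumption (on CentOS it is usually
/etc/sysconfig/ctdb); the path is the one mentioned above.

```shell
# Recovery lock file at the root of the gluster mount
# (previously /mnt/gluster/ctdb/lock).
CTDB_RECOVERY_LOCK="/mnt/gluster/lock"
```
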
> >
> > Morten, was your lock file in the same location or did you have to move
> > it as well?
> >
> > Regards,
> > PG
> >
> > On Mon, Oct 15, 2012 at 4:07 PM, Morten Bøhmer <Morten.Bohmer at pilaro.no> wrote:
> >
> >> THANK YOU!!!
> >>
> >>
> >> This did it for me :)
> >>
> >> Ping_pong is now showing correct results
> >>
> >>
> >> Morten
> >>
> >> -----Original Message-----
> >> From: samba-technical-bounces at lists.samba.org
> >> on behalf of Christopher R. Hertel
> >> Sent: 15 October 2012 20:50
> >> To: samba-technical at lists.samba.org
> >> Subject: Re: SV: ctdb relock file issues with glusterfs
> >>
> >> Morten, Patrick:
> >>
> >> Please try your tests with the following option on mount:
> >>     --direct-io-mode=enable
> >>
> >> Let us know whether that changes your test results.
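
A minimal sketch of a FUSE mount with that option. The server name
`gluster1` and volume name `ctdbvol` are placeholders, not values from this
thread; /mnt/gluster is the mount point used elsewhere in the thread. The
`-o direct-io-mode=enable` spelling assumes the mount.glusterfs helper; the
glusterfs client binary takes the `--direct-io-mode=enable` form quoted
above. The sketch only prints the command so it can be checked before being
run as root.

```shell
#!/bin/sh
# Sketch: build a glusterfs mount command with direct I/O enabled.
# SERVER and VOLUME are placeholder names (assumptions).
SERVER="${SERVER:-gluster1}"
VOLUME="${VOLUME:-ctdbvol}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/gluster}"

mount_cmd() {
    # Print the command rather than run it, so it can be reviewed first.
    echo "mount -t glusterfs -o direct-io-mode=enable $SERVER:/$VOLUME $MOUNTPOINT"
}

mount_cmd
```
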
> >>
> >> Thanks.
> >>
> >> Chris -)-----
> >>
> >> On 10/15/2012 10:59 AM, Morten Bøhmer wrote:
> >>> Thank you.
> >>>
> >>> For the heck of it I installed a couple of CentOS virtual servers and
> >>> configured ctdb+glusterfs+xfs+samba. I got it working, but without relock.
> >>>
> >>> Not sure how important it is, but I guess time will tell :)
> >>>
> >>>
> >>> Morten
> >>>
> >>> From: patrick medina [mailto:pgmedinajr at gmail.com]
> >>> Sent: 15 October 2012 17:57
> >>> To: Morten Bøhmer
> >>> Cc: Michael Adam; samba-technical at lists.samba.org
> >>> Subject: Re: ctdb relock file issues with glusterfs
> >>>
> >>> Morning Morten,
> >>>
> >>> I have been out of the office since Thursday, but am back today and
> >>> ready to knock this out.  I'll keep you posted on what I find later this
> >>> afternoon.
> >>>
> >>> Cheers
> >>>
> >>> On Fri, Oct 12, 2012 at 7:34 AM, Morten Bøhmer <Morten.Bohmer at pilaro.no> wrote:
> >>> Hi Patrick
> >>>
> >>> Any luck with your setup yet?
> >>>
> >>>
> >>> I am now seriously looking into trying some other clusterfs to make
> >>> ctdb work.
> >>>
> >>>
> >>> Morten
> >>>
> >>>
> >>> From: patrick medina [mailto:pgmedinajr at gmail.com]
> >>> Sent: 10 October 2012 17:56
> >>> To: Michael Adam
> >>> Cc: Morten Bøhmer; samba-technical at lists.samba.org
> >>> Subject: Re: ctdb relock file issues with glusterfs
> >>>
> >>> Thanks Michael,
> >>>
> >>> The way you explained ping_pong (going from "1" to "2") isn't explained
> >>> as well on the wiki, so I'll test and most likely verify that it does
> >>> not increment.
> >>>
> >>> Cheers - Gil
> >>> On Wed, Oct 10, 2012 at 4:03 AM, Michael Adam <obnox at samba.org> wrote:
> >>> Hi folks,
> >>>
> >>> as indicated elsewhere already, before even trying to start and debug
> >>> ctdb, you should make sure that your cluster setup provides correct
> >>> posix fcntl byte range locks, by using the ping_pong tool shipped with
> >>> the ctdb package:
> >>>
> >>> https://wiki.samba.org/index.php/Ping_pong
> >>>
> >>> It is important to verify that the locks really reach "the other
> >>> node", i.e. there is real lock contention.
> >>>
> >>> This can in particular be tested with the -rw option to
> >>> ping_pong: If you run "ping_pong -rw /path/to/file 3" on one node and
> >>> then "ping_pong -rw /path/to/file 3" on a second node, you should see
> >>> the "data increment" notice (going from "1"
> >>> to "2"), indicating that you now have two processes operating on the
> >>> same file. If this stays constant (at 1) then your gluster setup does
> >>> not provide sufficient fcntl byte range lock support.
> >>>
> >>> Another way to verify this without "-rw" is to use a file that is one
> >>> too small: run "ping_pong /path/to/file 2" on one node and then the same
> >>> command on a second node. These should block and not print positive
> >>> lock rates. If instead both happily print positive lock rates, then
> >>> your locks don't reach the other node and you need to fix your
> >>> setup...
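
The two checks described above, written out for a cluster of N nodes. This
is only a sketch: it prints the ping_pong command lines to run by hand on
every node, and the file path is a placeholder. Following the examples
above, the contention check uses N+1 locks and the blocking check uses N.

```shell
#!/bin/sh
# Sketch: print the ping_pong invocations for the two lock checks.
# Run the printed command on every node at the same time.
NODES="${NODES:-2}"
TESTFILE="${TESTFILE:-/path/to/file}"

contention_check() {
    # N+1 locks with -rw: the "data increment" should rise above 1
    # once a second node joins.
    echo "ping_pong -rw $TESTFILE $((NODES + 1))"
}

blocking_check() {
    # N locks (one too small): with working locks the processes
    # should block instead of printing positive lock rates.
    echo "ping_pong $TESTFILE $NODES"
}

contention_check
blocking_check
```
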
> >>>
> >>> Cheers - Michael
> >>>
> >>> On 2012-10-09 at 22:21 +0000, Morten Bøhmer wrote:
> >>>> Can confirm that I am experiencing the exact same issue.
> >>>>
> >>>> Would love to be able to solve this .....
> >>>>
> >>>>
> >>>> Morten
> >>>> ________________________________________
> >>>> From: samba-technical-bounces at lists.samba.org
> >>>> [samba-technical-bounces at lists.samba.org] on behalf of patrick medina
> >>>> [pgmedinajr at gmail.com]
> >>>> Sent: Wednesday, October 10, 2012 12:10 AM
> >>>> To: samba-technical at lists.samba.org
> >>>> Subject: Re: ctdb relock file issues with glusterfs
> >>>>
> >>>> Afternoon/Morning Samba folks,
> >>>>
> >>>> I finally made some progress this afternoon. Let me explain what I
> >>>> found.
> >>>>
> >>>> 1.  When I created the lock file, I had set it to chmod 777
> >>>> (rwxrwxrwx).  Thinking about permissions, I recreated the lock file
> >>>> with rw-r--r--.  After doing this I am now able to bring one node to
> >>>> healthy at a time, but the other node will stay unhealthy.  I am able
> >>>> to juggle healthy nodes by shutting the ctdb service down on one,
> >>>> after which the 2nd node becomes healthy.
> >>>>
> >>>> The log files on the unhealthy nodes complain that the recovery lock
> >>>> file is not locked:
> >>>>
> >>>> 2012/10/09 14:55:40.335328 [set_recmode:16493]: ctdb_recovery_lock:
> >>>> Got recovery lock on '/mnt/gluster/ctdb/lock'
> >>>> 2012/10/09 14:55:40.335448 [set_recmode:16493]: ERROR: recovery lock
> >>>> file /mnt/gluster/ctdb/lock not locked when recovering!
> >>>>
> >>>>
> >>>> 2.  I created a new mount point on one of the nodes, so each node has a
> >>>> unique mount to gluster.  Depending on which node starts first, the
> >>>> unhealthy node complains about the other's recovery lock location.
> >>>> How can this be if each node has its own config file to go off of?
> >>>>
> >>>> Node1:  CTDB_RECOVERY_LOCK="/mnt/fuse/ctdb/lock"
> >>>> ctdb_recovery_lock: Unable to open /mnt/gluster/ctdb/lock - (No such
> >>>> file or directory)
> >>>>
> >>>>
> >>>> Node2:  CTDB_RECOVERY_LOCK="/mnt/gluster/ctdb/lock"
> >>>> ctdb_recovery_lock: Unable to open /mnt/fuse/ctdb/lock - (No such
> >>>> file or directory)
> >>>>
> >>>> Thanks again, I'm not sure where to troubleshoot next.
> >>>>
> >>>> Regards,
> >>>> Gilbert
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Oct 9, 2012 at 5:20 AM, Martin Schwenke <martin at meltin.net> wrote:
> >>>>
> >>>>> On Tue, 9 Oct 2012 16:32:12 +1100, Amitay Isaacs
> >>>>> <amitay at gmail.com> wrote:
> >>>>>
> >>>>>> On Tue, Oct 9, 2012 at 1:55 PM, patrick medina
> >>>>>> <pgmedinajr at gmail.com> wrote:
> >>>>>>> Howdy samba folks,
> >>>>>>>
> >>>>>>> I've been running into a lot of issues lately with ctdb's re-lock
> >>>>>>> file and glusterfs as the shared storage.  When I started, I could
> >>>>>>> get one or the other node to become healthy, but at least one would
> >>>>>>> complain it could not lock the re-lock file.  Now I'm at the point
> >>>>>>> where neither node will become healthy; both stay in a recovery
> >>>>>>> loop.  Just to be sure it was the re-lock file, I commented it out
> >>>>>>> in the config and both nodes became healthy.
> >>>>>
> >>>>>> What version of CTDB are you using? Can you attach the log file
> >>>>>> where you notice CTDB continuously going into recovery? It would
> >>>>>> be useful to get log files from all the nodes.
> >>>>>
> >>>>> Michael Adam and I took a look at this on the weekend.  Gilbert sent
> >>>>> me some logs and this was happening:
> >>>>>
> >>>>>     ctdb_recovery_lock: Got recovery lock on '/mnt/gluster/ctdb/lock'
> >>>>>
> >>>>> That seems to indicate that locking isn't working as expected...
> >>>>>
> >>>>> peace & happiness,
> >>>>> martin
> >>>
> >>>
> >>
> >> --
> >> "Implementing CIFS - the Common Internet FileSystem" ISBN: 013047116X
> >> Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
> >> jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development,
> >> uninq.
> >> ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
> >> OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org
> >>
>

