SV: ctdb relock file issues with glusterfs

Wed Oct 17 15:42:34 MDT 2012

I have done some testing. When I reboot one of my nodes, the other takes over. But when the rebooted node comes back online and ctdb begins doing its magic to bring it back up, both nodes go bad for a short moment.

Is this by design?

It does not seem like the share gets disconnected in this time period.

Morten
________________________________________
From: Christopher R. Hertel [crh at samba.org]
Sent: Wednesday, October 17, 2012 11:27 PM
To: patrick medina
Cc: Morten Bøhmer; samba-technical at lists.samba.org
Subject: Re: SV: ctdb relock file issues with glusterfs

I am still doing some testing on this, so I appreciate your feedback and
comments.  It just happened that I was working on the same problem at the
same time.  :)

Chris -)-----

On 10/17/2012 04:01 PM, patrick medina wrote:
> Yes, a very big thank you to Christopher and everyone else, that mount
> command did the trick for me as well.
>
> At first this didn't work, but I moved the location of the lock file to the
> root of the gluster share.  (/mnt/gluster/ctdb/lock  to /mnt/gluster/lock)
>   Now we're all healthy and happy!
>
> Morten, was your lock file in the same location or did you have to move it
> as well?
>
> Regards,
> PG
>
> On Mon, Oct 15, 2012 at 4:07 PM, Morten Bøhmer <Morten.Bohmer at pilaro.no> wrote:
>
>> THANK YOU!!!
>>
>>
>> This did it for me :)
>>
>> Ping_pong is now showing correct results
>>
>>
>> Morten
>>
>> -----Original Message-----
>> From: samba-technical-bounces at lists.samba.org [mailto:
>> samba-technical-bounces at lists.samba.org] On behalf of Christopher R. Hertel
>> Sent: 15 October 2012 20:50
>> To: samba-technical at lists.samba.org
>> Subject: Re: SV: ctdb relock file issues with glusterfs
>>
>> Morten, Patrick:
>>
>> Please try your tests with the following option on mount:
>>     --direct-io-mode=enable
>>
>> Let us know whether that changes your test results.
>>
>> Thanks.
>>
>> Chris -)-----
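
A minimal sketch of how the option Chris suggests might be passed at mount time. It assumes the mount helper accepts the option in its -o form (direct-io-mode=enable), which is worth checking against the local mount.glusterfs documentation. The server name, the volume name, and the helper function are hypothetical; only the mount point /mnt/gluster appears in this thread.

```python
def gluster_mount_command(server, volume, mount_point):
    """Build the argv for mounting a gluster volume with direct I/O enabled.

    Assumption: the option is spelled "direct-io-mode=enable" when given
    through -o. The thread itself only shows "--direct-io-mode=enable".
    """
    return [
        "mount",
        "-t", "glusterfs",
        "-o", "direct-io-mode=enable",
        f"{server}:/{volume}",
        mount_point,
    ]


# Hypothetical server and volume names, for illustration only.
command = gluster_mount_command("gluster1", "ctdbvol", "/mnt/gluster")
print(" ".join(command))
```

The resulting list can be handed to subprocess.run() as root. Building it as a list avoids shell quoting problems.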
>>
>> On 10/15/2012 10:59 AM, Morten Bøhmer wrote:
>>> Thank you.
>>>
>>> For the heck of it I installed a couple of CentOS virtual servers and
>>> configured ctdb+glusterfs+xfs+samba. I got it working, but without relock.
>>>
>>> Not sure how important it is, but I guess time will show :)
>>>
>>>
>>> Morten
>>>
>>> From: patrick medina [mailto:pgmedinajr at gmail.com]
>>> Sent: 15 October 2012 17:57
>>> To: Morten Bøhmer
>>> Cc: Michael Adam; samba-technical at lists.samba.org
>>> Subject: Re: ctdb relock file issues with glusterfs
>>>
>>> Morning Morten,
>>>
>>> I have been out of the office since Thursday, but am back today and
>>> ready to knock this out.  I'll keep you posted on what I find later this
>>> afternoon.
>>>
>>> Cheers
>>>
>>> On Fri, Oct 12, 2012 at 7:34 AM, Morten Bøhmer <Morten.Bohmer at pilaro.no> wrote:
>>> Hi Patrick
>>>
>>> Any luck with your setup yet ?
>>>
>>>
>>> I am now seriously looking into trying some other cluster file system to
>>> make ctdb work.
>>>
>>>
>>> Morten
>>>
>>>
>>> From: patrick medina [mailto:pgmedinajr at gmail.com]
>>> Sent: 10 October 2012 17:56
>>> To: Michael Adam
>>> Cc: Morten Bøhmer; samba-technical at lists.samba.org
>>> Subject: Re: ctdb relock file issues with glusterfs
>>>
>>> Thanks Michael,
>>>
>>> The way you explained ping_pong (going from "1" to "2") isn't explained
>>> as well on the wiki, so I'll test and most likely verify that it does not
>>> increment.
>>>
>>> Cheers - Gil
>>> On Wed, Oct 10, 2012 at 4:03 AM, Michael Adam <obnox at samba.org> wrote:
>>> Hi folks,
>>>
>>> as indicated elsewhere already, before even trying to start and debug
>>> ctdb, you should make sure that your cluster setup provides correct
>>> posix fcntl byte range locks, by using the ping_pong tool shipped with
>>> the ctdb package:
>>>
>>> https://wiki.samba.org/index.php/Ping_pong
>>>
>>> It is important to verify that the locks really reach "the other
>>> node", i.e. there is real lock contention.
>>>
>>> This can in particular be tested with the -rw option to
>>> ping_pong: If you run "ping_pong -rw /path/to/file 3" on one node and
>>> then "ping_pong -rw /path/to/file 3" on a second node, you should see
>>> the "data increment" notice (going from "1"
>>> to "2"), indicating that you now have two processes operating on the
>>> same file. If this stays constant (at 1) then your gluster setup does
>>> not provide sufficient fcntl byte range lock support.
>>>
>>> Another way to verify this without "-rw" is to use a file that is one too
>>> small: run "ping_pong /path/to/file 2" on one node and then the same
>>> command on a second node. These should block and not print positive
>>> lock rates. If instead both happily print positive lock rates then
>>> your locks don't reach the other node and you need to fix your
>>> setup...
>>>
>>> Cheers - Michael
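
What "real lock contention" means can be illustrated with a small sketch. It is not a replacement for ping_pong, and it runs both processes on a single host. For a cluster check, the holder and the probe would have to run on different nodes against the same file on the shared mount. The function name and the probe script are hypothetical.

```python
import fcntl
import subprocess
import sys

# Probe script run in a separate process: it attempts a non-blocking
# exclusive lock on the first byte of the file and reports the outcome.
PROBE = """
import fcntl, sys
with open(sys.argv[1], "r+b") as handle:
    try:
        fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0)
        print("acquired")
    except OSError:
        print("blocked")
"""


def lock_is_contended(path):
    """Hold a lock on byte 0, then check that a second process is refused."""
    with open(path, "r+b") as holder:
        # POSIX fcntl byte-range lock covering one byte at offset 0.
        fcntl.lockf(holder, fcntl.LOCK_EX, 1, 0)
        probe = subprocess.run(
            [sys.executable, "-c", PROBE, path],
            capture_output=True, text=True, check=True,
        )
    # The lock is released when the holder's file is closed above.
    return probe.stdout.strip() == "blocked"
```

If the probe reports "acquired" while the first process still holds the lock, the file system is not enforcing the byte-range lock between the two processes. That is the same failure ping_pong exposes across nodes.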
>>>
>>> On 2012-10-09 at 22:21 +0000, Morten Bøhmer wrote:
>>>> Can confirm that I am experiencing the exact same issue.
>>>>
>>>> Would love to be able to solve this .....
>>>>
>>>>
>>>> Morten
>>>> ________________________________________
>>>> From: samba-technical-bounces at lists.samba.org
>>>> [samba-technical-bounces at lists.samba.org] on behalf of patrick medina
>>>> [pgmedinajr at gmail.com]
>>>> Sent: Wednesday, October 10, 2012 12:10 AM
>>>> To: samba-technical at lists.samba.org
>>>> Subject: Re: ctdb relock file issues with glusterfs
>>>>
>>>> Afternoon/Morning Samba folks,
>>>>
>>>> I finally made some progress this afternoon. Let me explain what I found.
>>>>
>>>> 1.  When I created the lock file, I had set it to chmod 777
>>>> (rwxrwxrwx). Thinking about permissions, I recreated the lock file with
>>>> rw-r--r--.
>>>>    After doing this I am now able to bring one node to healthy at a
>>>> time, but the other node will stay unhealthy.  I am able to juggle
>>>> healthy nodes by shutting the ctdb service down, and the 2nd node will
>>>> become healthy.
>>>>
>>>> The log files on the unhealthy nodes complain about the recovery lock
>>>> file not being locked:
>>>>
>>>> 2012/10/09 14:55:40.335328 [set_recmode:16493]: ctdb_recovery_lock:
>>>> Got recovery lock on '/mnt/gluster/ctdb/lock'
>>>> 2012/10/09 14:55:40.335448 [set_recmode:16493]: ERROR: recovery lock
>>>> file /mnt/gluster/ctdb/lock not locked when recovering!
>>>>
>>>>
>>>> 2.  I created a new mount point on one of the nodes, so each node has a
>>>> unique mount to gluster.  Depending on which node starts first, the
>>>> unhealthy node complains about the other's recovery lock location.
>>>> How can this be if each node has its own config file to go off of?
>>>>
>>>> Node1:  CTDB_RECOVERY_LOCK="/mnt/fuse/ctdb/lock"
>>>> ctdb_recovery_lock: Unable to open /mnt/gluster/ctdb/lock - (No such
>>>> file or directory)
>>>>
>>>>
>>>> Node2:  CTDB_RECOVERY_LOCK="/mnt/gluster/ctdb/lock"
>>>> ctdb_recovery_lock: Unable to open /mnt/fuse/ctdb/lock - (No such
>>>> file or directory)
>>>>
>>>> Thanks again, I'm not sure where to troubleshoot next.
>>>>
>>>> Regards,
>>>> Gilbert
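
The log lines in point 2 suggest that each node ends up trying to open the path configured on the other node, so a quick consistency check across the nodes' settings may help. A minimal sketch, assuming the setting is written as a shell-style CTDB_RECOVERY_LOCK="..." line as quoted above. The function names and the way the config text is collected from each node are hypothetical.

```python
import re

# Matches an uncommented CTDB_RECOVERY_LOCK assignment, quoted or not.
RECLOCK_PATTERN = re.compile(
    r'^\s*CTDB_RECOVERY_LOCK\s*=\s*"?([^"\n]*)"?\s*$', re.MULTILINE
)


def recovery_lock_path(config_text):
    """Return the configured recovery lock path, or None if it is not set."""
    match = RECLOCK_PATTERN.search(config_text)
    return match.group(1).strip() if match else None


def mismatched_recovery_locks(configs):
    """Map node name to lock path if the nodes disagree, else return {}."""
    paths = {node: recovery_lock_path(text) for node, text in configs.items()}
    return paths if len(set(paths.values())) > 1 else {}


# The two settings quoted in this message.
configs = {
    "node1": 'CTDB_RECOVERY_LOCK="/mnt/fuse/ctdb/lock"\n',
    "node2": 'CTDB_RECOVERY_LOCK="/mnt/gluster/ctdb/lock"\n',
}
for node, path in mismatched_recovery_locks(configs).items():
    print(f"{node}: {path}")
```

An empty result means every node names the same path. Whether that path also refers to the same file on the shared storage still has to be verified on the nodes themselves.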
>>>>
>>>>
>>>>
>>>> On Tue, Oct 9, 2012 at 5:20 AM, Martin Schwenke <martin at meltin.net> wrote:
>>>>
>>>>> On Tue, 9 Oct 2012 16:32:12 +1100, Amitay Isaacs <amitay at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On Tue, Oct 9, 2012 at 1:55 PM, patrick medina <pgmedinajr at gmail.com>
>>>>>> wrote:
>>>>>>> Howdy samba folks,
>>>>>>>
>>>>>>> I've been running into a lot of issues lately with ctdb's re-lock file
>>>>>>> and glusterfs as the shared storage.  When I started, I could get one or
>>>>>>> the other node to become healthy, but at least one would complain it
>>>>>>> could not lock the re-lock file.  Now I'm at the point where neither
>>>>>>> node will become healthy and both stay in a recovery loop.  Just to be
>>>>>>> sure it was the re-lock file, I commented it out in the config and both
>>>>>>> nodes became healthy.
>>>>>
>>>>>> What version of CTDB are you using? Can you attach the log file
>>>>>> where you notice CTDB is continuously going in recovery? It would
>>>>>> be useful to get log files from all the nodes.
>>>>>
>>>>> Michael Adam and I took a look at this on the weekend.  Gilbert sent
>>>>> me some logs and this was happening:
>>>>>
>>>>>     ctdb_recovery_lock: Got recovery lock on '/mnt/gluster/ctdb/lock'
>>>>>
>>>>> That seems to indicate that locking isn't working as expected...
>>>>>
>>>>> peace & happiness,
>>>>> martin
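
Martin's observation can be turned into a rough log check. The sketch below assumes the log line format quoted in this thread and treats every "Got recovery lock" message from a set_recmode process as suspicious, which is one reading of his remark and not a documented rule. The function name is hypothetical.

```python
import re

# Pattern built from the log lines quoted in this thread.
GOT_LOCK = re.compile(
    r"\[set_recmode:(\d+)\]: ctdb_recovery_lock: "
    r"Got recovery lock on '([^']+)'"
)


def unexpected_lock_grabs(log_text):
    """Return (pid, path) pairs for set_recmode processes that got the lock."""
    return [(int(pid), path) for pid, path in GOT_LOCK.findall(log_text)]


# Sample line taken from the log excerpt in this thread, unwrapped.
sample = (
    "2012/10/09 14:55:40.335328 [set_recmode:16493]: ctdb_recovery_lock: "
    "Got recovery lock on '/mnt/gluster/ctdb/lock'\n"
)
for pid, path in unexpected_lock_grabs(sample):
    print(f"pid {pid} took the lock on {path}")
```

Any hits would point back at the fcntl locking of the shared file system, which is what the ping_pong test exercises.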
>>>
>>>
>>
>> --
>> "Implementing CIFS - the Common Internet FileSystem" ISBN: 013047116X
>> Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
>> jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
>> ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
>> OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org
>>
