SV: ctdb relock file issues with glusterfs

Wed Oct 17 15:27:53 MDT 2012

I am still doing some testing on this, so I appreciate your feedback and 
comments.  It just happened that I was working on the same problem at the 
same time.  :)

Chris -)-----

On 10/17/2012 04:01 PM, patrick medina wrote:
> Yes, a very big thank you to Christopher and everyone else, that mount
> command did the trick for me as well.
>
> At first this didn't work, but I moved the location of the lock file to the
> root of the gluster share.  (/mnt/gluster/ctdb/lock  to /mnt/gluster/lock)
>   Now we're all healthy and happy!
>
> Morten, was your lock file in the same location or did you have to move it
> as well?
>
> Regards,
> PG
>
> On Mon, Oct 15, 2012 at 4:07 PM, Morten Bøhmer <Morten.Bohmer at pilaro.no> wrote:
>
>> THANK YOU!!!
>>
>>
>> This did it for me :)
>>
>> Ping_pong is now showing correct results
>>
>>
>> Morten
>>
>> -----Original Message-----
>> From: samba-technical-bounces at lists.samba.org [mailto:
>> samba-technical-bounces at lists.samba.org] On behalf of Christopher R. Hertel
>> Sent: 15 October 2012 20:50
>> To: samba-technical at lists.samba.org
>> Subject: Re: SV: ctdb relock file issues with glusterfs
>>
>> Morten, Patrick:
>>
>> Please try your tests with the following option on mount:
>>     --direct-io-mode=enable
>>
>> Let us know whether that changes your test results.
>>
>> Thanks.
>>
>> Chris -)-----
>>
>> On 10/15/2012 10:59 AM, Morten Bøhmer wrote:
>>> Thank you.
>>>
>>> For the heck of it I installed a couple of CentOS virtual servers and
>>> configured ctdb+glusterfs+xfs+samba, got it working, but without relock.
>>>
>>> Not sure how important it is, but I guess time will tell :)
>>>
>>>
>>> Morten
>>>
>>> From: patrick medina [mailto:pgmedinajr at gmail.com]
>>> Sent: 15 October 2012 17:57
>>> To: Morten Bøhmer
>>> Cc: Michael Adam; samba-technical at lists.samba.org
>>> Subject: Re: ctdb relock file issues with glusterfs
>>>
>>> Morning Morten,
>>>
>>> I have been out of the office since Thursday, but am back today and
>>> ready to knock this out.  I'll keep you posted on what I find later this
>>> afternoon.
>>>
>>> Cheers
>>>
>>> On Fri, Oct 12, 2012 at 7:34 AM, Morten Bøhmer <Morten.Bohmer at pilaro.no> wrote:
>>> Hi Patrick
>>>
>>> Any luck with your setup yet?
>>>
>>>
>>> I am now seriously looking into trying some other cluster file system to
>>> make ctdb work.
>>>
>>>
>>> Morten
>>>
>>>
>>> From: patrick medina [mailto:pgmedinajr at gmail.com]
>>> Sent: 10 October 2012 17:56
>>> To: Michael Adam
>>> Cc: Morten Bøhmer; samba-technical at lists.samba.org
>>> Subject: Re: ctdb relock file issues with glusterfs
>>>
>>> Thanks Michael,
>>>
>>> The way you explained ping_pong (going from "1" to "2") isn't explained
>>> as well on the wiki, so I'll test and most likely verify that it does
>>> not increment.
>>>
>>> Cheers - Gil
>>> On Wed, Oct 10, 2012 at 4:03 AM, Michael Adam <obnox at samba.org> wrote:
>>> Hi folks,
>>>
>>> As indicated elsewhere already, before even trying to start and debug
>>> ctdb, you should make sure that your cluster setup provides correct
>>> POSIX fcntl byte range locks, by using the ping_pong tool shipped with
>>> the ctdb package:
>>>
>>> https://wiki.samba.org/index.php/Ping_pong
>>>
>>> It is important to verify that the locks really reach "the other
>>> node", i.e. there is real lock contention.
>>>
>>> This can in particular be tested with the -rw option to
>>> ping_pong: If you run "ping_pong -rw /path/to/file 3" on one node and
>>> then "ping_pong -rw /path/to/file 3" on a second node, you should see
>>> the "data increment" notice (going from "1"
>>> to "2"), indicating that you now have two processes operating on the
>>> same file. If this stays constant (at 1) then your gluster setup does
>>> not provide sufficient fcntl byte range lock support.
>>>
>>> Another way to verify this without "-rw" is to use a lock count that is
>>> one too small:  run "ping_pong /path/to/file 2" on one node and then the
>>> same command on a second node. These should block and not print positive
>>> lock rates. If instead both happily print positive lock rates then
>>> your locks don't reach the other node and you need to fix your
>>> setup...
>>>
>>> Cheers - Michael
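
A side note on the ping_pong checks quoted above: the property they exercise can be sketched in a few lines of Python.  This is only a rough, single-host illustration and not a replacement for ping_pong.  The function name lock_is_contended is made up for the example.  It takes an fcntl byte-range lock in one process and checks that a second process is refused a lock on the same byte.  In a real test the holder and the prober would sit on different nodes, which this sketch does not attempt.

```python
import fcntl
import os
import tempfile


def lock_is_contended(path):
    """Return True if a second process is refused a byte-range lock we hold.

    Hypothetical helper for illustration; it only probes locking between
    two processes on the same host.
    """
    with open(path, "r+b") as holder:
        # Take an exclusive POSIX (fcntl) lock on byte 0 in this process.
        fcntl.lockf(holder, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)

        pid = os.fork()
        if pid == 0:
            # Child: a separate process, so its request must conflict.
            status = 0
            try:
                with open(path, "r+b") as probe:
                    fcntl.lockf(probe, fcntl.LOCK_EX | fcntl.LOCK_NB,
                                1, 0, os.SEEK_SET)
                status = 1  # lock granted although the parent holds it
            except OSError:
                status = 0  # refused, which is the correct behaviour
            os._exit(status)

        # Parent: the child's exit status tells us what it observed.
        _, wait_status = os.waitpid(pid, 0)
        return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


if __name__ == "__main__":
    # Demo on a scratch file; substitute a file on the cluster mount.
    with tempfile.NamedTemporaryFile() as scratch:
        scratch.write(b"\0")
        scratch.flush()
        print("locks conflict:", lock_is_contended(scratch.name))
```

Pointing it at a scratch file on the gluster mount at least shows whether locks conflict between two processes on one node.  Cross-node contention still needs ping_pong running on two nodes.
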
>>>
>>> On 2012-10-09 at 22:21 +0000, Morten Bøhmer wrote:
>>>> Can confirm that I am experiencing the exact same issue.
>>>>
>>>> Would love to be able to solve this .....
>>>>
>>>>
>>>> Morten
>>>> ________________________________________
>>>> From: samba-technical-bounces at lists.samba.org
>>>> [samba-technical-bounces at lists.samba.org] on behalf of patrick medina
>>>> [pgmedinajr at gmail.com]
>>>> Sent: Wednesday, October 10, 2012 12:10 AM
>>>> To: samba-technical at lists.samba.org
>>>> Subject: Re: ctdb relock file issues with glusterfs
>>>>
>>>> Afternoon/Morning Samba folks,
>>>>
>>>> I finally made some progress this afternoon.  Let me explain what I
>>>> found.
>>>>
>>>> 1.  When I created the lock file, I had set it to chmod 777
>>>> (rwxrwxrwx).  Thinking about permissions, I recreated the lock file with
>>>> rw-r--r--.  After doing this I am now able to bring one node to healthy
>>>> at a time, but the other node will stay unhealthy.  I am able to juggle
>>>> healthy nodes by shutting the ctdb service down on the healthy node, and
>>>> the 2nd node will then become healthy.
>>>>
>>>> The log files on the unhealthy nodes complain that the recovery lock
>>>> file is not locked:
>>>>
>>>> 2012/10/09 14:55:40.335328 [set_recmode:16493]: ctdb_recovery_lock:
>>>> Got recovery lock on '/mnt/gluster/ctdb/lock'
>>>> 2012/10/09 14:55:40.335448 [set_recmode:16493]: ERROR: recovery lock
>>>> file /mnt/gluster/ctdb/lock not locked when recovering!
>>>>
>>>>
>>>> 2.  I created a new mount point on one of the nodes, so each node has a
>>>> unique mount to gluster.  Depending on which node starts first, the
>>>> unhealthy node complains about the other's recovery lock location.
>>>> How can this be if each node has its own config file to go off of?
>>>>
>>>> Node1:  CTDB_RECOVERY_LOCK="/mnt/fuse/ctdb/lock"
>>>> ctdb_recovery_lock: Unable to open /mnt/gluster/ctdb/lock - (No such
>>>> file or directory)
>>>>
>>>>
>>>> Node2:  CTDB_RECOVERY_LOCK="/mnt/gluster/ctdb/lock"
>>>> ctdb_recovery_lock: Unable to open /mnt/fuse/ctdb/lock - (No such
>>>> file or directory)
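
For what it is worth, a mismatch like the one in point 2 can be spotted before starting ctdb.  The sketch below assumes that every node is meant to name the same recovery lock file on the shared file system, and that the setting appears as a plain CTDB_RECOVERY_LOCK=... line as quoted above.  The helper names are invented for the example, and it does not explain why a node reports the other node's path.

```python
import re

# Matches an uncommented shell-style assignment such as
#   CTDB_RECOVERY_LOCK="/mnt/gluster/ctdb/lock"
_RECLOCK_LINE = re.compile(
    r'^[ \t]*CTDB_RECOVERY_LOCK[ \t]*=[ \t]*"?([^"\n]*)"?[ \t]*$',
    re.MULTILINE,
)


def recovery_lock_path(config_text):
    """Return the configured recovery lock path, or None if it is not set."""
    match = _RECLOCK_LINE.search(config_text)
    return match.group(1).strip() if match else None


def mismatched_paths(configs):
    """Map node name -> path when the nodes disagree, else an empty dict.

    `configs` maps a node name to the text of that node's config file.
    """
    paths = {node: recovery_lock_path(text) for node, text in configs.items()}
    return paths if len(set(paths.values())) > 1 else {}


if __name__ == "__main__":
    # The two settings quoted in the message above.
    configs = {
        "Node1": 'CTDB_RECOVERY_LOCK="/mnt/fuse/ctdb/lock"\n',
        "Node2": 'CTDB_RECOVERY_LOCK="/mnt/gluster/ctdb/lock"\n',
    }
    for node, path in mismatched_paths(configs).items():
        print(node, "->", path)
```

An empty result means all nodes agree on the path (or none of them sets it); it says nothing about whether locking on that path actually works.
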
>>>>
>>>> Thanks again, I'm not sure where to troubleshoot next.
>>>>
>>>> Regards,
>>>> Gilbert
>>>>
>>>>
>>>>
>>>> On Tue, Oct 9, 2012 at 5:20 AM, Martin Schwenke <martin at meltin.net> wrote:
>>>>
>>>>> On Tue, 9 Oct 2012 16:32:12 +1100, Amitay Isaacs <amitay at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On Tue, Oct 9, 2012 at 1:55 PM, patrick medina <pgmedinajr at gmail.com>
>>>>>> wrote:
>>>>>>> Howdy samba folks,
>>>>>>>
>>>>>>> I've been running into a lot of issues lately with ctdb's re-lock file
>>>>>>> and glusterfs as the shared storage.  When I started, I could get one or
>>>>>>> the other node to become healthy, but at least one would complain it
>>>>>>> could not lock the re-lock file.  Now I'm at the point where neither
>>>>>>> node will become healthy and both stay in a recovery loop.  Just to be
>>>>>>> sure it was the re-lock file, I commented it out in the config and both
>>>>>>> nodes became healthy.
>>>>>
>>>>>> What version of CTDB are you using? Can you attach the log file
>>>>>> where you notice CTDB is continuously going into recovery? It would
>>>>>> be useful to get log files from all the nodes.
>>>>>
>>>>> Michael Adam and I took a look at this on the weekend.  Gilbert sent
>>>>> me some logs and this was happening:
>>>>>
>>>>>     ctdb_recovery_lock: Got recovery lock on '/mnt/gluster/ctdb/lock'
>>>>>
>>>>> That seems to indicate that locking isn't working as expected...
>>>>>
>>>>> peace & happiness,
>>>>> martin
>>>
>>>
>>
>> --
>> "Implementing CIFS - the Common Internet FileSystem" ISBN: 013047116X
>> Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
>> jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
>> ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
>> OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org
>>
