ctdb relock file issues with glusterfs

ronnie sahlberg ronniesahlberg at gmail.com
Fri Oct 12 13:06:39 MDT 2012


Ctdb requires that you mount the filesystem in the same place on all the
nodes. During recovery, the node that is elected recovery master makes
sure that all nodes are using the same path for the rec-lock file.

This is to prevent user mistakes from causing problems, e.g. if you
changed the CTDB_RECOVERY_LOCK setting on some, but not all, of the nodes.
It ensures that all nodes use the same setting, which is why a node can
complain about a path that only appears in another node's config file.
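
As a rough illustration of the consistency check described above (this is
not ctdb's actual code), the sketch below compares each node's rec-lock
path with the recovery master's. The function name and the node names are
hypothetical; the paths are the ones from the quoted report.

```python
def mismatched_reclock_nodes(master, settings):
    """Return the nodes whose rec-lock path differs from the master's.

    `settings` maps a node name to its CTDB_RECOVERY_LOCK value.
    Both the function and the node names are illustrative only.
    """
    wanted = settings[master]
    return sorted(node for node, path in settings.items() if path != wanted)


# The two-node setup from the quoted message, with node2 as recovery master.
settings = {
    "node1": "/mnt/fuse/ctdb/lock",
    "node2": "/mnt/gluster/ctdb/lock",
}
print(mismatched_reclock_nodes("node2", settings))
```

With the settings above, node1 is reported as mismatched when node2 is the
master, and the other way round when node1 is the master.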


To avoid split-brain, ctdb uses fcntl() locking on the rec-lock file to
arbitrate which node is the recovery master.
This is where correct fcntl() locking behaviour in the underlying
cluster filesystem becomes important.
I don't think your filesystem supports working/correct fcntl() locking,
so that is where these warning messages come from.
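
A minimal sketch of how one might probe fcntl() locking on a file,
assuming Python is available on the nodes. The function names are
hypothetical and this is not how ctdb itself takes the lock. The idea:
while one process holds an exclusive lock, a second process must be
refused. If the second one is also granted the lock, locking is not
working on that filesystem.

```python
import errno
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive, non-blocking fcntl() lock on `path`.

    Returns the open file descriptor if the lock was granted, or None
    if another process already holds a conflicting lock. The lock is
    held until the descriptor is closed or the process exits.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # lockf() is implemented on top of fcntl() record locking.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None
        raise
    return fd


def second_process_can_lock(path):
    """Fork a child and report whether it is also granted the lock."""
    pid = os.fork()
    if pid == 0:
        code = 2  # unexpected error in the child
        try:
            code = 0 if try_lock(path) is not None else 1
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0
```

second_process_can_lock() only exercises locking between two processes
on the same host. For a cluster filesystem the interesting case is across
nodes: call try_lock() on one node and keep that process running, then
call try_lock() on the same file from another node, which should return
None.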


You can still use ctdb on such filesystems, but you will have to
comment out the CTDB_RECOVERY_LOCK setting in /etc/sysconfig/ctdb.
This disables the split-brain prevention completely, but allows you to
use filesystems with broken fcntl() locking.
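
If you need to make that change on several nodes, something like the
following sketch could be used to comment the setting out. It only
transforms text that you pass in and does not touch any file; the
function name is hypothetical and the sample config is made up for
illustration.

```python
import re


def comment_out(config_text, key="CTDB_RECOVERY_LOCK"):
    """Prefix every active `key=...` line in config_text with '# '.

    Lines that are already commented out are left unchanged.
    """
    pattern = re.compile(r"^([ \t]*)(" + re.escape(key) + r"[ \t]*=)",
                         re.MULTILINE)
    return pattern.sub(r"\1# \2", config_text)


# Illustrative config text, not a complete /etc/sysconfig/ctdb.
sample = 'CTDB_RECOVERY_LOCK="/mnt/gluster/ctdb/lock"\n'
print(comment_out(sample), end="")
```

Running it twice on the same text gives the same result, so it is safe
to apply to a node that has already been changed.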

Additionally, if you disable this because of broken fcntl() locking,
you should probably also make sure NOT to export data via NFS, since
the NFS lock manager will not work reliably.

Since you cannot safely use NFS at the same time as samba with that
filesystem, you can also change the samba config to "posix
locking = no". This makes samba do all file locking internally and
never propagate the locks to the kernel/filesystem, and it may provide
some performance boost.


regards
ronnie sahlberg

On Tue, Oct 9, 2012 at 3:10 PM, patrick medina <pgmedinajr at gmail.com> wrote:
> Afternoon/Morning Samba folks,
>
> I finally made some progress this afternoon, let me explain what I found.
>
> 1.  When I created the lock file, I had set it to chmod 777 (rwxrwxrwx).
> Thinking about permissions, I recreated the lock file with rw-r--r--.
>  After doing this I am now able to bring one node to healthy at a time, but
> the other node will stay unhealthy.  I am able to juggle healthy nodes by
> shutting the ctdb service down and the 2nd node will become healthy.
>
> Log file on the unhealthy nodes complain about the recovery lock file not
> locked:
>
> 2012/10/09 14:55:40.335328 [set_recmode:16493]: ctdb_recovery_lock: Got
> recovery lock on '/mnt/gluster/ctdb/lock'
> 2012/10/09 14:55:40.335448 [set_recmode:16493]: ERROR: recovery lock file
> /mnt/gluster/ctdb/lock not locked when recovering!
>
>
> 2.  I created a new mount point on one of the nodes, so each node has a
> unique mount to gluster.  Depending on which node starts first, the
> unhealthy node complains about the other's recovery lock location.  How can
> this be if each node has its own config file to go off of?
>
> Node1:  CTDB_RECOVERY_LOCK="/mnt/fuse/ctdb/lock"
> ctdb_recovery_lock: Unable to open /mnt/gluster/ctdb/lock - (No such file
> or directory)
>
>
> Node2:  CTDB_RECOVERY_LOCK="/mnt/gluster/ctdb/lock"
> ctdb_recovery_lock: Unable to open /mnt/fuse/ctdb/lock - (No such file or
> directory)
>
> Thanks again, I'm not sure where to troubleshoot next.
>
> Regards,
> Gilbert
>
>
>
> On Tue, Oct 9, 2012 at 5:20 AM, Martin Schwenke <martin at meltin.net> wrote:
>
>> On Tue, 9 Oct 2012 16:32:12 +1100, Amitay Isaacs <amitay at gmail.com>
>> wrote:
>>
>> > On Tue, Oct 9, 2012 at 1:55 PM, patrick medina <pgmedinajr at gmail.com> wrote:
>> > > Howdy samba folks,
>> > >
>> > > I've been running into a lot of issues lately with ctdb's re-lock file
>> > > and glusterfs as the shared storage.  When I started, I could get one
>> > > or the other node to become healthy, but at least one would complain
>> > > it could not lock the re-lock file.  Now I'm at the point where neither
>> > > node will become healthy and they stay in a recovery loop.  Just to be
>> > > sure it was the re-lock file, I commented it out in the config and both
>> > > nodes became healthy.
>>
>> > What version of CTDB are you using? Can you attach the log file where
>> > you notice CTDB is continuously going in recovery? It would be useful
>> > to get log files from all the nodes.
>>
>> Michael Adam and I took a look at this on the weekend.  Gilbert sent me
>> some logs and this was happening:
>>
>>   ctdb_recovery_lock: Got recovery lock on '/mnt/gluster/ctdb/lock'
>>
>> That seems to indicate that locking isn't working as expected...
>>
>> peace & happiness,
>> martin
>>

