CTDB with GlusterFS.

Martin Schwenke martin at meltin.net
Wed Nov 11 06:12:31 UTC 2015


Hi Matt,

On Tue, 10 Nov 2015 15:39:23 -0600, Matthew Sellers
<matt at indigo.nu> wrote:

> I am testing CTDB with GlusterFS and having issues with the recovery
> lock functionality.  I am using CTDB with a GlusterFS 3.7.5 mount
> hosting my CTDB_RECOVERY_LOCK (replica volume type).  Once I start
> the service, I acquire the lock, CTDB assigns the VIP, and life is
> happy.  Shortly after startup (~30 sec), CTDB complains of a slow
> RECLOCK until a gluster error is emitted.  This process repeats itself
> ad infinitum.  I ran the ping_pong test to exercise POSIX locking as
> suggested by other threads of this type and have no errors.  This is
> all with a single CTDB instance; the other nodes are left down during
> my testing.
> 
> * Has anyone tried CTDB with modern Gluster and had success?

I don't have any experience with Gluster... but...

> * I am fairly confident this is a gluster bug, due to a gluster
> "transport endpoint not connected" error (logs below).  I will fire a
> message to their list as well, but wanted to ask if anybody on
> #samba-technical has shared my experience.  I have not discovered what
> sequence is causing gluster to fail, so I have not reported it
> yet... more digging required :-)

CTDB is taking a simple fcntl(2) lock and it looks like this isn't
working well.

You could write a simple standalone C program around the following
lines from ctdb/server/ctdb_recover.c:

        struct flock lock;
        ...
        lock.l_type = F_WRLCK;
        lock.l_whence = SEEK_SET;
        lock.l_start = 0;
        lock.l_len = 1;
        lock.l_pid = 0;

        if (fcntl(ctdb->recovery_lock_fd, F_SETLK, &lock) != 0) {

This would allow you to test the fcntl(2) locking without having all of
CTDB in the way.
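
Something like this should do it (an untested sketch, assuming you pass
the reclock path as the first argument and just want to hold the lock
for a minute):

/*
 * Minimal standalone recovery-lock tester (sketch).  It takes the same
 * F_WRLCK over byte 0 that ctdb_recover.c takes on the recovery lock
 * file, then holds it so you can watch for Gluster failures without
 * CTDB running.
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
        struct flock lock;
        int fd;

        if (argc != 2) {
                fprintf(stderr, "Usage: %s <reclock-file>\n", argv[0]);
                return 1;
        }

        fd = open(argv[1], O_RDWR | O_CREAT, 0600);
        if (fd == -1) {
                fprintf(stderr, "open(%s) failed: %s\n",
                        argv[1], strerror(errno));
                return 1;
        }

        /* Same lock that CTDB takes: write lock on the first byte */
        memset(&lock, 0, sizeof(lock));
        lock.l_type = F_WRLCK;
        lock.l_whence = SEEK_SET;
        lock.l_start = 0;
        lock.l_len = 1;
        lock.l_pid = 0;

        if (fcntl(fd, F_SETLK, &lock) != 0) {
                fprintf(stderr, "fcntl(F_SETLK) failed: %s\n",
                        strerror(errno));
                close(fd);
                return 1;
        }

        printf("Got lock on %s, holding for 60 seconds...\n", argv[1]);
        sleep(60);

        close(fd);
        return 0;
}

Compile it, point it at a file on the Gluster mount (the path you use
for CTDB_RECOVERY_LOCK works), and see whether the lock is granted and
whether it survives the 30 seconds or so before Gluster falls over.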

If you're not a C programmer (though I think this could probably be
done with Python too) then please yell and I'll whip something up for
you to test with...  :-)

peace & happiness,
martin


