CTDB and Glusterfs setup

Mon Oct 8 02:19:18 MDT 2012

Hi Morten,

On 2012-10-08 at 07:28 +0000, Morten Bøhmer wrote:
> Hi all
> 
> I am working on a ctdb setup and would like to use Glusterfs for a shared volume for ctdb.
> 
> I have setup a simple replicated volume with glusterfs:
> 
> gluster volume create lock replica 2 172.16.0.1:/mnt/gluster/lock 172.16.0.2:/mnt/gluster/lock
> 
> I have started this volume and can mount it successfully on /mnt/lock
> 
> I have configured the files for ctdb:
> 
> [root@gluster1 lock]# ls -la
> total 48
> drwxr-xr-x  2 root root 4096 Oct  7 23:09 .
> drwxr-xr-x. 4 root root 4096 Oct  7 00:21 ..
> -rw-r--r--  1 root root  165 Oct  7 23:09 ctdb
> -rw-r--r--  1 root root   22 Oct  7 23:09 nodes
> -rw-r--r--  1 root root   42 Oct  7 23:09 public_addresses
> -rw-r--r--  1 root root  417 Oct  7 23:09 smb.conf
> -rw-------  1 root root    3 Oct  7 23:09 test.dat
> 
> And tried to run the ping_pong test software:
> 
> [root@gluster1 lock]# /tmp/ping_pong test.dat 1
>   2934 locks/sec
> 
> When I run ping_pong on node 2 while node 1 is running it, the system halts and it is time for a reboot.

First, ping_pong is invoked as "ping_pong <filename> <N>", where N
must be at least one greater than the number of ping_pong
processes you intend to run. Second, all ping_pong instances
that operate on the same file at the same time must be started
with the same N.

If you run it on several nodes at once and specify N too small,
the system should still not halt. Instead, the ping_pong
processes should block, and you should stop seeing a positive
locks/sec rate being printed.
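The following minimal Python sketch shows why N has to be larger than the number of processes. It is modelled on the behaviour described above and is not ping_pong's actual source. The function names are hypothetical, and it assumes Python's `fcntl.lockf`, which wraps the same POSIX fcntl record locks. Each process holds a lock on one byte, takes the lock on the next byte (modulo N), and only then releases the byte it held. If N processes each hold one of N bytes, every process waits for its neighbour and they all block. With N at least one greater, some byte is always free, so at least one process can always advance.

```python
import fcntl
import os
import time


def lock_byte(fd, offset):
    # Blocking exclusive POSIX (fcntl) lock on the single byte at `offset`.
    fcntl.lockf(fd, fcntl.LOCK_EX, 1, offset, os.SEEK_SET)


def unlock_byte(fd, offset):
    # Release the fcntl lock on the byte at `offset`.
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset, os.SEEK_SET)


def ping_pong(path, num_locks, duration=1.0):
    """Hypothetical re-creation of ping_pong's lock loop; returns approx. locks/sec."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        i = 0
        lock_byte(fd, i)  # start out holding byte 0
        count = 0
        start = time.monotonic()
        while time.monotonic() - start < duration:
            nxt = (i + 1) % num_locks
            lock_byte(fd, nxt)  # blocks while another process holds byte `nxt`
            unlock_byte(fd, i)  # only then give up the byte we held
            i = nxt
            count += 1
        elapsed = time.monotonic() - start
        unlock_byte(fd, i)
        return count / elapsed if elapsed > 0 else 0.0
    finally:
        os.close(fd)  # closing the fd also drops any remaining fcntl locks
```

Running one such process on each of two nodes with `num_locks=3` corresponds to running `ping_pong test.dat 3` on both nodes. If the cluster file system does not enforce fcntl locks across nodes, both processes would run freely instead of taking turns.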

If the system crashes or halts, something is seriously wrong
with your cluster file system (Gluster), be it a bug or a
configuration issue.

Gluster has to be configured to support POSIX fcntl locks.
Would you share your Gluster configuration?

> Secondly, I cannot get ctdb to start. It complains about being
> unable to lock the files in the volume, even though the files
> are fully accessible from both nodes using common tools
> like vi, touch, less and so on:

Until ping_pong runs reliably with multiple processes per node
and with instances on different nodes simultaneously, there is
no point in trying to get ctdb running. Ctdb relies on correct
POSIX fcntl locking semantics for the recovery lock file (unless
you disable the recovery lock, which you should not do in a
production environment unless you are willing to risk corrupting
your data...). Testing exactly these semantics is what ping_pong
was written for...
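The recovery lock relies on the same primitive. The following minimal Python sketch assumes the lock is a non-blocking exclusive fcntl lock on the first byte of the lock file. That is an assumption, since ctdb's real implementation may differ in detail, and `try_recovery_lock` is a hypothetical name. Only the node that obtains the lock may run recovery. Every other node must see the lock as already held. The file also has to be opened before any locking is attempted. In the log quoted below, it is the open that fails with "No such file or directory", and the path being opened appears to include the --nlist and --public-addresses options as well.

```python
import errno
import fcntl
import os


def try_recovery_lock(path):
    """Return an fd holding an exclusive fcntl lock on `path`, or None if it is held elsewhere."""
    # Step 1: open (or create) the lock file. A missing parent directory
    # raises FileNotFoundError (ENOENT) before any locking happens.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Step 2: non-blocking exclusive POSIX lock on byte 0 (assumed layout).
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
    except OSError as e:
        os.close(fd)
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return None  # another process (or node) already holds the lock
        raise
    return fd  # keep this fd open for as long as the lock must be held
```

If the cluster file system does not enforce fcntl locks across nodes, more than one node would get a non-None result. That is exactly the situation the recovery lock exists to prevent.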

Cheers - Michael

> 2012/10/08 08:59:45.072196 [ 8399]: Starting CTDBD as pid : 8399
> 2012/10/08 08:59:45.072640 [ 8399]: Unable to set scheduler to SCHED_FIFO (Operation not permitted)
> 2012/10/08 08:59:45.207976 [ 8399]: Freeze priority 1
> 2012/10/08 08:59:45.208166 [ 8399]: Freeze priority 2
> 2012/10/08 08:59:45.208286 [ 8399]: Freeze priority 3
> 2012/10/08 08:59:48.212279 [recoverd: 8451]: Taking out recovery lock from recovery daemon
> 2012/10/08 08:59:48.212553 [recoverd: 8451]: Take the recovery lock
> 2012/10/08 08:59:48.212731 [recoverd: 8451]: ctdb_recovery_lock: Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' --public-addresses='/mnt/lock/public_addresses' - (No such file or directory)
> 2012/10/08 08:59:48.212850 [recoverd: 8451]: Unable to get recovery lock - aborting recovery
> 2012/10/08 08:59:49.213604 [recoverd: 8451]: Taking out recovery lock from recovery daemon
> 2012/10/08 08:59:49.213823 [recoverd: 8451]: Take the recovery lock
> 2012/10/08 08:59:49.213975 [recoverd: 8451]: ctdb_recovery_lock: Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' --public-addresses='/mnt/lock/public_addresses' - (No such file or directory)
> 2012/10/08 08:59:49.214095 [recoverd: 8451]: Unable to get recovery lock - aborting recovery
> 2012/10/08 08:59:50.214874 [recoverd: 8451]: Taking out recovery lock from recovery daemon
> 2012/10/08 08:59:50.215338 [recoverd: 8451]: Take the recovery lock
> 2012/10/08 08:59:50.215616 [recoverd: 8451]: ctdb_recovery_lock: Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' --public-addresses='/mnt/lock/public_addresses' - (No such file or directory)
> 2012/10/08 08:59:50.215727 [recoverd: 8451]: Unable to get recovery lock - aborting recovery
> 2012/10/08 08:59:51.216524 [recoverd: 8451]: Taking out recovery lock from recovery daemon
> 2012/10/08 08:59:51.216749 [recoverd: 8451]: Take the recovery lock
> 2012/10/08 08:59:51.216906 [recoverd: 8451]: ctdb_recovery_lock: Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' --public-addresses='/mnt/lock/public_addresses' - (No such file or directory)
> 2012/10/08 08:59:51.217030 [recoverd: 8451]: Unable to get recovery lock - aborting recovery
> 2012/10/08 08:59:52.217836 [ 8399]: Banning this node for 300 seconds
> 2012/10/08 08:59:52.218095 [recoverd: 8451]: Taking out recovery lock from recovery daemon
> 2012/10/08 08:59:52.218228 [recoverd: 8451]: Take the recovery lock
> 2012/10/08 08:59:52.218359 [recoverd: 8451]: ctdb_recovery_lock: Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' --public-addresses='/mnt/lock/public_addresses' - (No such file or directory)
> 2012/10/08 08:59:52.218519 [recoverd: 8451]: Unable to get recovery lock - aborting recovery
> 
> 
> I have tested both FC17 with the stock GlusterFS and the 3.3 version from RPM. Does anyone have a clue how to get this up and running?
> 
> 
> Morten
