SV: CTDB and Glusterfs setup

Morten Bøhmer Morten.Bohmer at pilaro.no
Mon Oct 8 07:35:00 MDT 2012


Please see attached.

I have placed all ctdb-related config files in the shared area and linked them to the proper places, since they will be the same on both nodes. If this is not a good idea, I will maintain them locally.

Note that the symlinks are not included in the tar file.
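
Roughly, the layout is this (the symlink targets below are only an illustration of the approach; the actual files are in the attached tar):

  # the ctdb config files live on the shared Gluster mount ...
  ls /mnt/lock
  # ctdb  nodes  public_addresses  smb.conf  test.dat

  # ... and are symlinked into the places where ctdb looks for them
  # on each node (target paths assumed for illustration)
  ln -s /mnt/lock/ctdb             /etc/sysconfig/ctdb
  ln -s /mnt/lock/nodes            /etc/ctdb/nodes
  ln -s /mnt/lock/public_addresses /etc/ctdb/public_addresses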


morten




________________________________________
From: Michael Adam [obnox at samba.org]
Sent: Monday, October 08, 2012 3:18 PM
To: Morten Bøhmer
Cc: samba-technical at lists.samba.org
Subject: Re: SV: CTDB and Glusterfs setup


On 2012-10-08 at 10:00 +0000, Morten Bøhmer wrote:
> OK, I have been working on this some more.
>
> ping_pong now works just fine. On the two-node setup I run:
>
> ping_pong /mnt/lock/test.dat 3
>
> This gives me about 1500 locks/sec on each node.

Does this change (i.e. drop) when adding processes?

> When running with -rw, the documentation says that it should increment the "data increment" value, but it stays at "1" even when the second node is running ping_pong with -rw.

Then posix locking is not implemented completely/correctly in
your Gluster setup.

Do you get data increments when running multiple processes on a
single node?
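
For example, something along these lines on a single node (reusing the test file and count from above; just a sketch):

  # terminal 1:
  ping_pong -rw /mnt/lock/test.dat 3
  # terminal 2, on the same node:
  ping_pong -rw /mnt/lock/test.dat 3
  # with working locking and cache coherence, the reported
  # "data increment" should rise above 1 once both are running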

Could you send your gluster config?

> ctdb still gives me the same error message about missing files.

While you still need to fix the above before proceeding,
I noted the following:

> > 2012/10/08 08:59:49.213975 [recoverd: 8451]: ctdb_recovery_lock: Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' --public-addresses='/mnt/lock/public_addresses' - (No such file or directory)

This looks as if there is a problem with the lock file setting
in the config:

The debug message shows that ctdb believes the name of the
recovery lock file to be "'/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' --public-addresses='/mnt/lock/public_addresses'"
... :-o

I.e. there seems to be a problem with the file name setting in
the ctdb config file.

(the ctdb init-script builds these command line options from
/etc/sysconfig/ctdb)
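
For comparison, a minimal /etc/sysconfig/ctdb for a setup like this would look roughly as follows (the paths are only an illustration, matching the ones above); the important part is that each setting is its own shell variable, which the init script then turns into separate --reclock, --nlist and --public-addresses options:

  CTDB_RECOVERY_LOCK="/mnt/lock/lockfile"
  CTDB_NODES="/etc/ctdb/nodes"
  CTDB_PUBLIC_ADDRESSES="/etc/ctdb/public_addresses"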

So, could you please attach (in addition to your gluster config)
your ctdb config: /etc/sysconfig/ctdb, the nodes file and the
public_addresses file?

Note, by the way, that you should in general not put the nodes and
public_addresses files into the cluster file system;
ctdb should use a local /etc/ctdb directory.
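
For reference, the usual formats are (the addresses below are examples only; the internal ones follow the brick addresses above, the public ones are made up):

  # /etc/ctdb/nodes: one internal (cluster) IP address per line,
  # identical on all nodes
  172.16.0.1
  172.16.0.2

  # /etc/ctdb/public_addresses: address/prefix plus the interface
  # that should take the address over
  192.168.100.10/24 eth0
  192.168.100.11/24 eth0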

Cheers - Michael

> -----Original message-----
> From: Michael Adam [mailto:obnox at samba.org]
> Sent: 8 October 2012 10:19
> To: Morten Bøhmer
> Cc: samba-technical at lists.samba.org
> Subject: Re: CTDB and Glusterfs setup
>
> Hi Morten,
>
> On 2012-10-08 at 07:28 +0000, Morten Bøhmer wrote:
> > Hi all
> >
> > I am working on a ctdb setup and would like to use Glusterfs for a shared volume for ctdb.
> >
> > I have set up a simple replicated volume with glusterfs:
> >
> > gluster volume create lock replica 2 172.16.0.1:/mnt/gluster/lock 172.16.0.2:/mnt/gluster/lock
> >
> > I have started this volume and can mount it successfully on /mnt/lock
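
A quick way to check and mount the volume, using the first server address from the command above (illustrative only):

  gluster volume info lock
  mount -t glusterfs 172.16.0.1:/lock /mnt/lock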
> >
> > I have configured the files for ctdb:
> >
> > [root at gluster1 lock]# ls -la
> > total 48
> > drwxr-xr-x  2 root root 4096 Oct  7 23:09 .
> > drwxr-xr-x. 4 root root 4096 Oct  7 00:21 ..
> > -rw-r--r--  1 root root  165 Oct  7 23:09 ctdb
> > -rw-r--r--  1 root root   22 Oct  7 23:09 nodes
> > -rw-r--r--  1 root root   42 Oct  7 23:09 public_addresses
> > -rw-r--r--  1 root root  417 Oct  7 23:09 smb.conf
> > -rw-------  1 root root    3 Oct  7 23:09 test.dat
> >
> > And tried to run the ping_pong test software:
> >
> > [root at gluster1 lock]# /tmp/ping_pong test.dat 1
> >   2934 locks/sec
> >
> > When I run ping_pong on node 2 while node 1 is running it, the system halts and it is time for a reboot.
>
> Firstly, ping_pong is called "ping_pong <filename> <N>" where N is at least one bigger than the number of ping_pong processes that you intend to run. Secondly, all instances of ping_pong that operate on a given file simultaneously should be called with the same "N".
>
> If running on several nodes at once and you specify N too small, then the system should not halt. The ping_pong processes should block and you should not see a positive locks/sec rate printed any more.
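
Concretely, for one ping_pong process per node that means something like this (reusing the path and count from above):

  # node 1:
  /tmp/ping_pong /mnt/lock/test.dat 3
  # node 2, started while node 1 is still running:
  /tmp/ping_pong /mnt/lock/test.dat 3
  # both should keep printing a positive (if lower) locks/sec rate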
>
> If the system crashes/halts etc., there is something seriously wrong with your cluster file system (Gluster), be it a bug or a configuration issue.
>
> You have to configure Gluster for posix fcntl lock support.
> Would you share your gluster config?
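
In GlusterFS this is handled by the features/locks translator, which sits above storage/posix in the brick volfile; a fragment typically looks roughly like this (the names follow the volume above and are only illustrative):

  volume lock-posix
      type storage/posix
      option directory /mnt/gluster/lock
  end-volume

  volume lock-locks
      type features/locks
      subvolumes lock-posix
  end-volume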
>
> > Secondly, I cannot get ctdb to start; it complains about being unable
> > to lock the files in the volume, even though the files are fully
> > accessible from both nodes using common tools like vi, touch, less,
> > and so on:
>
> Until you get the ping_pong test above running reliably with multiple processes per node and with instances on different nodes simultaneously, there is no point in trying to get ctdb running.
> ctdb relies on correct posix fcntl locking semantics for the recovery lock file (unless you disable it, which you should not do in a production environment unless you want to risk corrupting your data...). This is precisely what ping_pong was written for...
>
> Cheers - Michael
>
> > 2012/10/08 08:59:45.072196 [ 8399]: Starting CTDBD as pid : 8399
> > 2012/10/08 08:59:45.072640 [ 8399]: Unable to set scheduler to SCHED_FIFO (Operation not permitted)
> > 2012/10/08 08:59:45.207976 [ 8399]: Freeze priority 1
> > 2012/10/08 08:59:45.208166 [ 8399]: Freeze priority 2
> > 2012/10/08 08:59:45.208286 [ 8399]: Freeze priority 3
> > 2012/10/08 08:59:48.212279 [recoverd: 8451]: Taking out recovery lock from recovery daemon
> > 2012/10/08 08:59:48.212553 [recoverd: 8451]: Take the recovery lock
> > 2012/10/08 08:59:48.212731 [recoverd: 8451]: ctdb_recovery_lock: Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' --public-addresses='/mnt/lock/public_addresses' - (No such file or directory)
> > 2012/10/08 08:59:48.212850 [recoverd: 8451]: Unable to get recovery lock - aborting recovery
> > 2012/10/08 08:59:49.213604 [recoverd: 8451]: Taking out recovery lock from recovery daemon
> > 2012/10/08 08:59:49.213823 [recoverd: 8451]: Take the recovery lock
> > 2012/10/08 08:59:49.213975 [recoverd: 8451]: ctdb_recovery_lock: Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' --public-addresses='/mnt/lock/public_addresses' - (No such file or directory)
> > 2012/10/08 08:59:49.214095 [recoverd: 8451]: Unable to get recovery lock - aborting recovery
> > 2012/10/08 08:59:50.214874 [recoverd: 8451]: Taking out recovery lock from recovery daemon
> > 2012/10/08 08:59:50.215338 [recoverd: 8451]: Take the recovery lock
> > 2012/10/08 08:59:50.215616 [recoverd: 8451]: ctdb_recovery_lock: Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' --public-addresses='/mnt/lock/public_addresses' - (No such file or directory)
> > 2012/10/08 08:59:50.215727 [recoverd: 8451]: Unable to get recovery lock - aborting recovery
> > 2012/10/08 08:59:51.216524 [recoverd: 8451]: Taking out recovery lock from recovery daemon
> > 2012/10/08 08:59:51.216749 [recoverd: 8451]: Take the recovery lock
> > 2012/10/08 08:59:51.216906 [recoverd: 8451]: ctdb_recovery_lock: Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' --public-addresses='/mnt/lock/public_addresses' - (No such file or directory)
> > 2012/10/08 08:59:51.217030 [recoverd: 8451]: Unable to get recovery lock - aborting recovery
> > 2012/10/08 08:59:52.217836 [ 8399]: Banning this node for 300 seconds
> > 2012/10/08 08:59:52.218095 [recoverd: 8451]: Taking out recovery lock from recovery daemon
> > 2012/10/08 08:59:52.218228 [recoverd: 8451]: Take the recovery lock
> > 2012/10/08 08:59:52.218359 [recoverd: 8451]: ctdb_recovery_lock: Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' --public-addresses='/mnt/lock/public_addresses' - (No such file or directory)
> > 2012/10/08 08:59:52.218519 [recoverd: 8451]: Unable to get recovery lock - aborting recovery
> >
> >
> > I have done tests both with FC17 plus the stock Glusterfs and with the 3.3 version from RPM. Does anyone have a clue how to get this up and running?
> >
> >
> > Morten
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lock.vol.tar
Type: application/x-tar
Size: 10240 bytes
Desc: lock.vol.tar
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20121008/5ddffc73/attachment.tar>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ctdb.tar
Type: application/x-tar
Size: 10240 bytes
Desc: ctdb.tar
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20121008/5ddffc73/attachment-0001.tar>

