SV: SV: CTDB and Glusterfs setup

Morten Bøhmer Morten.Bohmer at pilaro.no
Mon Oct 8 07:37:22 MDT 2012


Sorry, I forgot to mention the ping_pong results.

It increases the data increment value when running several processes on the same node, but not when running on different nodes.

Morten

-----Original message-----
From: Michael Adam [mailto:obnox at samba.org]
Sent: 8 October 2012 15:18
To: Morten Bøhmer
Cc: samba-technical at lists.samba.org
Subject: Re: SV: CTDB and Glusterfs setup


On 2012-10-08 at 10:00 +0000, Morten Bøhmer wrote:
> Ok, I have been working on it some more.
> 
> Ping_pong now works just fine; on the two-node setup I run:
> 
> ping_pong /mnt/lock/test.dat 3
> 
> This gives me about 1500 locks/sec on each node.

Does this change (i.e. drop) when adding processes?

> When running with -rw, the documentation says that it should increment the "data increment" value, but it stays at "1" even when the second node is running ping_pong with -rw.

Then posix locking is not implemented completely/correctly in your Gluster setup.

Do you get data increments by doing multiple processes on a single node?

Could you send your gluster config?
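
For reference, a sketch of what I mean (the file name and count are just taken from your earlier mail): on one node, run in two shells

  ping_pong -rw /mnt/lock/test.dat 3
  ping_pong -rw /mnt/lock/test.dat 3

If I remember the documented behaviour correctly, the "data increment" should then rise to 2 (one per writing process). If it does so on a single node but stays at 1 when the two processes run on different nodes, the nodes are not seeing each other's locks and writes.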

> Ctdb still gives me the same error message about missing files.

While you still need to fix the above before proceeding, I noted the following:

> > 2012/10/08 08:59:49.213975 [recoverd: 8451]: ctdb_recovery_lock: 
> > Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' 
> > --public-addresses='/mnt/lock/public_addresses' - (No such file or 
> > directory)

This looks as if there is a problem with the lock file setting in the config:

The debug message shows that ctdb believes the name of the recovery lock file to be "'/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' --public-addresses='/mnt/lock/public_addresses'"
... :-o

I.e. there seems to be a problem with the file name setting in the ctdb config file.

(the ctdb init-script builds these command line options from
/etc/sysconfig/ctdb)

So, could you please attach (in addition to your gluster config) your ctdb config: /etc/sysconfig/ctdb, plus the nodes and public_addresses files.

Note, by the way, that you should in general not put the nodes and public_addresses files into the cluster file system; ctdb should read them from a local /etc/ctdb directory.
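
For comparison, a minimal /etc/sysconfig/ctdb along these lines should work (a sketch only - the file names are examples, and the CTDB_MANAGES_SAMBA line is optional):

  CTDB_RECOVERY_LOCK="/mnt/lock/lockfile"
  CTDB_NODES=/etc/ctdb/nodes
  CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
  CTDB_MANAGES_SAMBA=yes

I.e. only the recovery lock file lives on the cluster file system, and each variable sits on its own line. The mangled file name in your log looks as if the nodes and public_addresses settings ended up inside the value of the recovery lock variable.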

Cheers - Michael

> -----Original message-----
> From: Michael Adam [mailto:obnox at samba.org]
> Sent: 8 October 2012 10:19
> To: Morten Bøhmer
> Cc: samba-technical at lists.samba.org
> Subject: Re: CTDB and Glusterfs setup
> 
> Hi Morten,
> 
> On 2012-10-08 at 07:28 +0000, Morten Bøhmer wrote:
> > Hi all
> > 
> > I am working on a ctdb setup and would like to use Glusterfs as the shared volume for ctdb.
> > 
> > I have set up a simple replicated volume with glusterfs:
> > 
> > gluster volume create lock replica 2 172.16.0.1:/mnt/gluster/lock 172.16.0.2:/mnt/gluster/lock
> > 
> > I have started this volume and can mount it successfully on 
> > /mnt/lock
> > 
> > I have configured the files for ctdb:
> > 
> > [root at gluster1 lock]# ls -la
> > total 48
> > drwxr-xr-x  2 root root 4096 Oct  7 23:09 .
> > drwxr-xr-x. 4 root root 4096 Oct  7 00:21 ..
> > -rw-r--r--  1 root root  165 Oct  7 23:09 ctdb
> > -rw-r--r--  1 root root   22 Oct  7 23:09 nodes
> > -rw-r--r--  1 root root   42 Oct  7 23:09 public_addresses
> > -rw-r--r--  1 root root  417 Oct  7 23:09 smb.conf
> > -rw-------  1 root root    3 Oct  7 23:09 test.dat
> > 
> > And tried to run the ping_pong test software:
> > 
> > [root at gluster1 lock]# /tmp/ping_pong test.dat 1
> >   2934 locks/sec
> > 
> > When I run ping_pong on node 2 while node 1 is running it, the system halts and it is time for a reboot.
> 
> Firstly, ping_pong is called "ping_pong <filename> <N>" where N is at least one bigger than the number of ping_pong processes that you intend to run. Secondly, all instances of ping_pong that operate on a given file simultaneously should be called with the same "N".
> 
> If you run it on several nodes at once and specify N too small, the system should still not halt. The ping_pong processes should simply block, and you should no longer see a positive locks/sec rate printed.
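> 
> As a concrete sketch (the file name is just the one from your test): for one ping_pong process per node on a two node cluster, start on both nodes
> 
>   ping_pong /mnt/lock/test.dat 3
> 
> i.e. two processes in total, so N must be at least 3, and both invocations must use the same N.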
> 
> If the system crashes/halts etc., there is something seriously wrong with your cluster file system (Gluster), be it a bug or a configuration issue.
> 
> You have to configure Gluster for posix fcntl lock support.
> Would you share your gluster config?
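> 
> (Just as a rough sketch of what I would expect to see there: the volume mounted on /mnt/lock via the native glusterfs/FUSE client, e.g.
> 
>   mount -t glusterfs 172.16.0.1:/lock /mnt/lock
> 
> since fcntl locking behaviour can differ between the native client and other access methods such as NFS mounts of the volume.)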
> 
> > Secondly, I cannot get ctdb to start; it complains about being unable
> > to lock the files in the volume, even though the files are fully
> > accessible from both nodes using common tools like vi, touch, less
> > and so on:
> 
> Until you have the ping_pong test above running reliably with multiple processes per node and with instances on different nodes simultaneously, there is no point in trying to get ctdb running.
> Ctdb relies on correct posix fcntl locking semantics for the recovery lock file (unless you disable it, which you should not do in a production environment unless you want to risk corrupting your data...). This is precisely what ping_pong was written for...
> 
> Cheers - Michael
> 
> > 2012/10/08 08:59:45.072196 [ 8399]: Starting CTDBD as pid : 8399
> > 2012/10/08 08:59:45.072640 [ 8399]: Unable to set scheduler to 
> > SCHED_FIFO (Operation not permitted)
> > 2012/10/08 08:59:45.207976 [ 8399]: Freeze priority 1
> > 2012/10/08 08:59:45.208166 [ 8399]: Freeze priority 2
> > 2012/10/08 08:59:45.208286 [ 8399]: Freeze priority 3
> > 2012/10/08 08:59:48.212279 [recoverd: 8451]: Taking out recovery 
> > lock from recovery daemon
> > 2012/10/08 08:59:48.212553 [recoverd: 8451]: Take the recovery lock
> > 2012/10/08 08:59:48.212731 [recoverd: 8451]: ctdb_recovery_lock: 
> > Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' 
> > --public-addresses='/mnt/lock/public_addresses' - (No such file or
> > directory)
> > 2012/10/08 08:59:48.212850 [recoverd: 8451]: Unable to get recovery 
> > lock - aborting recovery
> > 2012/10/08 08:59:49.213604 [recoverd: 8451]: Taking out recovery 
> > lock from recovery daemon
> > 2012/10/08 08:59:49.213823 [recoverd: 8451]: Take the recovery lock
> > 2012/10/08 08:59:49.213975 [recoverd: 8451]: ctdb_recovery_lock: 
> > Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' 
> > --public-addresses='/mnt/lock/public_addresses' - (No such file or
> > directory)
> > 2012/10/08 08:59:49.214095 [recoverd: 8451]: Unable to get recovery 
> > lock - aborting recovery
> > 2012/10/08 08:59:50.214874 [recoverd: 8451]: Taking out recovery 
> > lock from recovery daemon
> > 2012/10/08 08:59:50.215338 [recoverd: 8451]: Take the recovery lock
> > 2012/10/08 08:59:50.215616 [recoverd: 8451]: ctdb_recovery_lock: 
> > Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' 
> > --public-addresses='/mnt/lock/public_addresses' - (No such file or
> > directory)
> > 2012/10/08 08:59:50.215727 [recoverd: 8451]: Unable to get recovery 
> > lock - aborting recovery
> > 2012/10/08 08:59:51.216524 [recoverd: 8451]: Taking out recovery 
> > lock from recovery daemon
> > 2012/10/08 08:59:51.216749 [recoverd: 8451]: Take the recovery lock
> > 2012/10/08 08:59:51.216906 [recoverd: 8451]: ctdb_recovery_lock: 
> > Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' 
> > --public-addresses='/mnt/lock/public_addresses' - (No such file or
> > directory)
> > 2012/10/08 08:59:51.217030 [recoverd: 8451]: Unable to get recovery 
> > lock - aborting recovery
> > 2012/10/08 08:59:52.217836 [ 8399]: Banning this node for 300 
> > seconds
> > 2012/10/08 08:59:52.218095 [recoverd: 8451]: Taking out recovery 
> > lock from recovery daemon
> > 2012/10/08 08:59:52.218228 [recoverd: 8451]: Take the recovery lock
> > 2012/10/08 08:59:52.218359 [recoverd: 8451]: ctdb_recovery_lock: 
> > Unable to open '/mnt/lock/lockfile' --nlist='/mnt/lock/nodes' 
> > --public-addresses='/mnt/lock/public_addresses' - (No such file or
> > directory)
> > 2012/10/08 08:59:52.218519 [recoverd: 8451]: Unable to get recovery 
> > lock - aborting recovery
> > 
> > 
> > I have done tests with both FC17 + stock Glusterfs and with the 3.3 version from RPM. Anyone got a clue how to get this up and running?
> > 
> > 
> > Morten

