Setting up CTDB on OCFS2 and VMs ...

Tue Dec 16 13:59:36 MST 2014

On Tue, 16 Dec 2014 18:22:02 +0000, Rowland Penny
<repenny241155 at gmail.com> wrote:

> I don't think so, tailing the log shows this:
> 
> root@cluster1:~# tail /var/log/ctdb/log.ctdb
> 2014/12/16 18:11:23.866612 [13513]: Thawing priority 2
> 2014/12/16 18:11:23.866634 [13513]: Release freeze handler for prio 2
> 2014/12/16 18:11:23.866666 [13513]: Thawing priority 3
> 2014/12/16 18:11:23.866685 [13513]: Release freeze handler for prio 3
> 2014/12/16 18:11:23.873189 [recoverd:13666]: ctdb_control error: 'managed to lock reclock file from inside daemon'
> 2014/12/16 18:11:23.873235 [recoverd:13666]: ctdb_control error: 'managed to lock reclock file from inside daemon'
> 2014/12/16 18:11:23.873246 [recoverd:13666]: Async operation failed with ret=-1 res=-1 opcode=16
> 2014/12/16 18:11:23.873254 [recoverd:13666]: Async wait failed - fail_count=1
> 2014/12/16 18:11:23.873261 [recoverd:13666]: server/ctdb_recoverd.c:412 Unable to set recovery mode. Recovery failed.
> 2014/12/16 18:11:23.873268 [recoverd:13666]: server/ctdb_recoverd.c:1996 Unable to set recovery mode to normal on cluster
> 
> This appears to be happening over and over again.

That repeated "managed to lock reclock file from inside daemon" error
is the indicator that you have a lock coherency problem.  Please see
the text I made bold in:

  https://wiki.samba.org/index.php/Ping_pong

Yes, this is hard and it tripped me up when I rushed through the
ping-pong test...  and there was nothing in bold there to draw my
attention to that detail. As Michael Adam has mentioned, some cluster
filesystems will look like they fail this test when they actually pass,
so it is difficult to have a test that works everywhere...

I'll try to update that log message to make the cause clearer and to
send users back to the ping-pong test.
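
To illustrate the kind of check behind that error, here is a minimal
Python sketch (not CTDB's or ping_pong's actual code). It assumes the
check amounts to "take a lock on the file, then confirm that a second
process cannot take the same lock". The function name child_can_lock
and the byte range used are hypothetical, chosen only for the example.

```python
import fcntl
import os
import tempfile


def child_can_lock(path):
    """Hold an exclusive fcntl lock on byte 0 of `path`, then report
    whether a forked child process can take the same lock.

    False is the healthy answer. True means the lock was not honoured,
    which is the situation the "managed to lock reclock file" error
    complains about.
    """
    with open(path, "r+b") as parent_fh:
        # Parent takes a non-blocking exclusive lock on one byte at offset 0.
        fcntl.lockf(parent_fh, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0)

        pid = os.fork()
        if pid == 0:
            # Child: fcntl locks are not inherited across fork(), so this
            # is a genuinely competing attempt from a second process.
            status = 2  # 2 = unexpected error
            try:
                with open(path, "r+b") as child_fh:
                    try:
                        fcntl.lockf(child_fh, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0)
                        status = 0  # 0 = lock acquired (incoherent)
                    except OSError:
                        status = 1  # 1 = lock refused (expected)
            finally:
                os._exit(status)

        # Parent: wait for the child and decode its exit status.
        _, wait_status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(wait_status) == 0


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print("second process got the lock:", child_can_lock(path))
    finally:
        os.remove(path)
```

This only exercises locking between two processes on one node, so a
clean result here says nothing about coherency between nodes. For that
you still need to run ping_pong on several nodes at once, as described
on the wiki page.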

peace & happiness,
martin