Debugging two simple ctdb problems

Fri May 20 19:17:54 UTC 2016

Hi Steve,

On Fri, 20 May 2016 10:31:59 -0500, Steve French
<smfrench at gmail.com> wrote:

> In looking at a simple ctdb two-node configuration (two Samba servers
> exporting the same clustered share), I ran into two hopefully
> easy-to-debug problems, but would like some suggestions on how to
> attack debugging this type of problem.
> 
> After setting up the /etc/ctdb/nodes file on each server (with the IP
> addresses of the second NIC of each of the two servers) and an empty
> public addresses file (and updating smb.conf to enable clustering), I
> started ctdb on each. "ctdb status" showed both nodes as healthy and
> "ctdb getdbmap" showed locking.tdb and brlock.tdb, but I didn't see
> deny-write enforced. From Windows I mapped r: to \\server1\share and
> s: to \\server2\share, then opened the same file (r:\file and s:\file)
> with a deny-write share mode and wrote to it through both at the same
> time. That shouldn't have worked, and it doesn't if both opens go to
> the same Samba server. Any suggestions on what to look for in
> debugging this simple issue?

This seems like it must be a very basic configuration issue.

Just to double-check: if you repeat the above several times and then run
"ctdb dbstatistics locking.tdb", do you see evidence of record migrations?
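One rough way to answer that is to snapshot the statistics before and
after repeating the test and compare the two. This is only a minimal
sketch, assuming a POSIX shell on one of the nodes; the function names
`snapshot_dbstats` and `compare_dbstats` and the `CTDB` override variable
are hypothetical. It makes no assumption about the output format of
"ctdb dbstatistics" and only reports whether anything changed.

```shell
#!/bin/sh
# Hypothetical helpers: record "ctdb dbstatistics" output for a database
# before and after repeating the deny-write test, then compare the two.
# CTDB may be overridden to point at another binary (an assumption added
# only so the helpers can be exercised without a live cluster).
CTDB="${CTDB:-ctdb}"

# snapshot_dbstats DB FILE: save the current statistics for DB into FILE.
snapshot_dbstats() {
    "$CTDB" dbstatistics "$1" > "$2"
}

# compare_dbstats BEFORE AFTER: print a diff of the two snapshots.
# Returns 0 (and says so) if nothing changed, 1 otherwise.
compare_dbstats() {
    if diff "$1" "$2"; then
        echo "no change in statistics"
        return 0
    fi
    return 1
}

# Example usage on one node:
#   snapshot_dbstats locking.tdb /tmp/before.txt
#   (repeat the r:\file / s:\file test from the Windows client)
#   snapshot_dbstats locking.tdb /tmp/after.txt
#   compare_dbstats /tmp/before.txt /tmp/after.txt
```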

> Later the VM for one of the two Samba servers got corrupted, so I
> installed a third system, updated the /etc/ctdb/nodes files to remove
> the unhealthy one and include the IP addresses of only the two
> healthy servers, then restarted ctdb on the two healthy nodes. But
> ctdb node status still shows one of the two as unhealthy, even though
> they can ping each other. Any ideas on what to look for in debugging
> this?
> 
> After rebooting both systems, they come up and ctdb and Samba start
> as expected, but ctdb status now shows both nodes as unhealthy.
> 
> Ideas on how to approach debugging unhealthy ctdb nodes that can ping
> each other, and non-enforcement of deny-modes in a healthy ctdb
> cluster?

"ctdb scriptstatus" will show you what is failing.  If it says "Monitor
cycle never run", then try "ctdb scriptstatus startup", since CTDB might
be looping indefinitely while retrying the "startup" event.
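That check can be wrapped in a small shell function to run on each
unhealthy node. This is a minimal sketch assuming a POSIX shell; the
function name `show_failing_scripts` and the `CTDB` override variable are
hypothetical, and the only ctdb commands it uses are the two quoted above.
It also assumes the "Monitor cycle never run" message is printed verbatim.

```shell
#!/bin/sh
# Hypothetical helper: show event script status and, if the monitor event
# has never run, also show the status of the "startup" event, which CTDB
# may be retrying indefinitely.
# CTDB may be overridden to point at another binary (an assumption added
# only so the helper can be exercised without a live cluster).
CTDB="${CTDB:-ctdb}"

show_failing_scripts() {
    status=$("$CTDB" scriptstatus)
    printf '%s\n' "$status"
    # Assumes ctdb prints this message verbatim, as quoted in the reply.
    case "$status" in
        *"Monitor cycle never run"*)
            echo "--- monitor cycle never run; checking startup event ---"
            "$CTDB" scriptstatus startup
            ;;
    esac
}

# Example usage on each unhealthy node:
#   show_failing_scripts
```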

peace & happiness,
martin