[Samba] CTDB daemon crashed on bringing down one node in the cluster

Fri May 11 06:52:36 MDT 2012

All,

I have a 3-node CTDB cluster which serves 4 'public addresses'. The
/etc/ctdb/public_addresses file is node-specific and is present at
that path on each participating node. All the nodes run RHEL 6.2.

Other ctdb config files such as "nodes" and "public_addresses" are placed
on a shared filesystem mounted at a known location (say, /gluster/lock).

On starting the CTDB service on all the nodes, things look fine via
"ctdb status": all nodes are "OK" and connected.

To test the failover behaviour, I brought down one of the nodes.
"ctdb status", when run on one of the (up) nodes, gave the following status:

[root@<nodename>~]# ctdb status
Number of nodes:4
pnn:0 x.y.z.a    DISCONNECTED|BANNED|UNHEALTHY|INACTIVE
pnn:1 x.y.z.b    BANNED|UNHEALTHY|INACTIVE (THIS NODE)
pnn:2 x.y.z.c    DISCONNECTED|UNHEALTHY|INACTIVE
pnn:3 x.y.z.d    OK
Generation:INVALID
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:3
Recovery mode:RECOVERY (1)
Recovery master:3

In the above (edited) output, pnn:2 is the node that was brought down.
I also observed that ctdb had crashed with signal 6 (SIGABRT) on pnn:0. The
stack trace was not very useful. I am new to ctdb, and I would like to know if
there is any way I can get more useful stack traces on subsequent crashes (if any).

Is there something that I may have missed? Could somebody give me pointers on
how I can debug this issue?

cheers,
krish