[Samba] Samba in Pacemaker-Cluster: CTDB fails to get recovery lock

Fri Mar 11 06:13:59 MST 2011

I'm currently testing failover with a two-node active-active cluster 
(nodes dig and dag): both nodes are up, then one is killed manually. 
CTDB on the surviving node should perform a recovery, after which 
everything should be working again.

What occasionally happens is:

After killing the pacemaker process on dag (and dag consequently being 
fenced), dig's CTDB tries to get the recovery lock and fails. As there 
is no other node online that could get the recovery lock and thus finish 
CTDB's recovery, dig's CTDB keeps trying to get the recovery lock until 
it is stopped manually.
The only way to get CTDB working again is to restart OCFS2's distributed 
lock manager.
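
One way to check, independently of CTDB, whether the lock itself is stuck 
is to try a non-blocking fcntl lock on the recovery lock file from the 
surviving node. The following is only a rough sketch: it assumes that 
CTDB takes an exclusive fcntl lock on the first byte of that file, the 
function name try_reclock is made up for this example, and the path has 
to be whatever file is configured as the recovery lock on the shared 
OCFS2 volume.

```python
import errno
import fcntl
import os
import sys


def try_reclock(path):
    """Try a non-blocking exclusive fcntl lock on the first byte of path.

    Returns the open file descriptor if the lock was granted (the lock is
    held until that descriptor is closed), or None if the lock is refused
    because another holder is still registered for it.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        # Arguments: length 1, starting at offset 0 from the file start.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None
        raise
    return fd


if __name__ == "__main__" and len(sys.argv) == 2:
    # The single argument is the recovery lock file (an assumption: pass
    # the path your CTDB configuration actually uses).
    held = try_reclock(sys.argv[1])
    if held is None:
        print("lock refused: something still holds it")
    else:
        print("lock granted")
        os.close(held)  # closing the descriptor releases the lock
```

If this reports "lock refused" on dig while dag is fenced and CTDB is 
stopped on dig, that would suggest the lock is still held on behalf of 
the dead node inside the DLM rather than by anything in CTDB.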

Log files and the Pacemaker configuration are attached; any help would be 
greatly appreciated :)

Regards,
Uwe

Our setup:

Two nodes, directly connected via LAN, running openSUSE 11.3 and sharing a 
SAN drive that is attached via two interfaces using multipath.

pacemaker 1.1.2
corosync 1.2.1
cluster-glue 1.0.5-1.4
ctdb 1.0.114-2.20
ocfs2 1.4.3-1.4
multipath 0.4.8-51.3

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: crm.config
URL: <http://lists.samba.org/pipermail/samba/attachments/20110311/024ac44a/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log.ctdb
URL: <http://lists.samba.org/pipermail/samba/attachments/20110311/024ac44a/attachment-0001.ksh>