[Samba] Samba in Pacemaker-Cluster: CTDB fails to get recovery lock

Jim McDonough jmcd at samba.org
Mon Mar 14 12:23:03 MDT 2011

On Fri, Mar 11, 2011 at 8:13 AM, Uwe Ritzschke
<uwe.ritzschke.2 at cms.hu-berlin.de> wrote:
> I'm currently testing fail-over with a two-node active-active cluster (with
> node dig and node dag): Both nodes are up, one is manually killed. CTDB on
> the node that's still alive should perform a recovery and everything should
> work again.
> What occasionally happens is:
> After killing the pacemaker-process on dag (and dag consequently being
> fenced), dig's CTDB tries to get the recovery lock and fails. As there is no
> other node online to get the recovery lock and thus finish CTDB's
> recovery, dig's CTDB keeps trying to get the recovery lock until manually
> stopped.
> The only way to get CTDB back to work is to restart OCFS2's distributed lock
> manager.
> Our setup:
> two nodes running openSUSE 11.3, directly connected via LAN and sharing a
> SAN drive that is connected via two interfaces using multipath.
> pacemaker 1.1.2
> corosync 1.2.1
> cluster-glue 1.0.5-1.4
> ctdb 1.0.114-2.20
> ocfs2 1.4.3-1.4
> multipath 0.4.8-51.3

You might want to try updated packages from the repository:

This would give you newer code levels on the HA packages.
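A small check that may help narrow this down before or after upgrading. It is a sketch that rests on an assumption not stated in this thread: CTDB releases of this era take the recovery lock as an fcntl() byte-range write lock on a file on the shared cluster filesystem (here OCFS2), so a probe run on the surviving node can show whether that lock is still reported as held after the peer was fenced. The helper name try_recovery_lock is hypothetical and not part of CTDB. Point it at whatever file your CTDB recovery-lock setting names, preferably with CTDB stopped on that node, because a successful probe briefly holds the lock itself.

```python
import errno
import fcntl
import os


def try_recovery_lock(path: str) -> bool:
    """Probe whether an fcntl write lock on byte 0 of `path` can be taken.

    Hypothetical helper, not part of CTDB. Returns True if the lock was
    free (it is taken and released at once), False if another process,
    possibly on another node via the cluster lock manager, holds a
    conflicting lock. Any other error (missing mount, permissions) is raised.
    """
    # O_CREAT: assume it is fine to create the file if it is absent.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # Non-blocking exclusive record lock on 1 byte at offset 0;
            # Python's lockf() issues fcntl(F_SETLK) for LOCK_NB requests.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0)
        except OSError as exc:
            # Conflicting lock: POSIX allows either EACCES or EAGAIN here.
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return False
            raise
        # We only wanted to know; drop the lock straight away.
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0)
        return True
    finally:
        # Closing the descriptor also releases any lock this process holds on the file.
        os.close(fd)
```

Call it as try_recovery_lock("/path/to/reclock"), substituting the real path. If it keeps returning False on dig with dag fenced and CTDB stopped, that would suggest stale lock state in the cluster lock manager, which would be consistent with a DLM restart clearing the problem.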

Jim McDonough
Samba Team
SUSE labs
jmcd at samba dot org
jmcd at themcdonoughs dot org
