[Samba] ctdb_recovery_lock: Failed to get recovery lock

Nicolas Ecarnot nicolas at ecarnot.net
Tue Mar 27 03:14:16 MDT 2012


Hi,

I'm happily progressing toward a working setup of my two-node Samba
cluster: cman, qdisk, clvm, gfs2, ctdb, samba, winbind, ad.
And now I'm in the testing phase.

When my cluster is up and running, I can move each IP address to one
node or the other, seamlessly.
They can fence each other.

But I still have one big issue: though they have been set up as clones,
they don't behave identically. When shutting down node 1, node 0 takes
over every part of the ctdb setup (IPs, recmaster, services).
But when I stop the ctdb daemon on node 0, though ctdb on node 0 correctly
stops its child daemons (nmbd, smbd and winbind) and exits,
node 1 reports:

ctdb_recovery_lock: Failed to get recovery lock on '/ctdb/.ctdb.lock'

(This directory is on a shared clvm + gfs2 volume, writable and correctly
accessible from both nodes.)

This causes node 1 to get banned.
Then (I guess) when it is unbanned, a re-election occurs, but I get:

Recmaster node 1 no longer available. Force reelection

I suppose that node 1 can't become recmaster because it cannot get the
recovery lock. But I can't see why this node fails to take this lock.

I don't know if this may help, but :
- I removed the lock file, and restarting ctdb recreates it correctly
- Every process runs as root, who can obviously write in this directory
- I don't know whether this is expected, but this file is zero bytes in size
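My working assumption (to be confirmed in the source) is that the recovery lock is a POSIX fcntl() write lock on the first byte of the lock file, not data written into it. If so, a zero-byte file would be normal, and the real requirement is that gfs2 provides fcntl locks that are coherent across both nodes. Below is a minimal sketch of such a non-blocking lock attempt; `try_recovery_lock` is a hypothetical helper for testing, not part of CTDB.

```python
import fcntl
import os
from typing import Optional


def try_recovery_lock(path: str) -> Optional[int]:
    """Hypothetical test helper: open (creating if needed) the lock file and
    try a non-blocking exclusive fcntl() lock on its first byte.

    Returns the open file descriptor on success; the lock stays held until
    that descriptor is closed or the process exits. Returns None if another
    process already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Python's fcntl.lockf() wraps the fcntl() record-locking calls:
        # LOCK_EX requests a write lock on 1 byte at offset 0, and LOCK_NB
        # makes the call fail at once instead of waiting for the holder.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0)
    except OSError:
        # EACCES or EAGAIN: the byte is already locked by another process.
        os.close(fd)
        return None
    return fd
```

With ctdb stopped on both nodes, one could call this from a Python shell on node 0 against '/ctdb/.ctdb.lock' and keep the returned descriptor open, then call it on node 1 against the same path: if cluster-wide fcntl locking works, the second call should return None, and it should succeed once node 0 closes its descriptor.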

While waiting for your advice, I'm going to read the source code in the
hope of understanding what's wrong.

-- 
Nicolas Ecarnot

