CTDB and Recovery lock

Partha Sarathi parthasarathi.bl at gmail.com
Tue Nov 17 19:14:44 UTC 2015


Hi,

I was going through the recovery process details given in
Recovery-process.txt and have a few questions on the points quoted below.

 RECOVERY MASTER CLUSTER MONITORING
-----------------------------------

17, Verify that the filehandle to the recovery lock file is valid. If it is
not, this may mean a split brain and is a critical error.
    Try a new recovery and restart monitoring from 1.
    "recovery master doesn't have the recovery lock"
18, Verify that GPFS allows us to read from the recovery lock file. If not
there is a critical GPFS issue and we may have a split brain.
    Try forcing a new recovery and restart monitoring from 1.
    "failed read from recovery_lock_fd - %s"

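To check that I am reading steps 17 and 18 correctly, here is a minimal Python sketch of the two checks as I understand them. It is not the actual CTDB code: the function name check_recovery_lock is my own, and I am assuming that "valid filehandle" means the descriptor is still open and that the read is a small probe read whose content is ignored.

```python
import errno
import os


def check_recovery_lock(fd):
    """Return None if the lock fd looks healthy, else the error text."""
    # Step 17 (assumed meaning): the descriptor must still be open.
    if fd is None or fd < 0:
        return "recovery master doesn't have the recovery lock"
    try:
        os.fstat(fd)
    except OSError as exc:
        if exc.errno == errno.EBADF:
            return "recovery master doesn't have the recovery lock"
        raise
    # Step 18: the cluster filesystem must still let us read the file.
    # Only success or failure matters here, not the bytes returned.
    try:
        os.pread(fd, 1, 0)
    except OSError as exc:
        return "failed read from recovery_lock_fd - %s" % os.strerror(exc.errno)
    return None
```

If that reading is right, step 18 probes whether the filesystem still serves this node at all, which step 17 alone would not detect.
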
 CLUSTER RECOVERY
----------------------------------

3, Verify that the recovery daemon can lock the recovery lock file. At this
stage this should be recovery master.
   If this operation fails it means we have a split brain and have to abort
recovery.
   "ctdb_recovery_lock: Unable to open %s - (%s)"
   "ctdb_recovery_lock: Failed to get recovery lock on '%s'"
   "Unable to get recovery lock - aborting recovery"
   "ctdb_recovery_lock: Got recovery lock on '%s'"
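
My understanding of step 3 is roughly the following Python sketch. I am assuming a non-blocking, whole-file POSIX (fcntl) write lock; the function name take_recovery_lock and the file mode are mine, and only the message strings come from the text above.

```python
import fcntl
import os


def take_recovery_lock(path):
    """Try to take an exclusive lock on path; return the fd, or None."""
    try:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    except OSError as exc:
        print("ctdb_recovery_lock: Unable to open %s - (%s)"
              % (path, os.strerror(exc.errno)))
        return None
    try:
        # Non-blocking exclusive lock on the whole file. If another
        # process already holds it, this raises instead of waiting.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        print("ctdb_recovery_lock: Failed to get recovery lock on '%s'" % path)
        return None
    print("ctdb_recovery_lock: Got recovery lock on '%s'" % path)
    return fd
```

With this kind of lock, the lock belongs to the process and goes away when the process closes the file or exits, which is part of why I am asking questions 1 and 3 below.
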


Questions
--------------
1)  Does the recovery master always hold the exclusive lock on the
recovery lock file, or does it take the exclusive lock only when a recovery
is initiated by the recmaster's local recoverd?

2)  After verifying that the filehandle to the recovery lock file is valid,
why do we also read from the lock file? I don't understand the intention
behind it.

3) When does the recmaster release the exclusive lock on the recovery lock
file?

4) If the recmaster is holding an exclusive lock on the recovery lock file
and gets disconnected for some reason without its flags being updated to
DISCONNECTED/UNHEALTHY, how do the other nodes learn its status and force
an election for the recmaster role?

Also, our clustered filesystem does not support POSIX locking, but we do
have a Consensus/Conspiracy service available in the cluster. Is it
possible to use that infrastructure to mimic the recovery lock file
mechanism?
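
What I have in mind is roughly a lease with an expiry time, as in the Python sketch below. Everything in it is hypothetical: InMemoryConsensus is only a stand-in for our consensus service, and I am assuming the real service offers an atomic compare-and-set and a clock that all nodes agree on.

```python
import time


class InMemoryConsensus:
    """Hypothetical stand-in for the cluster's consensus service."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def compare_and_set(self, key, expected, new):
        # Replace the value only if it still equals expected.
        if self._data.get(key) != expected:
            return False
        self._data[key] = new
        return True


def try_acquire(store, key, node, ttl, now=None):
    """Take or renew the lease if it is free, expired, or already ours."""
    # A real deployment would need the service's clock, not a local one.
    now = time.monotonic() if now is None else now
    current = store.get(key)  # None, or a tuple (holder, expiry)
    if current is not None and current[1] > now and current[0] != node:
        return False
    return store.compare_and_set(key, current, (node, now + ttl))


def release(store, key, node):
    """Drop the lease, but only if this node is the current holder."""
    current = store.get(key)
    if current is not None and current[0] == node:
        return store.compare_and_set(key, current, None)
    return False
```

The holder would have to renew the lease periodically. A recmaster that drops off the network would stop renewing, so another node could take the lease over once it expires.
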

Your answers and suggestions would be very helpful.

Regards,
--Partha

