CTDB and Recovery lock

Partha Sarathi parthasarathi.bl at gmail.com
Tue Nov 17 22:00:11 UTC 2015


Thanks a lot Martin for your detailed answers.

Now I am more interested in the "configurable helper program" you
mentioned, which replaces the recovery lock mechanism. May I know a
little more about it?

Thanks,
--Partha


On Tue, Nov 17, 2015 at 1:04 PM, Martin Schwenke <martin at meltin.net> wrote:

> Hi Partha,
>
> On Tue, 17 Nov 2015 11:14:44 -0800, Partha Sarathi
> <parthasarathi.bl at gmail.com> wrote:
>
> > I was going through the recovery process details given in
> > Recovery-process.txt and have a few questions on the points listed
> > in it.
> >
> >  RECOVERY MASTER CLUSTER MONITORING
> > -----------------------------------
> >
> > 17, Verify that the filehandle to the recovery lock file is valid.
> > If it is not, this may mean a split brain and is a critical error.
> >     Try a new recovery and restart monitoring from 1.
> >     "recovery master doesn't have the recovery lock"
> > 18, Verify that GPFS allows us to read from the recovery lock file.
> > If not, there is a critical GPFS issue and we may have a split brain.
> >     Try forcing a new recovery and restart monitoring from 1.
> >     "failed read from recovery_lock_fd - %s"
> >
> >  CLUSTER RECOVERY
> > ----------------------------------
> >
> > 3, Verify that the recovery daemon can lock the recovery lock file.
> > At this stage this should be the recovery master.
> >    If this operation fails it means we have a split brain and have
> > to abort recovery.
> >    "ctdb_recovery_lock: Unable to open %s - (%s)"
> >    "ctdb_recovery_lock: Failed to get recovery lock on '%s'"
> >    "Unable to get recovery lock - aborting recovery"
> >    "ctdb_recovery_lock: Got recovery lock on '%s'"
> >
> >
> > Questions
> > --------------
>
> I'll reorder them a bit to answer common things together.
>
> > 1) Does the recovery master always hold the exclusive lock on the
> > recovery lock file, or does it only take the exclusive lock when a
> > recovery is initiated by the local recoverd on the recmaster?
>
> At the moment the lock is a combination master+recovery lock.
>
> It is taken by a new recovery master at the beginning of its first
> recovery.
>
> > 3) When does the recmaster release the exclusive lock on the
> > recovery lock file?
>
> It is released if the node loses an election.
>
> In future we might use two locks with clearer semantics.  One would be
> the master lock - it would be taken when an election is won and would
> be released when an election is lost.  The other would be the recovery
> lock - it would be taken at the beginning of recovery and released at
> the end.  This involves untangling the election and recovery process,
> which I have started doing in recent weeks.
>
> > 2) After verifying that the recovery lock file is valid, why are we
> > reading from the lock file?  I didn't understand the intention
> > behind it.
>
> We don't do this anymore and we need to update the documentation.  If I
> remember correctly, the node number of the recovery master node was
> written into the lock file so that it could be verified.  Checking this
> often caused problems when the cluster filesystem performed badly.
>
> > 4) If the recmaster holds an exclusive lock on the recovery lock
> > file and becomes disconnected for some reason without updating the
> > DISCONNECTED/UNHEALTHY flags, how do the other nodes learn its
> > status and start a forced election for the recmaster role?
>
> Keepalives are sent whenever no other packets are being sent between
> nodes, so a node will notice when it stops receiving packets from a
> peer and will mark that peer as disconnected.
>
> Each recovery daemon checks the status of the current master
> approximately once a second.  If the current master is inactive
> (disconnected, stopped, banned) then an election will be called.
>
> I'm not sure if you're concerned about the following, but...
>
> If CTDB's private network becomes partitioned such that a single node
> is unable to communicate with the others, and that single node is
> holding the recovery lock, then there is no sane way of repairing the
> situation.  The remaining nodes will hold elections, and when each
> winner is unable to take the lock it will ban itself... until all
> nodes in the larger partition are banned.
>
> This is a deficiency of the exclusive lock approach compared with a
> quorum-based approach.
>
> > Also, our clustered filesystem does not have POSIX locking support,
> > but we do have a Consensus/Conspiracy service available in the
> > cluster.  Is it possible to make use of that infrastructure to mimic
> > the recovery lock mechanism?
>
> Not today.  Hopefully soon.  As I untangle some of this I will make the
> "exclusive" lock code call out to a configurable helper program.
>
> peace & happiness,
> martin
>



-- 
Thanks & Regards
-Partha


More information about the samba-technical mailing list