[PATCH] CTDB recovery lock improvements

Mon Jun 6 05:08:11 UTC 2016

On Fri, Jun 3, 2016 at 2:31 PM, Martin Schwenke <martin at meltin.net> wrote:

> Several improvements to the CTDB recovery lock and cluster mutex code:
>
> * Remove "ctdb setreclock" command and supporting protocol/other code
>
>   This can't be safely updated at run-time.  If it fails on a node
>   then you probably need to take CTDB down on all nodes to be
>   completely sure that you can get back into a consistent state.  So,
>   even though this exists to avoid an outage, you would need to
>   schedule a service window anyway, just in case!
>
>   For those who think they really want to be able to update the
>   recovery lock at run-time, set CTDB_RECOVERY_LOCK to a script and
>   modify the script at run-time... but don't complain if bad things
>   happen.  :-)
>
>   Note that this came up when Amitay and I were discussing the
>   introduction of a cluster lock, which is always held by the master
>   and has nothing to do with recovery.  Should we add the ability to
>   update the cluster lock?  No, as above...
>
>   This is the first 10 commits.
>
> * Cluster mutex improvements/simplifications:
>
>   - Pass a talloc context to allocate the handle off, instead of
>     always allocating off CTDB context
>
>   - ctdb_cluster_mutex() registers a handler and private data
>
>   - ctdb_cluster_mutex() registers an extra handler (and corresponding
>     private data) to call when the cluster mutex helper terminates
>     unexpectedly.
>
> * Allocate recovery lock handle off recovery daemon context
>
>   CTDB context doesn't need this, since the lock is only ever taken in
>   the recovery daemon.
>
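
As a rough illustration of the cluster mutex items above, here is a
minimal C sketch.  It does not use the real CTDB or talloc API: struct
mem_ctx is a hypothetical stand-in for a talloc context, and every
function and type name (sketch_cluster_mutex() and friends) is invented
for this example.  It only shows the shape of the idea: the caller
chooses the context that owns the handle, and registers one handler for
the lock result plus an extra one for unexpected helper termination.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CHUNKS 8

/* Hypothetical stand-in for a talloc context: it owns what is allocated off it. */
struct mem_ctx {
	void *chunks[MAX_CHUNKS];
	int count;
};

/* Hypothetical callback types: lock result, and helper lost unexpectedly. */
typedef void (*sketch_handler_fn)(int status, void *private_data);
typedef void (*sketch_lost_fn)(void *private_data);

struct sketch_mutex_handle {
	sketch_handler_fn handler;
	void *private_data;
	sketch_lost_fn lost_handler;
	void *lost_data;
	int have_lock;
};

/* Allocate the handle off the caller's context and register both handlers. */
struct sketch_mutex_handle *sketch_cluster_mutex(struct mem_ctx *ctx,
						 sketch_handler_fn handler,
						 void *private_data,
						 sketch_lost_fn lost_handler,
						 void *lost_data)
{
	struct sketch_mutex_handle *h;

	assert(ctx != NULL && handler != NULL);
	if (ctx->count >= MAX_CHUNKS) {
		return NULL;
	}
	h = calloc(1, sizeof(*h));
	if (h == NULL) {
		return NULL;
	}
	h->handler = handler;
	h->private_data = private_data;
	h->lost_handler = lost_handler;
	h->lost_data = lost_data;
	ctx->chunks[ctx->count++] = h;
	return h;
}

/* Simulate the helper reporting its result (0 means the lock was taken). */
void sketch_helper_result(struct sketch_mutex_handle *h, int status)
{
	h->have_lock = (status == 0);
	h->handler(status, h->private_data);
}

/* Simulate the helper terminating unexpectedly while the lock is held. */
void sketch_helper_died(struct sketch_mutex_handle *h)
{
	if (!h->have_lock) {
		return;
	}
	h->have_lock = 0;
	if (h->lost_handler != NULL) {
		h->lost_handler(h->lost_data);
	}
}

/* Freeing the context frees every handle allocated off it. */
void sketch_mem_ctx_free(struct mem_ctx *ctx)
{
	while (ctx->count > 0) {
		free(ctx->chunks[--ctx->count]);
	}
}

static void count_taken(int status, void *p) { if (status == 0) (*(int *)p)++; }
static void count_lost(void *p) { (*(int *)p)++; }

/* Returns 1 if both handlers fired exactly once, 0 otherwise, -1 on error. */
int sketch_mutex_demo(void)
{
	struct mem_ctx rec_ctx = { { NULL }, 0 };  /* plays the recovery daemon context */
	int taken = 0, lost = 0;
	struct sketch_mutex_handle *h;

	h = sketch_cluster_mutex(&rec_ctx, count_taken, &taken, count_lost, &lost);
	if (h == NULL) {
		return -1;
	}
	sketch_helper_result(h, 0);
	sketch_helper_died(h);
	sketch_mem_ctx_free(&rec_ctx);
	return (taken == 1 && lost == 1) ? 1 : 0;
}
```

In this sketch the handle's lifetime follows whichever context the
caller passes in, which is the point of allocating the recovery lock
handle off the recovery daemon context rather than the CTDB context.
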
> * Add a SIGTERM handler to the recovery daemon that releases the
>   recovery lock
>
>   Before the recovery lock was handled by a helper, it was held by the
>   recovery daemon itself.  Now, a helper can take a few seconds to
>   notice the recovery daemon has exited, before it releases the lock
>   and exits.  The main ctdbd shuts down the recovery daemon with
>   SIGTERM, so catch it, release the lock and exit.
>
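
The SIGTERM handling described above could look roughly like the
following sketch.  This is not the CTDB code: it uses plain POSIX
sigaction() rather than CTDB's event loop, the lock is reduced to a
boolean, and all names (sketch_take_lock(), sketch_should_exit() and so
on) are hypothetical.  The handler only sets a flag; the daemon loop
notices the flag, releases the lock itself and then exits, so the lock
does not stay held while a helper works out that its parent is gone.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Set from the signal handler, read from the daemon loop. */
static volatile sig_atomic_t sketch_got_sigterm = 0;

/* Hypothetical stand-in for "the recovery lock handle is held". */
static bool sketch_lock_held = false;

static void sketch_sigterm_handler(int signum)
{
	(void)signum;
	sketch_got_sigterm = 1;  /* only async-signal-safe work here */
}

/* Install the SIGTERM handler; returns 0 on success, -1 on failure. */
int sketch_install_sigterm_handler(void)
{
	struct sigaction sa;

	sa.sa_handler = sketch_sigterm_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	return sigaction(SIGTERM, &sa, NULL);
}

void sketch_take_lock(void)
{
	sketch_lock_held = true;
}

void sketch_release_lock(void)
{
	assert(sketch_lock_held);
	sketch_lock_held = false;
}

/*
 * Called once per iteration of the daemon loop.  If SIGTERM has been
 * seen, release the lock (if held) and tell the caller to exit.
 */
bool sketch_should_exit(void)
{
	if (!sketch_got_sigterm) {
		return false;
	}
	if (sketch_lock_held) {
		sketch_release_lock();
	}
	return true;
}

/* Returns 0 if the lock was released on SIGTERM, non-zero otherwise. */
int sketch_sigterm_demo(void)
{
	if (sketch_install_sigterm_handler() != 0) {
		return -1;
	}
	sketch_take_lock();
	if (sketch_should_exit()) {
		return -2;  /* no signal yet, so the loop must keep running */
	}
	raise(SIGTERM);     /* stands in for the main daemon's SIGTERM */
	if (!sketch_should_exit()) {
		return -3;
	}
	return sketch_lock_held ? 1 : 0;
}
```

A real daemon would call exit() once sketch_should_exit() returns true;
the demo returns a status instead so the behaviour can be checked.
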
> Please review and maybe push...
>
> peace & happiness,
> martin
>

Pushed with an additional patch to fix #endif decoration in ctdb_mutex_lock.h.

Amitay.