[PATCH] Fix race conditions in CTDB recovery lock (bug 13617)

Mon Sep 17 03:47:14 UTC 2018

On Fri, Sep 14, 2018 at 3:22 PM, Martin Schwenke via samba-technical
<samba-technical at lists.samba.org> wrote:
> The initial issue here is that if a node starts taking the recovery
> lock and loses an election then it can continue to hold the recovery
> lock.  This is because the lock is released in the election loss code
> and an in-progress lock attempt is currently not considered for
> release.  The result is that the new master will be unable to take the
> lock, will be banned and the cluster ends up in bad shape.
>
> This is a regression that was introduced when I added the generalised
> cluster mutex code 2.5 years ago.  It was masked by the use of
> ctdbd_wrapper to start local daemons, which adds latency between daemon
> starts, and noticed recently by Volker when he didn't do that.  Others
> have also seen this issue occasionally.
>
> The solution is to allow the recovery lock release code to cancel an
> in-progress attempt to take the lock.  This requires a little bit of
> restructuring.
>
> Another issue is that the cluster mutex child effectively drops SIGTERM
> between the fork and when the desired helper is exec()ed.  This is
> because the recovery daemon uses a tevent-based signal handler and the
> child never runs an event loop.  This was only really triggered by
> the above in-progress cancellation changes.  The solution is to block
> SIGTERM around fork() and reset the SIGTERM handler in the child before
> unblocking the signal.
>
> The recovery lock code can now survive thousands of iterations of
> ctdb/tests/simple/43_stop_recmaster_yield.sh, which exercises both the
> fight for the recovery master role when several daemons are started
> "simultaneously", and the transfer of the role between 2 nodes due to a
> node becoming inactive.
>
> Please review and maybe push...

Pushed.

Amitay.