[PATCH] Fix race conditions in CTDB recovery lock (bug 13617)

Fri Sep 14 05:22:34 UTC 2018

The initial issue here is that if a node starts taking the recovery
lock and loses an election then it can continue to hold the recovery
lock.  This is because the lock is released in the election loss code
and an in-progress lock attempt is currently not considered for
release.  The result is that the new master will be unable to take the
lock, will be banned and the cluster ends up in bad shape.

This is a regression that was introduced when I added the generalised
cluster mutex code 2.5 years ago.  It was masked by the use of
ctdbd_wrapper to start local daemons, which adds latency between daemon
starts, and noticed recently by Volker when he didn't do that.  Others
have also seen this issue occasionally.

The solution is to allow the recovery lock release code to cancel an
in-progress attempt to take the lock.  This requires a little bit of
restructuring.

Another issue is that the cluster mutex child effectively drops SIGTERM
between the fork and when the desired helper is exec()ed.  This is
because the recovery daemon uses a tevent-based signal handler and the
child never runs an event loop.  This was only really triggered by
the above in-progress cancellation changes.  The solution is to block
SIGTERM around fork() and reset the SIGTERM handler in the child before
unblocking the signal.

The recovery lock code can now survive thousands of iterations of
ctdb/tests/simple/43_stop_recmaster_yield.sh, which exercises both the
fight for the recovery master role when several daemons are started
"simultaneously", and the transfer of the role between 2 nodes due to a
node becoming inactive.

Please review and maybe push...

peace & happiness,
martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ctdb.patch
Type: text/x-patch
Size: 17326 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20180914/a41e649f/ctdb.bin>