CTDB: Split brain and banning

Wed Oct 31 15:37:14 UTC 2018

On Wed, 31 Oct 2018 10:39:41 +0100, Michel Buijsman via samba-technical wrote:

> > As of ce289e89e5c469cf2c5626dc7f2666b945dba3bd, which is carried in
> > Samba 4.9.1 as a fix for bso#13540, the recovery master's reclock should
> > time out after 10 seconds, allowing one of the remaining nodes to
> > successfully take over. How long after the recovery master outage do
> > you see the ban occur? Full logs of this would be helpful.
> 
> I've run a test with RecoveryBanPeriod=10 to keep the ban time somewhat
> manageable. It takes a few seconds for the whole cluster to get banned,
> certainly less than 10. I've attached the relevant logs from two nodes 
> after I'd killed the third. Lock contention, looks like.

It appears that ctdbd doesn't gracefully handle the case where the
recovery master goes down while holding the reclock and the standby
nodes can't obtain the reclock immediately after the election. Your
reclock helper lock_duration setting of "30" means that the standby
nodes may need to wait up to 30 seconds before they can obtain the
recovery lock.

If you specify a lock_duration of "5" and set RecoveryBanPeriod=5, does
your cluster return to OK ~5 seconds after the master outage?

@Amitay/Martin: should I change the recovery lock helper to block and
retry several times when obtaining the recovery lock? Such a change
should avoid the immediate ban that occurs when we report contention.
I'm curious to hear how other clustered filesystems / lock helpers
handle releasing the recovery lock once the holder dies - does GPFS do
this immediately?
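
To make the proposal above concrete, here is a minimal sketch of the
blocking-retry behaviour, written in Python purely for illustration. It
is not the actual helper code. The names acquire_with_retry, try_lock,
timeout and interval are hypothetical; try_lock stands in for whatever
single, non-blocking lock attempt the helper makes, and timeout would
presumably need to be at least as long as the lock_duration.

```python
import time
from typing import Callable


def acquire_with_retry(
    try_lock: Callable[[], bool],
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block while retrying a non-blocking lock attempt.

    try_lock is a hypothetical callable that returns True once the lock
    is held and False on contention. Returns True if the lock was taken
    before the deadline, False if contention persisted until timeout.
    """
    deadline = clock() + timeout
    while True:
        if try_lock():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            # Only now would contention be reported to the caller.
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

With something like this, a standby node would only report contention
after the stale lock has had a chance to expire, e.g.
acquire_with_retry(try_lock, timeout=30) for a lock_duration of "30".
The trade-off is that the helper blocks for that long when another live
node legitimately holds the lock.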

Cheers, David