CTDB: Split brain and banning

David Disseldorp ddiss at suse.de
Mon Nov 5 10:38:18 UTC 2018

On Thu, 1 Nov 2018 12:07:36 +1100, Martin Schwenke via samba-technical wrote:

> I wonder if we need a configurable number of timeouts during which the
> lock is retried... and then finally banned.  This relates to David's
> question about whether the helper should block and retry internally -
> that seems like a better solution.  However, then we hit the fixed 5
> second time-out that is allowed for taking the recovery lock.  Perhaps
> that needs to be configurable.  Then David's suggestion could work.

I'd prefer to fix this in the caller (ctdb), rather than in the reclock
helpers. I think it's reasonable to expect that reclock helpers will
return "contended" for an (arbitrarily) short period following failure
of the reclock holder.
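
For illustration only, caller-side retry could look roughly like the
sketch below. This is not ctdb code: LockResult, take_reclock_with_retry,
max_attempts and retry_delay are hypothetical names, and the choice to
give up immediately on a non-contention error is an assumption. The idea
is simply that "contended" is tolerated for a bounded number of attempts
before the caller falls through to its existing failure handling (e.g.
banning).

```python
import time
from enum import Enum
from typing import Callable


class LockResult(Enum):
    """Hypothetical outcomes of one reclock helper invocation."""
    TAKEN = "taken"
    CONTENDED = "contended"
    ERROR = "error"


def take_reclock_with_retry(
    try_lock: Callable[[], LockResult],
    max_attempts: int = 3,
    retry_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to take the lock, tolerating short-lived contention.

    Returns True if the lock was taken. Returns False if the caller
    should fall through to its failure handling (e.g. banning).
    """
    for attempt in range(1, max_attempts + 1):
        result = try_lock()
        if result is LockResult.TAKEN:
            return True
        if result is LockResult.ERROR:
            # Assumption: errors other than contention are not retried.
            return False
        # Contended: the previous holder may have only just failed, so
        # wait and retry unless this was the last permitted attempt.
        if attempt < max_attempts:
            sleep(retry_delay)
    return False
```

With max_attempts=1 this degenerates to the single-attempt behaviour,
so the retry count is the only new tunable the caller would need.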

Cheers, David

