CTDB: Split brain and banning
michelb at bit.nl
Tue Oct 30 14:10:18 UTC 2018
I'm building a 3-node cluster of storage gateways that uses CTDB to connect
various NFS and iSCSI clients to Ceph storage. I'm using a RADOS object as
the reclock, via ctdb_mutex_ceph_rados_helper.
I'm having two problems:
1. Node banning: Unless I disable bans, the whole cluster tends to ban
itself when something goes wrong. For example: node #1 (the recovery master)
dies, then nodes #2 and #3 will both try to take the reclock, fail, and ban
themselves. I've "fixed" this for now with EnableBans=0.
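For reference, this is how I'm disabling bans; the tunable can be set at
runtime with ctdb setvar, or persistently via the tunables file (the file
location may vary by distribution):

```shell
# Disable banning at runtime (lost on ctdbd restart):
ctdb setvar EnableBans 0

# Confirm the current value:
ctdb getvar EnableBans

# Persistently, add to /etc/ctdb/ctdb.tunables (CTDB 4.9 style):
#   EnableBans=0
```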
2. Split brain: If the current recovery master drops off the network for
whatever reason but keeps running, it will ignore the fact that it can't
get the reclock: "Time out getting recovery lock, allowing recmode set
anyway". It will remain at status "OK" and start to claim every virtual
IP in the cluster.
The split brain is obviously a problem as soon as the node gets back online:
IPs are up on multiple nodes at once, that node tries to (re)claim resources
that have already timed out and failed over to other nodes, and so on.
That node doesn't seem to recover after getting back on the network either:
it still thinks it's the recovery master and keeps trying for the reclock,
hitting lock contention, without ever resetting itself.
I ran into this using CTDB 4.7.6 on Ubuntu 18.04 Bionic. I've since upgraded
to 4.9.1, which still shows the same behaviour. Other than the event handlers
this is a fairly standard CTDB config: I've just configured the reclock to
use ctdb_mutex_ceph_rados_helper and played with a few tunables.
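For reference, the reclock is configured along these lines; the helper path,
Ceph cluster name, user, pool and object name are specific to my setup:

```ini
; /etc/ctdb/ctdb.conf (CTDB 4.9 style; 4.7 used CTDB_RECOVERY_LOCK in
; the old ctdbd.conf instead)
[cluster]
    recovery lock = !/usr/libexec/ctdb/ctdb_mutex_ceph_rados_helper ceph client.ctdb rbd ctdb_reclock
```

The helper's arguments are, in order: Ceph cluster name, Ceph user, RADOS
pool, and the lock object name.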
Grepping the source, ignoring the reclock when it times out seems to be a
conscious decision. This strikes me as odd, since here it leads directly to
split brain. I would expect a node to fail hard when it can't get the lock.
Would it be possible to make this behaviour configurable with a tunable?
Or am I doing something wrong? :)
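In case it helps anyone reproducing this: the helper can also be exercised
by hand, outside of ctdbd. As I read the mutex-helper protocol, it prints
"0" on stdout once it holds the lock and "1" on contention; the
cluster/user/pool/object arguments below are from my setup:

```shell
# First instance takes the lock and keeps running (prints "0"):
/usr/libexec/ctdb/ctdb_mutex_ceph_rados_helper ceph client.ctdb rbd ctdb_reclock

# A second, concurrent instance reports contention (prints "1") instead
# of proceeding -- which is what I'd expect ctdbd to honour on timeout too.
```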
BIT BV | Unix beheer | michelb at bit.nl | 08B90948