CTDB: Split brain and banning
Michel Buijsman
michelb at bit.nl
Tue Oct 30 14:10:18 UTC 2018
Hi list,
I'm building a 3-node cluster of storage gateways, using CTDB to connect
various NFS and iSCSI clients to Ceph storage. I'm using a RADOS object as
the reclock via ctdb_mutex_ceph_rados_helper.
I'm having two problems:
1. Node banning: Unless I disable bans, the whole cluster tends to ban
itself when something goes wrong. As in: Node #1 (recovery master) dies,
then nodes #2 and #3 will both try to get the reclock, fail, and ban
themselves.
I've "fixed" this for now with EnableBans=0.
2. Split brain: If the current recovery master drops off the network for
whatever reason but keeps running, it will ignore the fact that it can't
get the reclock: "Time out getting recovery lock, allowing recmode set
anyway". It will remain at status "OK" and start to claim every virtual
IP in the cluster.
The split brain is obviously a problem as soon as the node gets back online:
IPs are up on multiple nodes, that node tries to (re)claim resources that
have timed out and failed over to other nodes, etc.
The node doesn't seem to recover after getting back on the network either:
it still thinks it's the recovery master and keeps trying for the reclock,
hitting lock contention, without ever resetting itself.
I ran into this using CTDB 4.7.6 on Ubuntu 18.04 Bionic. I have since upgraded
to 4.9.1, which still shows the same behaviour. Other than the event handlers,
this is a fairly standard CTDB config; I've just configured the reclock to
use ctdb_mutex_ceph_rados_helper and played with a few tunables:
IPAllocAlgorithm=1
NoIPFailback=1
KeepaliveInterval=1
KeepaliveLimit=5
MonitorInterval=5
MonitorTimeoutCount=3
RecoveryDropAllIPs=30
EnableBans=0
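
As a rough sanity check on the keepalive values above, here is a small
sketch of how long a hung peer can go unnoticed. The formula
KeepaliveInterval * (KeepaliveLimit + 1) is my reading of ctdb-tunables(7)
and should be treated as an assumption, as should the defaults of 5 and 5;
the function name is made up for illustration.

```python
# Worst-case time before a hung peer is marked DISCONNECTED.
# Assumption: the worst case is KeepaliveInterval * (KeepaliveLimit + 1),
# as I understand ctdb-tunables(7). This is not taken from the CTDB source.

def worst_case_disconnect_seconds(keepalive_interval: int, keepalive_limit: int) -> int:
    """Return the assumed worst-case detection time in seconds."""
    return keepalive_interval * (keepalive_limit + 1)


# Values from the tunables listed in this post.
posted = worst_case_disconnect_seconds(keepalive_interval=1, keepalive_limit=5)

# Assumed defaults (KeepaliveInterval=5, KeepaliveLimit=5), for comparison.
default = worst_case_disconnect_seconds(keepalive_interval=5, keepalive_limit=5)

print(f"posted tunables: up to {posted} s")
print(f"assumed defaults: up to {default} s")
```

Under that assumption, the settings above detect a dead peer in roughly
6 seconds rather than 30, which also means recoveries (and reclock
attempts) are triggered more readily.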
Grepping the source, it looks like ignoring the reclock when it times out is
a conscious decision. This strikes me as odd, since it leads directly to
split brain in this case. I would expect it to fail hard on not getting a lock.
Would it be possible to make this behaviour configurable with a tunable?
Or am I doing something wrong? :)
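
To make the request concrete, here is a toy model of the decision I have in
mind. It is not CTDB's code and only models the behaviour as described in
this mail; LockResult, Action, decide() and the fail_hard_on_timeout switch
are all hypothetical names, and no such tunable exists as far as I know.

```python
from enum import Enum


class LockResult(Enum):
    TAKEN = "taken"          # we hold the reclock
    CONTENDED = "contended"  # another node holds the reclock
    TIMEOUT = "timeout"      # lock storage unreachable, no answer in time


class Action(Enum):
    PROCEED = "proceed"      # carry on as recovery master
    BACK_OFF = "back off"    # do not act as recovery master, do not claim IPs


def decide(result: LockResult, fail_hard_on_timeout: bool) -> Action:
    """Decide what to do after a reclock attempt (hypothetical model)."""
    if result is LockResult.TAKEN:
        return Action.PROCEED
    if result is LockResult.CONTENDED:
        return Action.BACK_OFF
    # TIMEOUT: the behaviour described in this mail proceeds anyway;
    # the proposed (hypothetical) switch would make the node back off.
    return Action.BACK_OFF if fail_hard_on_timeout else Action.PROCEED


# An isolated node that cannot reach the lock storage:
print(decide(LockResult.TIMEOUT, fail_hard_on_timeout=False).value)
print(decide(LockResult.TIMEOUT, fail_hard_on_timeout=True).value)
```

With the switch off, the isolated node proceeds and starts claiming IPs,
which is the split brain described above; with it on, a timeout is treated
the same as contention.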
--
Michel Buijsman
BIT BV | Unix beheer | michelb at bit.nl | 08B90948