Hello list,

We know that CTDB uses a lockfile in the cluster file system to prevent
split-brain.
It is a really good design when all nodes in the cluster can mount the
cluster file system (e.g. GPFS/GFS/GlusterFS), and CTDB works happily under
this assumption.
However, when split-brain happens, the disconnected private network
usually violates this assumption.
For example, we have four nodes (A, B, C, D) in the cluster and GlusterFS
is the backend.
GlusterFS and CTDB on all nodes communicate with each other via the private
network, and CTDB manages the public network.
If node A is disconnected from the private network, the cluster splits into
group (A) and group (B,C,D).
The election of a recovery master is triggered once CTDB determines that
nodes are disconnected, i.e. CTDB elects a new recovery master for each
group after 26 (KeepaliveInterval*KeepaliveLimit+1 by default) seconds.
Then node A will be the recovery master of group (A) and some node (e.g. B)
will be the recovery master of group (B,C,D).
Now A and B will both try to lock the lockfile, but the GlusterFS nodes also
communicate with each other via the private network.
A big problem arises, since whether the lockfile can be locked depends on
the lock implementation and the disconnection detection of GlusterFS (or
whichever cluster file system is used). To my knowledge, GlusterFS decides
that a node is disconnected after 42 seconds and then releases its lock. In
this configuration, nodes A and B will ban themselves, and the newly elected
recovery master will ban itself as well. This is a really bad thing: with
the lockfile design we cannot treat the cluster file system as a black box.
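
The timing mismatch described above can be put into numbers. The following
is only a minimal sketch, assuming KeepaliveInterval=5 and KeepaliveLimit=5
(values chosen because they reproduce the 26 seconds quoted above) and the
42-second GlusterFS timeout stated in this mail. The macro and function
names are made up for illustration and are not part of CTDB or GlusterFS.

```c
#include <assert.h>

/* Assumed values; they reproduce the 26 seconds quoted in this mail. */
#define KEEPALIVE_INTERVAL 5   /* seconds between keepalives */
#define KEEPALIVE_LIMIT    5   /* missed keepalives tolerated */
/* GlusterFS disconnect timeout as stated in this mail. */
#define GLUSTER_TIMEOUT    42

/* Seconds until CTDB treats a peer as disconnected (hypothetical helper). */
int ctdb_detect_seconds(int interval, int limit)
{
        assert(interval > 0 && limit > 0);
        return interval * limit + 1;
}

/*
 * Seconds between CTDB starting a new election and the file system
 * giving up the lock of the unreachable node (hypothetical helper).
 * Returns 0 if the file system reacts first.
 */
int stale_lock_window(int ctdb_seconds, int fs_seconds)
{
        return fs_seconds > ctdb_seconds ? fs_seconds - ctdb_seconds : 0;
}
```

With these values, ctdb_detect_seconds(5, 5) gives 26 and
stale_lock_window(26, 42) gives 16, i.e. roughly 16 seconds in which the
newly elected recovery masters may contend for a lock that the file system
has not released yet.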

Hence, I have an idea for building CTDB with split-brain prevention that
does not rely on a lockfile.
Using a quorum to decide when to ban a node might be an option, and I made a
small modification to the CTDB source code.
The modification checks, in main_loop of server/ctdb_recoverd.c, whether
more than (nodemap->num)/2 nodes are connected.
If not, the node bans itself and logs the error "Node %u in the group without
quorum".

In server/ctdb_recoverd.c:
static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec,
                      TALLOC_CTX *mem_ctx)
{
        [...]
        /* count how many active nodes there are */
        rec->num_active    = 0;
        rec->num_connected = 0;
        for (i=0; i<nodemap->num; i++) {
                if (!(nodemap->nodes[i].flags & NODE_FLAGS_INACTIVE)) {
                        rec->num_active++;
                }
                if (!(nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED)) {
                        rec->num_connected++;
                }
        }

+       /* ban this node if its group is not a majority of all nodes */
+       if (rec->num_connected < ((nodemap->num)/2+1)){
+               DEBUG(DEBUG_ERR, ("Node %u in the group without quorum\n", pnn));
+               ctdb_ban_node(rec, pnn, ctdb->tunable.recovery_ban_period);
+       }
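
To make the majority rule in the patch concrete, here is a minimal
standalone sketch of the same check outside of CTDB. The flag value and both
function names are stand-ins invented for this example; they are not taken
from the CTDB headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag value for this sketch only, not from the CTDB headers. */
#define SKETCH_FLAG_DISCONNECTED 0x1u

/* Count the nodes whose flags do not mark them as disconnected. */
uint32_t count_connected(const uint32_t *flags, uint32_t num)
{
        uint32_t connected = 0;
        for (uint32_t i = 0; i < num; i++) {
                if (!(flags[i] & SKETCH_FLAG_DISCONNECTED)) {
                        connected++;
                }
        }
        return connected;
}

/* True if this group holds a strict majority of all configured nodes. */
bool has_quorum(uint32_t num_connected, uint32_t num_nodes)
{
        assert(num_connected <= num_nodes);
        return num_connected >= num_nodes / 2 + 1;
}
```

For the four-node example, the threshold is 4/2+1 = 3. Node A sees only
itself connected, so has_quorum(1, 4) is false and A would ban itself, while
group (B,C,D) gets has_quorum(3, 4) true. Note that under this rule an even
2/2 split leaves neither side with quorum.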

This modification seems to provide split-brain prevention without a
lockfile in my tests (more than 3 nodes).
Does this modification cause any side effects, or is it a stupid design?
Please let me know what you think; I would appreciate any input from smart
people like you guys.


