Is it possible to use quorum for CTDB to prevent split-brain and remove the lockfile in the cluster file system?

Stefan (metze) Metzmacher metze at samba.org
Thu May 24 08:16:58 MDT 2012


Hi,

> We know that CTDB uses a lockfile in the cluster file system to prevent
> split-brain.
> This design works well as long as all nodes in the cluster can mount the
> cluster file system (e.g. GPFS/GFS/GlusterFS), and CTDB works happily
> under this assumption.
> However, when split-brain happens, a disconnected private network usually
> violates this very assumption.
> For example, we have four nodes (A, B, C, D) in the cluster and GlusterFS
> is the backend.
> GlusterFS and CTDB on all nodes communicate with each other via the
> private network, and CTDB manages the public network.
> If node A is disconnected from the private network, the cluster splits
> into group (A) and group (B,C,D).
> A recovery master election is triggered once CTDB detects the
> disconnection, i.e. CTDB elects a new recovery master for each group after
> 26 (KeepaliveInterval*KeepaliveLimit+1 by default) seconds.
> Then node A will be the recovery master of group (A) and some node (e.g. B)
> will be the recovery master of group (B,C,D).
> Now, A and B will both try to lock the lockfile, but GlusterFS also
> communicates via the private network.
> A big problem arises: whether the lockfile can be taken depends on the
> lock implementation and the disconnection detection of GlusterFS (or
> whichever cluster file system is used). To my knowledge, GlusterFS decides
> a node is disconnected after 42 seconds and only then releases its locks.
> In this configuration, nodes A and B will ban themselves, and the newly
> elected recovery master will ban itself as well. This is really bad: we
> cannot treat the cluster file system as a black box under the lockfile
> design.
> 
> Hence, I have an idea for building split-brain prevention into CTDB
> without a lockfile.
> Using a quorum concept to ban a node might be an option, and I made a
> small modification to the CTDB source code.
> The modification checks in main_loop of server/ctdb_recoverd.c whether
> more than (nodemap->num)/2 nodes are connected.
> If not, the node bans itself and logs the error "Node %u in the group
> without quorum".
> 
> In server/ctdb_recoverd.c:
> static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec,
> TALLOC_CTX *mem_ctx)
> ...
>         /* count how many active nodes there are */
>         rec->num_active    = 0;
>         rec->num_connected = 0;
>         for (i=0; i<nodemap->num; i++) {
>                 if (!(nodemap->nodes[i].flags & NODE_FLAGS_INACTIVE)) {
>                         rec->num_active++;
>                 }
>                 if (!(nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED)) {
>                         rec->num_connected++;
>                 }
>         }
> 
> +       if (rec->num_connected < ((nodemap->num)/2 + 1)) {
> +               DEBUG(DEBUG_ERR, ("Node %u in the group without quorum\n",
> +                                 pnn));
> +               ctdb_ban_node(rec, pnn, ctdb->tunable.recovery_ban_period);
> +       }
> 
> This modification seems to provide split-brain prevention without a
> lockfile in my tests (with more than 3 nodes).
> Does this modification cause any side effects, or is this a stupid design?
> Please kindly answer me; I would appreciate input from smart people like
> you.

How would you start a 4 node cluster?

metze
