Is it possible to use quorum for CTDB to prevent split-brain and remove the lockfile in the cluster file system?

XW Huang xwhuang123 at gmail.com
Thu May 24 09:33:12 MDT 2012


Hello metze,

I start all 4 nodes at the same time. More than 3 nodes become connected
(though still flagged UNHEALTHY) as soon as possible, and the cluster works
well once it fulfills the quorum condition (with 4 nodes, the check needs
at least 4/2+1 = 3 connected nodes).

Thank you.
Az
On 2012-5-24 at 10:17 PM, "Stefan (metze) Metzmacher" <metze at samba.org> wrote:

> Hi,
>
> > We know that CTDB uses a lockfile in the cluster file system to
> > prevent split-brain.
> > It is a really good design when all nodes in the cluster can mount
> > the cluster file system (e.g. GPFS/GFS/GlusterFS), and CTDB can work
> > happily under this assumption.
> > However, when split-brain happens, the disconnected private network
> > usually violates this assumption.
> > For example, we have four nodes (A, B, C, D) in the cluster and
> > GlusterFS is the backend.
> > GlusterFS and CTDB on all nodes communicate with each other via the
> > private network, and CTDB manages the public network.
> > If node A is disconnected from the private network, there will be
> > group (A) and group (B,C,D) in our cluster.
> > The election of a recovery master is triggered once CTDB determines
> > the disconnection, i.e. CTDB elects a new recovery master for each
> > group after 26 seconds (KeepaliveInterval*KeepaliveLimits+1; with the
> > defaults of 5 and 5, that is 5*5+1 = 26).
> > Then node A will be the recovery master of group (A) and some node
> > (e.g. B) will be the recovery master of group (B,C,D).
> > Now, A and B will both try to lock the lockfile, but the GlusterFS
> > nodes also communicate with each other via the private network.
> > A big problem arises, since whether the lockfile can be locked
> > depends on the lock implementation and the disconnection detection of
> > GlusterFS (or another cluster file system). To my knowledge, GlusterFS
> > determines that a node is disconnected after 42 seconds and then
> > releases its lock. In this configuration, nodes A and B will ban
> > themselves, and the newly elected recovery master will ban itself.
> > It's a really bad thing, and we cannot treat the cluster file system
> > as a black box with the lockfile design.
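> >
> > To make that dependency concrete, here is a minimal sketch of the
> > kind of fcntl() byte-range lock a newly elected recovery master takes
> > on the lockfile (the function name and the exact byte range are my
> > illustration, not CTDB's actual code):
> >
> > /* Sketch: try to take an exclusive byte-range lock on the recovery
> >  * lockfile. Whether this succeeds during a split depends entirely on
> >  * the cluster file system's lock implementation and timeouts. */
> > #include <fcntl.h>
> > #include <unistd.h>
> >
> > static int try_recovery_lock(const char *reclock_path)
> > {
> >         struct flock lk = {
> >                 .l_type   = F_WRLCK,   /* exclusive lock */
> >                 .l_whence = SEEK_SET,
> >                 .l_start  = 0,
> >                 .l_len    = 1,         /* first byte of the file */
> >         };
> >         int fd = open(reclock_path, O_RDWR | O_CREAT, 0600);
> >         if (fd == -1) {
> >                 return -1;
> >         }
> >         if (fcntl(fd, F_SETLK, &lk) == -1) {
> >                 /* another recovery master still holds the lock */
> >                 close(fd);
> >                 return -1;
> >         }
> >         return fd;      /* keep fd open to hold the lock */
> > }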
> >
> > Hence, I have an idea about how to build CTDB with split-brain
> > prevention but without the lockfile.
> > Using a quorum concept to ban a node might be an option, and I made a
> > little modification to the CTDB source code.
> > The modification checks whether there are more than (nodemap->num)/2
> > connected nodes in main_loop of server/ctdb_recoverd.c.
> > If not, the node bans itself and logs the error "Node %u in the group
> > without quorum".
> >
> > In server/ctdb_recoverd.c:
> >
> > static void main_loop(struct ctdb_context *ctdb,
> >                       struct ctdb_recoverd *rec, TALLOC_CTX *mem_ctx)
> > ...
> >         /* count how many active nodes there are */
> >         rec->num_active    = 0;
> >         rec->num_connected = 0;
> >         for (i=0; i<nodemap->num; i++) {
> >                 if (!(nodemap->nodes[i].flags & NODE_FLAGS_INACTIVE)) {
> >                         rec->num_active++;
> >                 }
> >                 if (!(nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED)) {
> >                         rec->num_connected++;
> >                 }
> >         }
> >
> > +       if (rec->num_connected < ((nodemap->num)/2+1)) {
> > +               DEBUG(DEBUG_ERR, ("Node %u in the group without quorum\n",
> > +                                 pnn));
> > +               ctdb_ban_node(rec, pnn, ctdb->tunable.recovery_ban_period);
> > +       }
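> >
> > For clarity, the threshold this check enforces works out as follows
> > (a standalone illustration, not part of the patch):
> >
> > /* Illustration: a node stays unbanned only while
> >  * num_connected >= num/2 + 1 (integer division). */
> > #include <stdio.h>
> >
> > int main(void)
> > {
> >         unsigned int num;
> >         for (num = 2; num <= 5; num++) {
> >                 printf("nodes=%u  quorum needs >= %u connected\n",
> >                        num, num / 2 + 1);
> >         }
> >         return 0;
> > }
> > /* Prints:
> >  * nodes=2  quorum needs >= 2 connected
> >  * nodes=3  quorum needs >= 2 connected
> >  * nodes=4  quorum needs >= 3 connected
> >  * nodes=5  quorum needs >= 3 connected
> >  */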
> >
> > This modification seems to provide split-brain prevention without the
> > lockfile in my tests (with more than 3 nodes).
> > Does this modification cause any side effects, or is this a stupid
> > design?
> > Please kindly answer me; I would appreciate new input from smart
> > people like you.
>
> How would you start a 4-node cluster?
>
> metze
>
>

