Having problems understanding CTDB's election process

Martin Schwenke martin at meltin.net
Mon Nov 24 16:11:00 MST 2014


On Mon, 24 Nov 2014 14:50:12 -0800, Richard Sharpe
<realrichardsharpe at gmail.com> wrote:

> On Mon, Nov 24, 2014 at 1:45 PM, Richard Sharpe
> <realrichardsharpe at gmail.com> wrote:

> > I am trying to work my way through the CTDB code to figure out how
> > elections work.
> >
> > When the first ctdb node comes up, it will start the recoverd (which
> > always starts with recmaster = -1) which will call force_election.
> > This then sets the current node to the recovery master (via
> > set_recovery_mode) and then sends an election request. Since there are
> > no other nodes at this stage, no one else will challenge us for the
> > position of recovery master.
> >
> > However, what I can't tell from the briar patch that is CTDB, is how
> > do_recovery gets called or if it needs to be called when the first
> > CTDB node starts.
> >
> > Can anyone enlighten me?
> 
> OK, I think I have found a way through the twisty little passageways
> in ctdb_start_daemon:
> 
>         /* force initial recovery for election */
>         ctdb->recovery_mode = CTDB_RECOVERY_ACTIVE;
> 
> It is starting to make sense now.

force_election() also sets recovery mode to active.  So every time an
election is called there will be a recovery.

However, we could avoid many elections when new nodes join, since
they are unlikely to become the master (due to the election criteria).
For this reason we think we should make it more Raft-like: have the
master node regularly broadcast that it is the master, and when a new
node comes up it would wait a few seconds to see if it receives a
broadcast from the master, so it could avoid calling an election.
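
To make the idea concrete, here is a rough sketch of the startup
decision only. It is not CTDB code: decide_on_startup(), struct
master_announce, struct startup_result and NO_MASTER are all
hypothetical names, and the broadcasts are modelled as a plain array
of (arrival time, node number) pairs rather than real messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names here are hypothetical; none of them are CTDB APIs. */
#define NO_MASTER (-1)

/* A "I am the master" broadcast as seen by the newly started node. */
struct master_announce {
	int at_sec;	/* seconds after this node started */
	int pnn;	/* node number claiming to be master */
};

struct startup_result {
	bool call_election;	/* true if no master was heard in time */
	int recmaster;		/* adopted master, or NO_MASTER */
};

/*
 * Decide what a new node does after waiting wait_secs seconds:
 * adopt the first master heard within the window, otherwise fall
 * back to calling an election.
 */
static struct startup_result
decide_on_startup(const struct master_announce *msgs, size_t n,
		  int wait_secs)
{
	struct startup_result r = {
		.call_election = true,
		.recmaster = NO_MASTER,
	};
	size_t i;

	assert(wait_secs >= 0);

	for (i = 0; i < n; i++) {
		if (msgs[i].at_sec >= 0 && msgs[i].at_sec <= wait_secs) {
			r.call_election = false;
			r.recmaster = msgs[i].pnn;
			break;
		}
	}

	return r;
}
```

With a 5 second window, a broadcast from node 0 arriving at 2 seconds
means the new node adopts node 0 and skips the election. With no
broadcast, or one that only arrives after the window, it calls an
election as it does today.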

peace & happiness,
martin


More information about the samba-technical mailing list