[PATCH]: Inconsistent recmaster during election.

Kenny Dinh kdinh at peaxy.net
Thu Dec 31 01:34:22 UTC 2015

The timing has to be right in order to reproduce the issue.  I'm using a
script that restarts the RECMASTER, then immediately restarts the other
node.  With the test script, I can consistently reproduce the issue.

On Wed, Dec 30, 2015 at 5:07 PM, Martin Schwenke <martin at meltin.net> wrote:

> On Tue, 29 Dec 2015 10:39:26 -0800, Kenny Dinh <kdinh at peaxy.net> wrote:
> > [PATCH] updated.
> >
> > I have a more targeted fix in the new version of my patch.  Please
> > disregard my previous patch.  In the new patch, we should let the
> > recovery master win the election if the recmaster sent the election
> > message and we agreed that it is the recovery master.
> >
> > Please review and push if it looks good.
>
> Just a note to say that this isn't being ignored...  :-)
>
> I don't think I've seen this particular problem before but I've
> probably seen similar ones. I assume that this is an inconsistency
> between the main daemon and recovery daemon about which node is the
> recovery master.  I need to get some clear time to take a closer look at
> your logs to understand this correctly. That might not happen until next
> week.
>
> Can you recreate this problem?  Every time?
>
> rec->ctdb->recovery_master shouldn't really be used in the recovery
> daemon.  I know that it is already used.  :-(  However, it is only set
> in the monitor handler, so there are corner cases where it will be
> incorrect (e.g. election called due to capability change).
>
> I think rec->recmaster is a better choice but in current versions it
> might suffer from similar problems.  However, I'm still not sure if
> your patch is the best way of approaching the problem... since I don't
> yet understand the problem.  ;-)
>
> In Samba master I have made quite a few changes to the handling of the
> recovery master.  The recovery daemon now tracks election results
> internally and remembers who is master.  It only ever retrieves the
> recovery master from remote nodes for consistency checking.  This
> should eliminate a whole class of bugs.
>
> The longer term plan is that elections, tracking of the master node,
> cluster membership and keep-alives will all be done in a cluster
> management daemon.
>
> More once I've had time to understand the logs...
>
> peace & happiness,
> martin
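
As a rough illustration of the rule described in the updated patch quoted
above (let the sender win if it is the node we already agree is recovery
master), here is a minimal Python sketch.  It is a toy model, not the CTDB
code: the class names, field names and the fallback comparison are all
assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class ElectionMessage:
    """Hypothetical, simplified election message (not the CTDB wire format)."""
    sender_pnn: int      # node number of the sender
    num_connected: int   # how many nodes the sender can see


@dataclass
class LocalView:
    """Hypothetical view held by the node receiving the message."""
    pnn: int             # this node's number
    recmaster: int       # node we currently believe is recovery master
    num_connected: int   # how many nodes we can see


def wins_by_default(me, msg):
    """Placeholder comparison, assumed for illustration only:
    more connected nodes wins, lower node number breaks ties."""
    if me.num_connected != msg.num_connected:
        return me.num_connected > msg.num_connected
    return me.pnn < msg.sender_pnn


def we_win_election(me, msg):
    """Return True if this node should contest and win against the sender."""
    # Rule from the patch description: if the sender is the node we already
    # agree is recovery master, let it win rather than contesting.
    if msg.sender_pnn == me.recmaster:
        return False
    return wins_by_default(me, msg)
```

With LocalView(pnn=0, recmaster=1, num_connected=2), a message from node 1
is not contested even if it reports fewer connected nodes, while the same
message from node 2 falls through to the placeholder comparison.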