[PATCH]: Inconsistent recmaster during election.
Kenny Dinh
kdinh at peaxy.net
Thu Dec 31 01:34:22 UTC 2015
The timing has to be right in order to reproduce the issue. I'm using a
script that restarts the RECMASTER, then immediately restarts the other
node. With the test script, I can consistently reproduce the issue.
On Wed, Dec 30, 2015 at 5:07 PM, Martin Schwenke <martin at meltin.net> wrote:
> On Tue, 29 Dec 2015 10:39:26 -0800, Kenny Dinh <kdinh at peaxy.net> wrote:
>
> > [PATCH] updated.
> >
> > I have a more targeted fix in the new version of my patch. Please
> > disregard my previous patch. In the new patch, we should let the
> > recovery master win the election if the recmaster sent the election
> > message and we agreed that it is the recovery master.
> >
> > Please review and push if it looks good.
>
> Just a note to say that this isn't being ignored... :-)
>
> I don't think I've seen this particular problem before but I've
> probably seen similar ones. I assume that this is an inconsistency
> between the main daemon and recovery daemon about which node is the
> recovery master. I need to get some clear time to take a closer look at
> your logs to understand this correctly. That might not happen until next
> week.
>
> Can you recreate this problem? Every time?
>
> rec->ctdb->recovery_master shouldn't really be used in the recovery
> daemon. I know that it is already used. :-( However, it is only set
> in the monitor handler, so there are corner cases where it will be
> incorrect (e.g. election called due to capability change).
>
> I think rec->recmaster is a better choice but in current versions it
> might suffer from similar problems. However, I'm still not sure if
> your patch is the best way of approaching the problem... since I don't
> yet understand the problem. ;-)
>
> In Samba master I have made quite a few changes to the handling of the
> recovery master. The recovery daemon now tracks election results
> internally and remembers who is master. It only ever retrieves the
> recovery master from remote nodes for consistency checking. This
> should eliminate a whole class of bugs.
>
> The longer term plan is that elections, tracking of the master node,
> cluster membership and keep-alives will all be done in a cluster
> management daemon.
>
> More once I've had time to understand the logs...
>
> peace & happiness,
> martin
>