[PATCH]: Inconsistent recmaster during election.

Martin Schwenke martin at meltin.net
Wed Jan 6 03:42:48 UTC 2016


Hi Kenny,

On Tue, 5 Jan 2016 12:35:15 -0800, Kenny Dinh <kdinh at peaxy.net> wrote:

> Following up on my previous reply.  Please discard the previous hack of
> replying -1 for recmaster requests.
> 
> The attached patch set the recovery mode to CTDB_RECOVERY_NORMAL during
> shutdown to prevent "smbcontrol winbindd ip-dropped" from hanging the
> shutdown process.  I think this is a better fix.
> 
> Can you please review.

It looks like you've found the problem.  Thanks!  That is, it looks
like:

 smbcontrol winbindd ip-dropped <ip>

hangs until the node is no longer in recovery mode, so shutdown is
delayed, which causes problems.

However, the solution is trickier than just setting recovery mode back
to normal.  If the recovery master notices this node or any other node
with recovery mode active, then it will start a recovery and set
recovery mode active on all nodes, including this one.  So you end up
with the following potential sequence:

1. Set recovery mode active
2. Recovery master notices recovery mode is active
3. Set recovery mode normal
4. Recovery master sets recovery mode active
5. <same problem>  :-(

So, while your patch probably narrows the window, you still have a
potential race between the "releaseip" event and recovery being active.
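To make the race concrete, here is a deliberately simplified, hypothetical Python model. It is not CTDB code: `RecMode`, `Node`, `recmaster_notices()`, `start_recovery()` and `ip_dropped_would_hang()` are made-up names for illustration. It reduces each node to a single recovery-mode flag and splits the recovery master's work into separate "notice" and "act" steps. It also assumes, based only on the observation above, that "smbcontrol winbindd ip-dropped" blocks while the node is in recovery mode.

```python
# Hypothetical, simplified model of the shutdown race; NOT CTDB code.
# Each node is reduced to one recovery-mode flag, and the recovery
# master's monitoring is split into a "notice" step and an "act" step.
from __future__ import annotations

from enum import Enum


class RecMode(Enum):
    NORMAL = "normal"
    ACTIVE = "active"


class Node:
    def __init__(self, pnn: int) -> None:
        self.pnn = pnn
        self.recmode = RecMode.NORMAL


def recmaster_notices(nodes: list[Node]) -> bool:
    """Recovery master's check: is any node in recovery mode?"""
    return any(n.recmode is RecMode.ACTIVE for n in nodes)


def start_recovery(nodes: list[Node]) -> None:
    """Recovery master starts a recovery: recovery mode active everywhere."""
    for n in nodes:
        n.recmode = RecMode.ACTIVE


def ip_dropped_would_hang(node: Node) -> bool:
    """Assumption: 'smbcontrol winbindd ip-dropped' blocks while the
    node is in recovery mode."""
    return node.recmode is RecMode.ACTIVE


def shutdown_race(nodes: list[Node], node: Node) -> list[RecMode]:
    """Replay steps 1-4, recording node's recovery mode after each one."""
    trace = []
    node.recmode = RecMode.ACTIVE       # 1. set recovery mode active
    trace.append(node.recmode)
    noticed = recmaster_notices(nodes)  # 2. recovery master notices
    trace.append(node.recmode)
    node.recmode = RecMode.NORMAL       # 3. the patch: set it normal
    trace.append(node.recmode)
    if noticed:
        start_recovery(nodes)           # 4. recovery master sets it active
    trace.append(node.recmode)
    return trace


if __name__ == "__main__":
    cluster = [Node(pnn) for pnn in range(3)]
    leaving = cluster[1]
    print([m.value for m in shutdown_race(cluster, leaving)])
    # 5. "releaseip" now runs while recovery is active again
    print("ip-dropped would hang:", ip_dropped_would_hang(leaving))
```

Running it prints the trace `['active', 'active', 'normal', 'active']` followed by `ip-dropped would hang: True`. Because the recovery master's decision in step 2 was made before step 3, resetting the flag only narrows the window; it does not close it.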

It would be interesting to know which version of Samba you're running.
I've compared the behaviour of the above smbcontrol command between
Samba 4.0/CTDB 2.5.4 and Samba/CTDB master.  I think I see the
smbcontrol command hanging in the old version when recovery is active
but not in master.

I've CC:ed Volker, Michael and Metze to see if they know whether there
has been a change in the way the above smbcontrol is handled.  I'm not
sure what might have changed, if anything.

Thanks...

peace & happiness,
martin
