[PATCH] Restarting cleanupd when ctdb-messaging is down

Tue Jul 26 12:06:18 UTC 2016

On Fri, Jul 22, 2016 at 10:00:25PM +0200, Ralph Böhme wrote:
> On Mon, Jul 18, 2016 at 06:20:49PM +0200, Ralph Böhme wrote:
> > As discussed off-list, this will also need some sort of protection
> > against cleanup messages lost in flight, like an ACK message, or
> > possibly using a local tdb for posting cleanup events. Looking into
> > it...
> 
> updated patchset attached.
> 
> Summary:
> 
> o Adds restart retry semantics to cleanupd and notifyd (while I was at
>   it)
> 
> o cleanupd uses a tdb for passing cleanup events
> 
> Passes a private autobuild and a smoke test with:
> 
> # ctdb ban 120
> # kill $(ps -e -o pid,comm | awk '/ cleanupd$| smbd-notifyd$/ {print $1}')
> 
> ...wait 120 seconds and watch the restarts fail. Finally, after 120
> seconds, when ctdb messaging is up again, both cleanupd and notifyd are
> running again.
> 
> Please review & push if OK. Thanks!
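The summary above mentions restart retry semantics but does not show the code. As a rough, hypothetical illustration only (this is not the smbd implementation, which is written in C), the shell function below keeps retrying a start command, sleeping between attempts, until it succeeds or an attempt cap is reached. The names `restart_with_retry` and `RESTART_ATTEMPTS`, as well as the attempt cap itself, are invented for this sketch; the smoke test above suggests the real daemons simply keep retrying until ctdb messaging is back.

```shell
#!/bin/sh
# Hypothetical sketch of restart-retry semantics, NOT Samba's code.
# Keep trying to start a helper daemon until START_CMD succeeds,
# sleeping DELAY_SECONDS between failed attempts. The MAX_ATTEMPTS cap
# is an assumption added for this illustration.
#
# Usage: restart_with_retry START_CMD DELAY_SECONDS MAX_ATTEMPTS
# On return, RESTART_ATTEMPTS holds the number of attempts made.
restart_with_retry() {
    start_cmd=$1
    delay=$2
    max_attempts=$3
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # START_CMD is expected to return 0 once the daemon is up
        # (e.g. once ctdb messaging is reachable again).
        if "$start_cmd"; then
            RESTART_ATTEMPTS=$attempt
            return 0
        fi
        echo "start attempt $attempt failed" >&2
        # Only wait if another attempt will follow.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    RESTART_ATTEMPTS=$max_attempts
    return 1
}
```

For example, `restart_with_retry start_cleanupd 10 12` would make up to 12 attempts spaced 10 seconds apart, where `start_cleanupd` is a placeholder for whatever actually launches the daemon.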

pushed with a minor tweak suggested by Volker offlist.

Cheerio!
-slow