[PATCH] ctdb-recovery: Update timeout and number of retries during recovery
Martin Schwenke
martin at meltin.net
Mon Jun 6 04:12:55 UTC 2016
On Fri, 3 Jun 2016 16:07:46 +1000, Martin Schwenke <martin at meltin.net>
wrote:
> On Fri, 3 Jun 2016 15:25:40 +1000, Amitay Isaacs <amitay at gmail.com>
> wrote:
>
> > The timeout RecoverTimeout (default 120 seconds) is used for control
> > messages sent during recovery. If any node does not respond to any of
> > the recovery control messages within RecoverTimeout seconds, recovery
> > of that database fails. The recovery helper attempts the recovery of a
> > database up to 5 times.
> >
> > In the worst case, if a database cannot be recovered within 5 attempts,
> > a total of 600 seconds (5 x 120) will have passed. During this time,
> > other timeouts are triggered, causing unnecessary failures:
> >
> > 1. During recovery, recoverd is still processing events, but it does
> > not send a ping message to the ctdb daemon. If no ping message is
> > received for RecdPingTimeout (default 60) seconds, ctdb counts the
> > recovery daemon as unresponsive. Once the recovery daemon has been
> > counted as unresponsive RecdFailCount (default 10) times, the ctdb
> > daemon restarts it. So after 600 seconds (10 x 60), the ctdb daemon
> > will restart the recovery daemon.
> >
> > 2. If the ctdb daemon stays in recovery for RecoveryDropAllIPs (default
> > 120) seconds, it drops all public addresses. This causes all SMB
> > clients to be disconnected unnecessarily. The released public
> > addresses are not taken over until recovery is complete.
> >
> > To avoid dropping IPs and restarting the recovery daemon during a
> > delayed recovery, set RecoverTimeout to 30 seconds and limit the number
> > of retries for recovering a database to 3. If we don't hear from a node
> > for more than 25 seconds, the node is considered disconnected, so 30
> > seconds is a sufficient timeout for controls during recovery.
> >
> > Please review and push.
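The timing argument in the patch description can be checked with a little arithmetic. The sketch below is illustrative only: it assumes a simple model in which worst-case recovery of one database takes RecoverTimeout x attempts (ignoring any work done between retries, and ignoring multiple databases), and it uses the default tunable values quoted above. The helper names `worst_case_recovery` and `check` are hypothetical, not part of CTDB.

```python
# Illustrative worst-case timing check for recovery of ONE database.
# Assumption: every attempt runs until its control messages time out, so the
# total is simply RecoverTimeout x attempts. Helper names are hypothetical.

RECD_PING_TIMEOUT = 60       # seconds without a ping before recoverd counts as unresponsive
RECD_FAIL_COUNT = 10         # unresponsive counts before ctdbd restarts recoverd
RECOVERY_DROP_ALL_IPS = 120  # seconds in recovery before all public IPs are dropped


def worst_case_recovery(recover_timeout: int, attempts: int) -> int:
    """Seconds spent if every attempt times out on its control messages."""
    return recover_timeout * attempts


def check(recover_timeout: int, attempts: int) -> dict:
    """Compare the worst case against the other daemon timeouts."""
    total = worst_case_recovery(recover_timeout, attempts)
    recd_restart_after = RECD_PING_TIMEOUT * RECD_FAIL_COUNT  # 600 seconds
    return {
        "worst_case": total,
        "drops_ips": total >= RECOVERY_DROP_ALL_IPS,
        "restarts_recoverd": total >= recd_restart_after,
    }


if __name__ == "__main__":
    # Old settings: 120 s x 5 attempts; proposed settings: 30 s x 3 attempts.
    for label, (timeout, attempts) in {"old": (120, 5), "new": (30, 3)}.items():
        print(label, check(timeout, attempts))
```

Under this model the old settings reach 600 seconds, which trips both RecoveryDropAllIPs and the recoverd restart, while the proposed settings stop at 90 seconds, inside both limits.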
>
> Reviewed-by: Martin Schwenke <martin at meltin.net>
>
> Let's see what else there is to push... :-)
... and pushed...
peace & happiness,
martin
More information about the samba-technical mailing list