[PATCH] ctdb-recovery: Update timeout and number of retries during recovery
Martin Schwenke
martin at meltin.net
Fri Jun 3 06:07:46 UTC 2016
On Fri, 3 Jun 2016 15:25:40 +1000, Amitay Isaacs <amitay at gmail.com>
wrote:
> The timeout RecoverTimeout (default 120 seconds) is used for control
> messages sent during the recovery. If any of the nodes does not respond
> to any of the recovery control messages for RecoverTimeout seconds, then
> the recovery of that database fails. The recovery helper will retry the
> recovery of a database 5 times.
>
> In the worst case, if a database could not be recovered within 5 attempts,
> a total of 600 seconds would have passed. During this time period other
> timeouts will be triggered causing unnecessary failures as follows:
>
> 1. During the recovery, even though recoverd is processing events,
> it does not send a ping message to the ctdb daemon. If a ping message
> is not received for RecdPingTimeout (default 60) seconds, then ctdb
> will count the recovery daemon as unresponsive. If the recovery
> daemon fails RecdFailCount (default 10) times, then the ctdb daemon
> will restart the recovery daemon. So after 600 seconds, the ctdb
> daemon will restart the recovery daemon.
>
> 2. If the ctdb daemon stays in recovery for RecoveryDropAllIPs (default
> 120) seconds, then it will drop all the public addresses. This will
> cause all SMB clients to be disconnected unnecessarily. The released
> public addresses will not be taken over until the recovery is complete.
>
> To avoid dropping IPs and restarting the recovery daemon during a
> delayed recovery, adjust RecoverTimeout to 30 seconds and limit the
> number of retries for recovering a database to 3. If we don't hear
> from a node for more than 25 seconds, then the node is considered
> disconnected. So 30 seconds is a sufficient timeout for controls
> during recovery.
>
> Please review and push.
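The arithmetic in the description can be checked with a rough Python sketch. The function names are hypothetical, and the model is a simplification: it assumes every attempt runs into the full RecoverTimeout and that each threshold triggers as soon as the elapsed time reaches it. The default values are the ones quoted above.

```python
# Defaults quoted in the patch description (seconds, except the fail count).
RECD_PING_TIMEOUT = 60
RECD_FAIL_COUNT = 10
RECOVERY_DROP_ALL_IPS = 120


def worst_case_seconds(recover_timeout, retries):
    """Upper bound on time spent retrying the recovery of one database."""
    return recover_timeout * retries


def side_effects(recover_timeout, retries):
    """Report which of the other timeouts the worst case would reach."""
    total = worst_case_seconds(recover_timeout, retries)
    return {
        "worst_case": total,
        # recoverd is restarted after RecdFailCount missed ping intervals.
        "recoverd_restarted": total >= RECD_PING_TIMEOUT * RECD_FAIL_COUNT,
        # Public addresses are dropped after RecoveryDropAllIPs in recovery.
        "ips_dropped": total >= RECOVERY_DROP_ALL_IPS,
    }


old = side_effects(recover_timeout=120, retries=5)
new = side_effects(recover_timeout=30, retries=3)
print("old:", old)
print("new:", new)
```

Under these assumptions the old settings give a 600 second worst case, which reaches both thresholds, while the proposed settings give 90 seconds, which stays below both.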
Reviewed-by: Martin Schwenke <martin at meltin.net>
Let's see what else there is to push... :-)
peace & happiness,
martin