[PATCH] ctdb-recovery: Update timeout and number of retries during recovery

Martin Schwenke martin at meltin.net
Fri Jun 3 06:07:46 UTC 2016


On Fri, 3 Jun 2016 15:25:40 +1000, Amitay Isaacs <amitay at gmail.com>
wrote:

> The timeout RecoverTimeout (default 120 seconds) is used for control
> messages sent during recovery.  If any node does not respond to any of
> the recovery control messages within RecoverTimeout seconds, recovery
> of that database fails.  The recovery helper will make up to 5
> attempts to recover a database.
> 
> In the worst case, if a database cannot be recovered within 5 attempts,
> a total of 600 seconds (5 x 120) will have passed.  During this period
> other timeouts are triggered, causing unnecessary failures:
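The 600-second figure comes from the retry structure: each failed attempt only fails once the full per-control timeout has expired. The following is a minimal C sketch of that loop, not ctdb's recovery helper code; `recover_db_with_retries`, `attempt_fn` and the simulated nodes are hypothetical names, and it assumes a successful attempt takes negligible time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-attempt callback: returns true if every node
 * answered the recovery controls during this attempt. */
typedef bool (*attempt_fn)(int attempt, void *ctx);

/* Try to recover one database at most max_attempts times.  A failed
 * attempt is assumed to cost the full timeout (some node never
 * replied); a successful attempt is charged 0 seconds for simplicity.
 * Returns true on success and stores the seconds spent in *elapsed. */
static bool recover_db_with_retries(attempt_fn try_once, void *ctx,
                                    int max_attempts, int timeout_secs,
                                    int *elapsed)
{
    assert(max_attempts > 0 && timeout_secs > 0);
    *elapsed = 0;
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (try_once(attempt, ctx)) {
            return true;
        }
        *elapsed += timeout_secs; /* waited out the whole timeout */
    }
    return false;
}

/* Simulated node that never answers: the worst case in the patch. */
static bool never_answers(int attempt, void *ctx)
{
    (void)attempt;
    (void)ctx;
    return false;
}

/* Simulated node that first answers on attempt *(const int *)ctx. */
static bool answers_on_attempt(int attempt, void *ctx)
{
    return attempt >= *(const int *)ctx;
}
```

With the old values (5 attempts, 120 seconds), a node that never answers costs 600 seconds of waiting; with 3 attempts of 30 seconds it costs 90.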
> 
> 1. During recovery, even though recoverd is processing events, it
>    does not send ping messages to the ctdb daemon.  If no ping message
>    is received for RecdPingTimeout (default 60) seconds, ctdb counts
>    the recovery daemon as unresponsive.  Once this has happened
>    RecdFailCount (default 10) times, the ctdb daemon restarts the
>    recovery daemon.  So after 600 seconds (10 x 60), the ctdb daemon
>    will restart the recovery daemon.
> 
> 2. If the ctdb daemon stays in recovery for RecoveryDropAllIPs (default
>    120) seconds, it drops all the public addresses.  This disconnects
>    all SMB clients unnecessarily.  The released public addresses are
>    not taken over until the recovery is complete.
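The two watchdogs described above can be modelled in a small C sketch. It assumes, as the text describes, that the ping timer simply rearms after each RecdPingTimeout expiry and that no pings arrive at all during the recovery; `simulate_stuck_recovery` and both structs are hypothetical, not ctdb's implementation.

```c
#include <assert.h>

/* Tunables as described in the patch (defaults shown in comments). */
struct watchdog_tunables {
    int recd_ping_timeout;     /* RecdPingTimeout, default 60 */
    int recd_fail_count;       /* RecdFailCount, default 10 */
    int recovery_drop_all_ips; /* RecoveryDropAllIPs, default 120 */
};

/* Outcome of a recovery that lasts recovery_secs with no pings. */
struct watchdog_outcome {
    int missed_pings; /* ping timeouts that elapse during the recovery */
    int restart_at;   /* seconds until recoverd is restarted, or -1 */
    int drop_ips_at;  /* seconds until all public IPs are dropped, or -1 */
};

static struct watchdog_outcome simulate_stuck_recovery(
    const struct watchdog_tunables *t, int recovery_secs)
{
    assert(t->recd_ping_timeout > 0 && t->recd_fail_count > 0);
    struct watchdog_outcome out = { 0, -1, -1 };

    /* Each full RecdPingTimeout interval without a ping is one failure. */
    out.missed_pings = recovery_secs / t->recd_ping_timeout;
    if (out.missed_pings >= t->recd_fail_count) {
        out.restart_at = t->recd_ping_timeout * t->recd_fail_count;
    }

    /* IPs are dropped once recovery has lasted RecoveryDropAllIPs. */
    if (recovery_secs >= t->recovery_drop_all_ips) {
        out.drop_ips_at = t->recovery_drop_all_ips;
    }
    return out;
}
```

Under the defaults (60, 10, 120), a 600-second recovery drops all public addresses at 120 seconds and restarts recoverd at 600 seconds, whereas a 90-second recovery triggers neither.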
> 
> To avoid dropping IPs and restarting the recovery daemon during a
> delayed recovery, reduce RecoverTimeout to 30 seconds and limit the
> number of attempts at recovering a database to 3.  The worst case then
> becomes 90 seconds (3 x 30), which is below both RecoveryDropAllIPs
> (120 seconds) and the 600 seconds before recoverd is restarted.  If we
> don't hear from a node for more than 25 seconds, the node is considered
> disconnected, so 30 seconds is a sufficient timeout for controls during
> recovery.
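The constraint behind the new values can be written as a configuration check. This is a minimal C sketch, assuming (as the patch does) that the worst case for one database is RecoverTimeout multiplied by the number of attempts, and taking the 25-second node-disconnect interval as a given input; `struct recovery_budget`, `worst_case_secs` and `budget_ok` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bundle of the tunables discussed in the patch. */
struct recovery_budget {
    int recover_timeout;       /* RecoverTimeout: per-control timeout */
    int max_db_attempts;       /* attempts per database in the helper */
    int node_disconnect_secs;  /* silence before a node is disconnected */
    int recovery_drop_all_ips; /* RecoveryDropAllIPs */
    int recd_ping_timeout;     /* RecdPingTimeout */
    int recd_fail_count;       /* RecdFailCount */
};

/* Worst case: every attempt waits the full timeout before failing. */
static int worst_case_secs(const struct recovery_budget *b)
{
    return b->recover_timeout * b->max_db_attempts;
}

/* True if the control timeout outlasts node-disconnect detection and a
 * fully failed database recovery ends before either watchdog fires. */
static bool budget_ok(const struct recovery_budget *b)
{
    assert(b->max_db_attempts > 0);
    int worst = worst_case_secs(b);
    return b->recover_timeout > b->node_disconnect_secs &&
           worst < b->recovery_drop_all_ips &&
           worst < b->recd_ping_timeout * b->recd_fail_count;
}
```

The old values { 120, 5, 25, 120, 60, 10 } give a 600-second worst case and fail the RecoveryDropAllIPs check, while { 30, 3, 25, 120, 60, 10 } give 90 seconds and pass all three conditions.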
> 
> Please review and push.

Reviewed-by: Martin Schwenke <martin at meltin.net>

Let's see what else there is to push...  :-)

peace & happiness,
martin



More information about the samba-technical mailing list