[SCM] Samba Shared Repository - branch master updated

Martin Schwenke martins at samba.org
Mon Jun 6 06:50:03 UTC 2016


The branch, master has been updated
       via  93dcca2 ctdb-recovery: Update timeout and number of retries during recovery
      from  82a1094 samba_spnupdate: do not interpret failure count as unix error code

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 93dcca2a5f7af9698c9ba1024dbce1d1a66d4efb
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Thu Jun 2 18:27:29 2016 +1000

    ctdb-recovery: Update timeout and number of retries during recovery
    
    The timeout RecoverTimeout (default 120 seconds) is used for control
    messages sent during the recovery.  If any node does not respond to any
    of the recovery control messages within RecoverTimeout seconds, then
    the recovery of that database fails.  The recovery helper makes up to
    5 attempts (NUM_RETRIES) to recover a database.
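The worst-case arithmetic in the next paragraph follows from a simple model of this retry loop. The sketch below is not the recovery helper's code: `recover_db_once()` and `worst_case_recovery_secs()` are hypothetical names, and it assumes, as the commit message does, that every failing attempt waits the full control timeout before giving up.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for one recovery attempt of a single database.
 * Worst case from the commit message: some node never answers, so the
 * attempt waits the full control timeout and then fails.
 */
static bool recover_db_once(unsigned int recover_timeout, unsigned int *elapsed)
{
	*elapsed += recover_timeout;	/* blocked until the timeout expires */
	return false;			/* control timed out: attempt failed */
}

/* Retry loop bounded by num_retries; returns the total seconds spent. */
static unsigned int worst_case_recovery_secs(unsigned int recover_timeout,
					     unsigned int num_retries)
{
	unsigned int elapsed = 0;

	for (unsigned int attempt = 0; attempt < num_retries; attempt++) {
		if (recover_db_once(recover_timeout, &elapsed)) {
			break;	/* never taken in this worst-case model */
		}
	}
	return elapsed;
}
```

Under this model, `worst_case_recovery_secs(120, 5)` gives the 600 seconds of the old defaults, and `worst_case_recovery_secs(30, 3)` gives 90 seconds for the values set by this patch.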
    
    In the worst case, if a database could not be recovered within 5 attempts,
    a total of 600 seconds (5 x 120) would have passed.  During this period,
    other timeouts are triggered, causing unnecessary failures as follows:
    
    1. During the recovery, even though recoverd is processing events,
       it does not send a ping message to the ctdb daemon.  If a ping
       message is not received for RecdPingTimeout (default 60) seconds,
       then ctdb counts it as an unresponsive recovery daemon.  If the
       recovery daemon fails RecdFailCount (default 10) times, then the
       ctdb daemon restarts the recovery daemon.  So after 600 seconds
       (10 x 60), the ctdb daemon will restart the recovery daemon.
    
    2. If the ctdb daemon stays in recovery for RecoveryDropAllIPs (default
       120) seconds, then it drops all the public addresses.  This causes
       all SMB clients to be disconnected unnecessarily.  The released
       public addresses will not be taken over until the recovery is
       complete.
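The two side effects listed above can be checked against a given recovery duration with a small model. This is a sketch, not ctdb code: the enum and function names are hypothetical, the values are the defaults quoted in the commit message, and it assumes each threshold fires once the recovery has lasted at least that long (the message does not say whether the comparison is strict).

```c
#include <stdbool.h>

/* Defaults quoted in the commit message, in seconds (or a count). */
enum {
	RECD_PING_TIMEOUT = 60,
	RECD_FAIL_COUNT = 10,
	RECOVERY_DROP_ALL_IPS = 120,
};

/* Seconds of silence after which the ctdb daemon restarts recoverd,
 * following the description in item 1. */
static unsigned int recd_restart_after_secs(void)
{
	return RECD_PING_TIMEOUT * RECD_FAIL_COUNT;
}

/* Item 1: would a recovery this long get recoverd restarted? */
static bool triggers_recd_restart(unsigned int recovery_secs)
{
	return recovery_secs >= recd_restart_after_secs();
}

/* Item 2: would a recovery this long drop all public addresses? */
static bool triggers_ip_drop(unsigned int recovery_secs)
{
	return recovery_secs >= RECOVERY_DROP_ALL_IPS;
}
```

With the old worst case of 600 seconds both checks return true; with a 90-second recovery neither does.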
    
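    To avoid dropping IPs and restarting the recovery daemon during a
    delayed recovery, reduce RecoverTimeout to 30 seconds and limit the
    number of retries for recovering a database to 3.  If we don't hear
    from a node for more than 25 seconds (KeepaliveInterval 5 x
    KeepaliveLimit 5), then the node is considered disconnected, so 30
    seconds is a sufficient timeout for controls during recovery.  The
    worst case for one database then drops to 90 seconds (3 x 30), below
    both the 120-second RecoveryDropAllIPs limit and the 600 seconds after
    which the recovery daemon is restarted.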
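The argument that 30 seconds is enough rests on the keepalive defaults that appear as context lines in the ctdb_tunables.c hunk. The sketch below assumes a node is marked disconnected after KeepaliveLimit consecutive KeepaliveInterval periods without traffic, which matches the 25 seconds quoted above; the function names are hypothetical.

```c
#include <stdbool.h>

/* Defaults visible as context lines in the ctdb_tunables.c hunk. */
enum { KEEPALIVE_INTERVAL = 5, KEEPALIVE_LIMIT = 5 };

/* Assumed relation: KeepaliveLimit missed intervals => disconnected. */
static unsigned int disconnect_after_secs(void)
{
	return KEEPALIVE_INTERVAL * KEEPALIVE_LIMIT;
}

/*
 * A control timeout is "sufficient" in the commit's sense if a silent
 * node would already have been declared disconnected before it expires.
 */
static bool timeout_covers_disconnect(unsigned int recover_timeout)
{
	return recover_timeout > disconnect_after_secs();
}
```

Here `timeout_covers_disconnect(30)` holds, while a timeout of 20 seconds would expire before a silent node is declared disconnected.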
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    Reviewed-by: Martin Schwenke <martin at meltin.net>
    
    Autobuild-User(master): Martin Schwenke <martins at samba.org>
    Autobuild-Date(master): Mon Jun  6 08:49:15 CEST 2016 on sn-devel-144

-----------------------------------------------------------------------

Summary of changes:
 ctdb/server/ctdb_recovery_helper.c | 4 ++--
 ctdb/server/ctdb_tunables.c        | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)


Changeset truncated at 500 lines:

diff --git a/ctdb/server/ctdb_recovery_helper.c b/ctdb/server/ctdb_recovery_helper.c
index 0720d0e..d54f32d 100644
--- a/ctdb/server/ctdb_recovery_helper.c
+++ b/ctdb/server/ctdb_recovery_helper.c
@@ -34,9 +34,9 @@
 #include "protocol/protocol_api.h"
 #include "client/client.h"
 
-static int recover_timeout = 120;
+static int recover_timeout = 30;
 
-#define NUM_RETRIES	5
+#define NUM_RETRIES	3
 
 #define TIMEOUT()	timeval_current_ofs(recover_timeout, 0)
 
diff --git a/ctdb/server/ctdb_tunables.c b/ctdb/server/ctdb_tunables.c
index 83b57f7..9c1e4a9 100644
--- a/ctdb/server/ctdb_tunables.c
+++ b/ctdb/server/ctdb_tunables.c
@@ -41,7 +41,7 @@ static const struct {
 	{ "TraverseTimeout",     20, offsetof(struct ctdb_tunable_list, traverse_timeout), false },
 	{ "KeepaliveInterval",    5,  offsetof(struct ctdb_tunable_list, keepalive_interval), false },
 	{ "KeepaliveLimit",       5,  offsetof(struct ctdb_tunable_list, keepalive_limit), false },
-	{ "RecoverTimeout",     120,  offsetof(struct ctdb_tunable_list, recover_timeout), false },
+	{ "RecoverTimeout",      30,  offsetof(struct ctdb_tunable_list, recover_timeout), false },
 	{ "RecoverInterval",      1,  offsetof(struct ctdb_tunable_list, recover_interval), false },
 	{ "ElectionTimeout",      3,  offsetof(struct ctdb_tunable_list, election_timeout), false },
 	{ "TakeoverTimeout",      9,  offsetof(struct ctdb_tunable_list, takeover_timeout), false },
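The tunables table in the second hunk maps each tunable name to a default value and a field offset, so changing a default is a one-line edit. The following is a reduced, hypothetical copy of that pattern, not the ctdb source: `struct tunable_list`, `set_defaults()` and `get_tunable()` are illustrative names, all fields are assumed to be `uint32_t`, and the trailing boolean is assumed to mark obsolete tunables.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-down version of struct ctdb_tunable_list. */
struct tunable_list {
	uint32_t keepalive_interval;
	uint32_t keepalive_limit;
	uint32_t recover_timeout;
	uint32_t recover_interval;
};

/* Same row shape as the diff: name, default, offset, flag (assumed "obsolete"). */
static const struct {
	const char *name;
	uint32_t default_value;
	size_t offset;
	bool obsolete;
} tunable_map[] = {
	{ "KeepaliveInterval", 5, offsetof(struct tunable_list, keepalive_interval), false },
	{ "KeepaliveLimit",    5, offsetof(struct tunable_list, keepalive_limit), false },
	{ "RecoverTimeout",   30, offsetof(struct tunable_list, recover_timeout), false },
	{ "RecoverInterval",   1, offsetof(struct tunable_list, recover_interval), false },
};

#define NUM_TUNABLES (sizeof(tunable_map) / sizeof(tunable_map[0]))

/* Fill every non-obsolete field with its default via the stored offset. */
static void set_defaults(struct tunable_list *t)
{
	for (size_t i = 0; i < NUM_TUNABLES; i++) {
		if (tunable_map[i].obsolete) {
			continue;
		}
		uint32_t *field = (uint32_t *)((char *)t + tunable_map[i].offset);
		*field = tunable_map[i].default_value;
	}
}

/* Look up a tunable by name; returns false if the name is unknown. */
static bool get_tunable(const struct tunable_list *t, const char *name,
			uint32_t *value)
{
	for (size_t i = 0; i < NUM_TUNABLES; i++) {
		if (strcmp(tunable_map[i].name, name) == 0) {
			*value = *(const uint32_t *)((const char *)t +
						     tunable_map[i].offset);
			return true;
		}
	}
	return false;
}
```

After `set_defaults()`, looking up "RecoverTimeout" yields the new default of 30; the offset table lets a single generic loop serve every tunable instead of one assignment per field.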


-- 
Samba Shared Repository



More information about the samba-cvs mailing list