[SCM] CTDB repository - branch 1.2.40 updated - ctdb-1.2.64-33-gdc84c8e

Amitay Isaacs amitay at samba.org
Tue Jul 2 18:18:53 MDT 2013


The branch, 1.2.40 has been updated
       via  dc84c8ed12ed1bf136827b55128c2e74b38bdf55 (commit)
       via  20f730070d9dfbff5e29461a32dec8ee2a185a68 (commit)
       via  345f17d9bcf4b4f0f246b951d55458e1f774f5df (commit)
       via  c7156f49616600d1ad97262115728fbb42ef5574 (commit)
       via  d2f641181927f427b80705e28b88777e50a88433 (commit)
       via  1fbe90b28f11d18ab1a0e3e15eaa42c6de75683f (commit)
       via  8196c25749e9b12628768bdd96f9d85bd7166c07 (commit)
       via  7b98e4cae9b3e52256bdf5ec42aedabe236b7de5 (commit)
       via  d450c31e37fd4b38a6ac3245a85082769b78935f (commit)
       via  877923c8aee49eb3ef610d83727f41ae2f9d09ba (commit)
       via  350ba6907b8f7122c4df21879ffbb3b74e8df93c (commit)
       via  24a0bc600acbe3deea9549ce87567e9e0f779a89 (commit)
       via  42c8c05d23dcd22404d6c761171ffe210734150c (commit)
       via  c0626646d87fe477bbbd425ef94513f466b2e876 (commit)
       via  324d70e26bef94fab1c8c9b8c17cd0a7817866d9 (commit)
       via  1a03e2f0366800eeb15308932f6483f48e2547ea (commit)
       via  df6d7554ccb6c53bcff9bded85ed4882f335eaee (commit)
       via  1c64d205d92da6b5a9af98190755ebce5e1176f1 (commit)
       via  53ecceb8e0dba4149005bb4b042c0845470e859b (commit)
       via  c311947f93c5c01a3819c401566dfa9dc87855d0 (commit)
       via  ca276e0ceb0952ca2832829d8bfc44074915ffe5 (commit)
       via  d29d729ddb26fc5f8ce622a2e733c57baa3733a6 (commit)
       via  5c62e4313c505baba73beab9fcb097ea4e10d452 (commit)
       via  faf8a5fd78ca4854bf103bb4317758082f127684 (commit)
       via  2a2de0939e1a30a4eb8839dbd49d8d8c80609d9c (commit)
       via  c18dbafff80494277d2bb1c91f67cdf3c2425ad8 (commit)
       via  911534f39708848087911f2ab69dbcbc59c12295 (commit)
       via  023d4825375f49e8eeaa25ff54db4fc5eeea9ac8 (commit)
       via  c7665d48f1ac5baecfd64502f419215b1fb259d0 (commit)
       via  5f235169728576e008067d08eefa0661d4a6b520 (commit)
       via  a2bc855110b908ed72941bdfe176a79a5b5876a6 (commit)
       via  7a77f8661c4b7919c0b575fb79a22e62391e7654 (commit)
       via  c02afd52b6788dfe6b051f185465ff6854f7a845 (commit)
      from  4560186b514221bbde89ebc0124380007a22ed08 (commit)

http://gitweb.samba.org/?p=ctdb.git;a=shortlog;h=1.2.40


- Log -----------------------------------------------------------------
commit dc84c8ed12ed1bf136827b55128c2e74b38bdf55
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Tue Jul 2 17:19:05 2013 +1000

    New version 1.2.65
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>

commit 20f730070d9dfbff5e29461a32dec8ee2a185a68
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Tue Jul 2 12:40:37 2013 +1000

    ctdbd: Don't ban self if init or shutdown event fails
    
    There is no point in banning the node if init or shutdown event times
    out since it's going to quit anyway.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit ef1c4e99ca66e7a990bc557f34abb624c315e6ba)

commit 345f17d9bcf4b4f0f246b951d55458e1f774f5df
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Thu Jun 27 17:46:43 2013 +1000

    doc: The second half of monitoring is only for recovery master
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit fcd5e1f04c5fe6c98399429b8f0918b8779acba6)

commit c7156f49616600d1ad97262115728fbb42ef5574
Author: Michael Adam <obnox at samba.org>
Date:   Wed Jun 26 09:23:22 2013 +0200

    recoverd: when the recmaster is banned, use that information when forcing an election
    
    When we trigger an election because the recmaster considers itself inactive,
    update our local nodemap with the recmaster's flags before calling
    force_election(). This way, we don't send the inactive node commands
    (e.g. freeze) that may fail and then lead to ourselves getting banned.
    
    The theory is that this should help avoid banning loops.
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    (cherry picked from commit 932360992b08a5483d90c0590218ba0fd756119e)

commit d2f641181927f427b80705e28b88777e50a88433
Author: Michael Adam <obnox at samba.org>
Date:   Wed Jun 26 07:11:51 2013 +0200

    recoverd: fix a comment typo
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    (cherry picked from commit 741944f118e98f178b860194eecb215180949d18)

commit 1fbe90b28f11d18ab1a0e3e15eaa42c6de75683f
Author: Michael Adam <obnox at samba.org>
Date:   Fri Jun 21 17:57:37 2013 +0200

    recoverd: fix a comment in main_loop
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    (cherry picked from commit ac06c46e4a80c635f6094b5ac6f0bf3e3a02db95)

commit 8196c25749e9b12628768bdd96f9d85bd7166c07
Author: Michael Adam <obnox at samba.org>
Date:   Fri Jun 21 14:06:22 2013 +0200

    recoverd: eliminate some trailing spaces from ctdb_election_win()
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    (cherry picked from commit df30c0a05ed908fc2a997c56ff5484736b23b70f)

commit 7b98e4cae9b3e52256bdf5ec42aedabe236b7de5
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jun 28 16:31:07 2013 +1000

    recoverd: Don't continue if the current node gets banned
    
    A banned node cannot continue with recovery or with monitoring the cluster.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 14399de1dd0bd8dabf1f48b1457e3ccb37589d8a)

commit d450c31e37fd4b38a6ac3245a85082769b78935f
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Fri Jun 28 14:31:02 2013 +1000

    recoverd: Refactor code to ban misbehaving nodes
    
    Since we have nodemap information, there is no need to hardcode the
    limit of 20.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    Pair-Programmed-With: Martin Schwenke <martin at meltin.net>
    (cherry picked from commit aea12dce83ef385e9fb3bc03ac7ace0874a0e3fe)
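
The threshold that replaces the hardcoded 20 can be sketched as below. This is
a minimal illustration, assuming the limit of twice the node count that
ban_misbehaving_nodes() applies in the diff further down. The struct and
function names (sketch_node, sketch_ban_misbehaving) are hypothetical and are
not part of CTDB.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-node banning state; not CTDB's struct. */
struct sketch_node {
	uint32_t pnn;          /* node number */
	uint32_t ban_credits;  /* accumulated banning credits */
	bool banned;           /* set when the node reaches the threshold */
};

/*
 * Ban every node whose credits reach 2 * num_nodes and reset its credits.
 * Returns true if the local node (local_pnn) was among the banned nodes,
 * so the caller can stop what it is doing.
 */
static bool sketch_ban_misbehaving(struct sketch_node *nodes, size_t num_nodes,
				   uint32_t local_pnn)
{
	bool self_ban = false;
	size_t i;

	for (i = 0; i < num_nodes; i++) {
		if (nodes[i].ban_credits < 2 * num_nodes) {
			continue;
		}
		nodes[i].banned = true;
		nodes[i].ban_credits = 0;
		if (nodes[i].pnn == local_pnn) {
			self_ban = true;
		}
	}
	return self_ban;
}
```

Deriving the limit from the node count means the same check can be shared by
every caller once the nodemap is known.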

commit 877923c8aee49eb3ef610d83727f41ae2f9d09ba
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Thu Jun 27 16:01:16 2013 +1000

    recoverd: Move code to ban other nodes after we get local node flags
    
    If a node gets banned first, then it should not ban other nodes.
    
    This code was moved up in main_loop to avoid waiting for nodemap
    from other nodes (commit 83b0261f2cb453195b86f547d360400103a8b795).
    
    To prevent a banned node from banning other nodes, we need to first get
    nodemap information from local node, so trying to ban other nodes can
    fail if we are already banned.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit ae1693905036ecdbc4594fde1f12500faae4a554)

commit 350ba6907b8f7122c4df21879ffbb3b74e8df93c
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Thu Jun 27 15:44:27 2013 +1000

    recoverd: Delay the initial election if node is started in stopped state
    
    Since there is an early exit if a node is stopped or banned, we can wait till
    the node becomes active to start initial election.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 593a17678fbd3109e118154b034d43b852659518)

commit 24a0bc600acbe3deea9549ce87567e9e0f779a89
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Thu Jun 27 15:33:49 2013 +1000

    recoverd: Update capabilities only if the current node is active
    
    Since we do an early return if a node is stopped or banned, move update
    capabilities code below the early return and just before we check the
    capabilities of current recovery master.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 93bcb6617e1024f810533e12390a572f51703ca0)

commit 42c8c05d23dcd22404d6c761171ffe210734150c
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Thu Jun 27 15:46:04 2013 +1000

    recoverd: No need to check if node is recovery master when inactive
    
    If a node is stopped or banned, it will cause early return from the
    main_loop, so this check is redundant.  The election will be called by an
    active node.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 815ddd3341b7e9db39e05a3a3fcd9a1420f053bc)
    
    Conflicts:
    	server/ctdb_recoverd.c

commit c0626646d87fe477bbbd425ef94513f466b2e876
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Thu Jun 27 15:39:15 2013 +1000

    recoverd: Always do an early exit from main_loop if node is stopped or banned
    
    A stopped or banned node cannot do anything useful.  So do not participate
    in any cluster activity and do not cause any unnecessary network traffic.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 2396981c4bcf30530aeb7f4395093cc202105b50)
    
    Conflicts:
    	server/ctdb_recoverd.c

commit 324d70e26bef94fab1c8c9b8c17cd0a7817866d9
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Fri Jun 28 14:10:47 2013 +1000

    recoverd: Do not set banning credits on a node if current node is inactive
    
    If the current node is banned or stopped, then it should not assign banning
    credits to other nodes since the current node will not have up-to-date flags
    of other nodes.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 38304f88e0c634e97d4687c25adef975f71537b8)
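
The guard described above can be sketched as follows. This is an illustration
only: the flag values and the names SKETCH_FLAG_* and sketch_add_credit are
assumptions made for the example, not definitions taken from the CTDB headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values are illustrative assumptions, not copied from CTDB headers. */
#define SKETCH_FLAG_BANNED   0x1u
#define SKETCH_FLAG_STOPPED  0x2u
#define SKETCH_FLAG_INACTIVE (SKETCH_FLAG_BANNED | SKETCH_FLAG_STOPPED)

/*
 * Add one banning credit to *credits unless the local node is inactive.
 * Returns true if a credit was assigned.
 */
static bool sketch_add_credit(uint32_t local_flags, uint32_t *credits)
{
	if (local_flags & SKETCH_FLAG_INACTIVE) {
		/* Our view of the other nodes may be stale; assign nothing. */
		return false;
	}
	(*credits)++;
	return true;
}
```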

commit 1a03e2f0366800eeb15308932f6483f48e2547ea
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Mon Jul 1 17:40:36 2013 +1000

    banning: Do not come out of ban if databases are not frozen
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit a60f228f8380f222f838eb619d2ab55f96f11ac2)
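
The check behind this change can be sketched as below: a ban is only lifted
once every database priority reports frozen, otherwise the node bans itself
again (see ctdb_ban_node_event() in the diff further down). The names
sketch_freeze_mode and sketch_can_unban are hypothetical, and the priority
count of 3 is an assumption made for the example.

```c
#include <stdbool.h>

#define SKETCH_NUM_PRIORITIES 3   /* assumed number of database priorities */

enum sketch_freeze_mode { SKETCH_NONE, SKETCH_PENDING, SKETCH_FROZEN };

/*
 * Return true only if every priority is frozen. Priorities are indexed
 * 1..SKETCH_NUM_PRIORITIES; slot 0 of the array is unused.
 */
static bool sketch_can_unban(const enum sketch_freeze_mode *mode)
{
	int i;

	for (i = 1; i <= SKETCH_NUM_PRIORITIES; i++) {
		if (mode[i] != SKETCH_FROZEN) {
			return false;
		}
	}
	return true;
}
```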

commit df6d7554ccb6c53bcff9bded85ed4882f335eaee
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Mon Jun 24 14:33:32 2013 +1000

    banning: No need to check if banned pnn is for local node
    
    If the banned pnn is not the local node, the function returns early.
    So no need for additional check.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 297d93cecc3c0655e72ecac38508e113bdbeab9c)

commit 1c64d205d92da6b5a9af98190755ebce5e1176f1
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Fri Jun 28 14:04:18 2013 +1000

    banning: Make ctdb_local_node_got_banned() a void function
    
    When this function is called, we are already committed to banning
    and there is no point in failing this function.  If freezing of
    databases fails, it will be fixed by the recovery daemon.
    (cherry picked from commit bb178338658b4ae32382a1f62f7c21cee1d4878f)

commit 53ecceb8e0dba4149005bb4b042c0845470e859b
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Fri Jun 28 14:02:44 2013 +1000

    recoverd: Also check if current node is in recovery when it is banned
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 6a9dbb8fb0f1f6e8c206189cdc2d33bb371ea2a8)

commit c311947f93c5c01a3819c401566dfa9dc87855d0
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Fri Jun 28 14:09:35 2013 +1000

    recoverd: Set node_flags information as soon as we get nodemap
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 8d622660a14c929e365d306147b378ea6ab92175)

commit ca276e0ceb0952ca2832829d8bfc44074915ffe5
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Wed Jun 26 16:02:23 2013 +1000

    recoverd: Remove old comment as the code corresponding to that has gone away
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 34af2cdf686d5d77854cbaa7bbcd8f878e9171c7)

commit d29d729ddb26fc5f8ce622a2e733c57baa3733a6
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Mon Jun 24 14:31:50 2013 +1000

    banning: Log ban state changes for other nodes at higher debug level
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit c6f8407648abb37f2ed781afa5171dad8c9f59e9)

commit 5c62e4313c505baba73beab9fcb097ea4e10d452
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Mon Jul 1 16:28:04 2013 +1000

    freeze: Make ctdb_start_freeze() a void function
    
    If this function fails due to memory errors, there is no way to recover.
    The best course of action is to abort.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 46efe7a886f8c4c56f19536adc98a73c22db906a)
    
    Conflicts:
    	server/ctdb_freeze.c

commit faf8a5fd78ca4854bf103bb4317758082f127684
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Mon Jul 1 16:21:00 2013 +1000

    freeze: If priority is invalid here, it's time to abort
    
    ctdb_start_freeze() is called from ctdb_control_freeze(), which fixes the
    priority if it's 0 and returns an error if it's invalid.  Other callers of
    ctdb_start_freeze() are internal to CTDB.  So if the priority is invalid in
    ctdb_start_freeze(), something is definitely seriously wrong.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 87716e8f504d659515d3dbcf93badbf106873bc8)
    
    Conflicts:
    	server/ctdb_freeze.c

commit 2a2de0939e1a30a4eb8839dbd49d8d8c80609d9c
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Mon Jul 1 13:26:33 2013 +1000

    freeze: Log message from ctdb_start_freeze() and ctdb_control_freeze()
    
    This ensures that whenever databases are frozen either via sending
    control or by calling ctdb_start_freeze(), the action is logged.
    Since ctdb_control_freeze() calls ctdb_start_freeze(), move logging of
    message in early return condition if databases are already frozen.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 478e24bceda3fedfba54ccb48faa115df726b819)

commit c18dbafff80494277d2bb1c91f67cdf3c2425ad8
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Mon Jun 24 14:18:58 2013 +1000

    recoverd: Print banning message only after verifying pnn
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 4be8dff3a4451192f838497b4747273685959bed)

commit 911534f39708848087911f2ab69dbcbc59c12295
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Wed Jun 26 15:22:46 2013 +1000

    recoverd: When updating flags on nodes, send updated flags and not old flags
    
    This was broken by commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa.
    Instead of a SRVID_SET_NODE_FLAGS message to recovery daemon, a control
    was sent to the local daemon which in turn informed the recovery daemon.
    And while doing this change old flags were sent via CONTROL_MODIFY_FLAGS.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 7eb2f89979360b6cc98ca9b17c48310277fa89fc)

commit 023d4825375f49e8eeaa25ff54db4fc5eeea9ac8
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Jun 26 14:34:47 2013 +1000

    tools/ctdb: Add "force" option to "recover" command
    
    At the moment there is no easy way to force a recovery when attempting
    to reproduce certain classes of bugs.  This option is added without
    documentation because it is dangerous until the bugs are fixed!  :-)
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    (cherry picked from commit 4f87925a287f612a6ab3b5da1a387a31c7bea28f)

commit c7665d48f1ac5baecfd64502f419215b1fb259d0
Author: Michael Adam <obnox at samba.org>
Date:   Fri Mar 22 17:48:00 2013 +0100

    recoverd: remove bogus comment "qqq" from "add prototype new banning code"
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    (cherry picked from commit 9f01b8db72780acf2f88f1392bc0a796dd4c6176)

commit 5f235169728576e008067d08eefa0661d4a6b520
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Oct 11 15:59:00 2012 +1100

    recoverd: Clarify some misleading log messages
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    (cherry picked from commit 14589bf7c16ba017fe00d4e8bea8cc501546c60f)

commit a2bc855110b908ed72941bdfe176a79a5b5876a6
Author: Stefan Metzmacher <metze at samba.org>
Date:   Tue Jun 21 15:49:30 2011 +0200

    recoverd: try to become the recovery master if we have the capability, but the current master doesn't
    
    metze
    (cherry picked from commit 6ba8af28f8a8f79db65120a97d7157dcc5c7e083)
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    (cherry picked from commit ccd67cf7f26713e695000d89d9ce8cfa78bfe00f)

commit 7a77f8661c4b7919c0b575fb79a22e62391e7654
Author: Stefan Metzmacher <metze at samba.org>
Date:   Tue Aug 31 08:42:32 2010 +0200

    server/recoverd: do takeover_run after verifying the reclock file
    
    metze
    (cherry picked from commit 93df096773c89f21f77b3bcf9aa90bf28881b852)

commit c02afd52b6788dfe6b051f185465ff6854f7a845
Author: Stefan Metzmacher <metze at samba.org>
Date:   Tue Aug 31 09:28:34 2010 +0200

    server/banning: also release all ips if we're banning ourself
    
    metze
    (cherry picked from commit c386f2c62f06f1c60047b7d4b1ec7a9eec11873c)

-----------------------------------------------------------------------

Summary of changes:
 doc/recovery-process.txt   |    4 +-
 include/ctdb_private.h     |    3 +-
 packaging/RPM/ctdb.spec.in |    7 ++-
 server/ctdb_banning.c      |   49 ++++++++++-
 server/ctdb_freeze.c       |   30 ++-----
 server/ctdb_monitor.c      |   18 +----
 server/ctdb_recoverd.c     |  205 +++++++++++++++++++++++++------------------
 server/eventscript.c       |    6 +-
 tools/ctdb.c               |   11 ++-
 9 files changed, 198 insertions(+), 135 deletions(-)


Changeset truncated at 500 lines:

diff --git a/doc/recovery-process.txt b/doc/recovery-process.txt
index 7780d84..7cfc678 100644
--- a/doc/recovery-process.txt
+++ b/doc/recovery-process.txt
@@ -112,8 +112,8 @@ These tests are performed on all nodes in the cluster which is why it is optimiz
 as few network calls to other nodes as possible.
 Each node only performs 1 call to the recovery master in each loop and to no other nodes.
 
-NORMAL NODE CLUSTER MONITORING
-------------------------------
+RECOVERY MASTER CLUSTER MONITORING
+-----------------------------------
 The recovery master performs a much more extensive test. In addition to tests 1-9 above
 the recovery master also performs the following tests:
 
diff --git a/include/ctdb_private.h b/include/ctdb_private.h
index 7d67a10..87e9a60 100644
--- a/include/ctdb_private.h
+++ b/include/ctdb_private.h
@@ -1206,7 +1206,7 @@ int ctdb_ctrl_get_all_tunables(struct ctdb_context *ctdb,
 			       uint32_t destnode,
 			       struct ctdb_tunable *tunables);
 
-int ctdb_start_freeze(struct ctdb_context *ctdb, uint32_t priority);
+void ctdb_start_freeze(struct ctdb_context *ctdb, uint32_t priority);
 
 bool parse_ip_mask(const char *s, const char *iface, ctdb_sock_addr *addr, unsigned *mask);
 bool parse_ip_port(const char *s, ctdb_sock_addr *addr);
@@ -1361,6 +1361,7 @@ int ctdb_vacuum_init(struct ctdb_db_context *ctdb_db);
 int32_t ctdb_control_enable_script(struct ctdb_context *ctdb, TDB_DATA indata);
 int32_t ctdb_control_disable_script(struct ctdb_context *ctdb, TDB_DATA indata);
 
+void ctdb_local_node_got_banned(struct ctdb_context *ctdb);
 int32_t ctdb_control_set_ban_state(struct ctdb_context *ctdb, TDB_DATA indata);
 int32_t ctdb_control_get_ban_state(struct ctdb_context *ctdb, TDB_DATA *outdata);
 int32_t ctdb_control_set_db_priority(struct ctdb_context *ctdb, TDB_DATA indata);
diff --git a/packaging/RPM/ctdb.spec.in b/packaging/RPM/ctdb.spec.in
index 1a5ccfc..5084d2a 100644
--- a/packaging/RPM/ctdb.spec.in
+++ b/packaging/RPM/ctdb.spec.in
@@ -3,7 +3,7 @@ Name: ctdb
 Summary: Clustered TDB
 Vendor: Samba Team
 Packager: Samba Team <samba at samba.org>
-Version: 1.2.64
+Version: 1.2.65
 Release: 1GITHASH
 Epoch: 0
 License: GNU GPL version 3
@@ -155,6 +155,11 @@ development libraries for ctdb
 
 %changelog
 
+* Tue Jul 02 2013 : Version 1.2.65
+  - Fix the flags passed in modify flags control from recovery daemon
+  - Do early return from recoverd main loop if node is inactive
+  - Don't let inactive node apply banning credits on other nodes
+  - If recmaster node is inactive, don't include it in election
 * Thu Jun 20 2013 : Version 1.2.64
   - Add configuration variables to maintain configured number of NFS threads
   - Fix racy code in CTDB commandline tool for ipreallocate/sync
diff --git a/server/ctdb_banning.c b/server/ctdb_banning.c
index 3d5f216..5ac1e66 100644
--- a/server/ctdb_banning.c
+++ b/server/ctdb_banning.c
@@ -32,6 +32,21 @@ ctdb_ban_node_event(struct event_context *ev, struct timed_event *te,
 			       struct timeval t, void *private_data)
 {
 	struct ctdb_context *ctdb = talloc_get_type(private_data, struct ctdb_context);
+	bool freeze_failed = false;
+	int i;
+
+	/* Make sure we were able to freeze databases during banning */
+	for (i=1; i<=NUM_DB_PRIORITIES; i++) {
+		if (ctdb->freeze_mode[i] != CTDB_FREEZE_FROZEN) {
+			freeze_failed = true;
+			break;
+		}
+	}
+	if (freeze_failed) {
+		DEBUG(DEBUG_ERR, ("Banning timedout, but still unable to freeze databases\n"));
+		ctdb_ban_self(ctdb);
+		return;
+	}
 
 	DEBUG(DEBUG_ERR,("Banning timedout\n"));
 	ctdb->nodes[ctdb->pnn]->flags &= ~NODE_FLAGS_BANNED;
@@ -42,6 +57,27 @@ ctdb_ban_node_event(struct event_context *ev, struct timed_event *te,
 	}
 }
 
+void ctdb_local_node_got_banned(struct ctdb_context *ctdb)
+{
+	uint32_t i;
+
+	/* make sure we are frozen */
+	DEBUG(DEBUG_NOTICE,("This node has been banned - forcing freeze and recovery\n"));
+
+	/* Reset the generation id to 1 to make us ignore any
+	   REQ/REPLY CALL/DMASTER someone sends to us.
+	   We are now banned so we shouldnt service database calls
+	   anymore.
+	*/
+	ctdb->vnn_map->generation = INVALID_GENERATION;
+
+	for (i=1; i<=NUM_DB_PRIORITIES; i++) {
+		ctdb_start_freeze(ctdb, i);
+	}
+	ctdb_release_all_ips(ctdb);
+	ctdb->recovery_mode = CTDB_RECOVERY_ACTIVE;
+}
+
 int32_t ctdb_control_set_ban_state(struct ctdb_context *ctdb, TDB_DATA indata)
 {
 	struct ctdb_ban_time *bantime = (struct ctdb_ban_time *)indata.dptr;
@@ -54,12 +90,16 @@ int32_t ctdb_control_set_ban_state(struct ctdb_context *ctdb, TDB_DATA indata)
 			return -1;
 		}
 		if (bantime->time == 0) {
-			DEBUG(DEBUG_INFO,("unbanning node %d\n", bantime->pnn));
+			DEBUG(DEBUG_NOTICE,("unbanning node %d\n", bantime->pnn));
 			ctdb->nodes[bantime->pnn]->flags &= ~NODE_FLAGS_BANNED;
 		} else {
-			DEBUG(DEBUG_INFO,("banning node %d\n", bantime->pnn));
+			DEBUG(DEBUG_NOTICE,("banning node %d\n", bantime->pnn));
 			if (ctdb->tunable.enable_bans == 0) {
-				DEBUG(DEBUG_INFO,("Bans are disabled - ignoring ban of node %u\n", bantime->pnn));
+				/* FIXME: This is bogus. We really should be
+				 * taking decision based on the tunables on
+				 * the banned node and not local node.
+				 */
+				DEBUG(DEBUG_WARNING,("Bans are disabled - ignoring ban of node %u\n", bantime->pnn));
 				return 0;
 			}
 
@@ -96,7 +136,8 @@ int32_t ctdb_control_set_ban_state(struct ctdb_context *ctdb, TDB_DATA indata)
 	ctdb->nodes[bantime->pnn]->flags |= NODE_FLAGS_BANNED;
 
 	event_add_timed(ctdb->ev, ctdb->banning_ctx, timeval_current_ofs(bantime->time,0), ctdb_ban_node_event, ctdb);
-	
+
+	ctdb_local_node_got_banned(ctdb);
 	return 0;
 }
 
diff --git a/server/ctdb_freeze.c b/server/ctdb_freeze.c
index 81c5b56..8f87de8 100644
--- a/server/ctdb_freeze.c
+++ b/server/ctdb_freeze.c
@@ -278,38 +278,33 @@ static void ctdb_debug_locks(void)
 /*
   start the freeze process for a certain priority
  */
-int ctdb_start_freeze(struct ctdb_context *ctdb, uint32_t priority)
+void ctdb_start_freeze(struct ctdb_context *ctdb, uint32_t priority)
 {
-	if (priority == 0) {
-		DEBUG(DEBUG_ERR,("Freeze priority 0 requested, remapping to priority 1\n"));
-		priority = 1;
-	}
-
 	if ((priority < 1) || (priority > NUM_DB_PRIORITIES)) {
 		DEBUG(DEBUG_ERR,(__location__ " Invalid db priority : %u\n", priority));
-		return -1;
+		ctdb_fatal(ctdb, "Internal error");
 	}
 
 	if (ctdb->freeze_mode[priority] == CTDB_FREEZE_FROZEN) {
 		/* we're already frozen */
-		return 0;
+		return;
 	}
 
+	DEBUG(DEBUG_ERR, ("Freeze priority %u\n", priority));
+
 	/* Stop any vacuuming going on: we don't want to wait. */
 	ctdb_stop_vacuuming(ctdb);
 
 	/* if there isn't a freeze lock child then create one */
 	if (ctdb->freeze_handles[priority] == NULL) {
 		ctdb->freeze_handles[priority] = ctdb_freeze_lock(ctdb, priority);
-		CTDB_NO_MEMORY(ctdb, ctdb->freeze_handles[priority]);
+		CTDB_NO_MEMORY_FATAL(ctdb, ctdb->freeze_handles[priority]);
 		ctdb->freeze_mode[priority] = CTDB_FREEZE_PENDING;
 	} else {
 		/* The previous free lock child has not yet been able to get locks.
 		 * Invoke debugging script */
 		ctdb_debug_locks();
 	}
-
-	return 0;
 }
 
 /*
@@ -322,8 +317,6 @@ int32_t ctdb_control_freeze(struct ctdb_context *ctdb, struct ctdb_req_control *
 
 	priority = (uint32_t)c->srvid;
 
-	DEBUG(DEBUG_ERR, ("Freeze priority %u\n", priority));
-
 	if (priority == 0) {
 		DEBUG(DEBUG_ERR,("Freeze priority 0 requested, remapping to priority 1\n"));
 		priority = 1;
@@ -335,14 +328,12 @@ int32_t ctdb_control_freeze(struct ctdb_context *ctdb, struct ctdb_req_control *
 	}
 
 	if (ctdb->freeze_mode[priority] == CTDB_FREEZE_FROZEN) {
+		DEBUG(DEBUG_ERR, ("Freeze priority %u\n", priority));
 		/* we're already frozen */
 		return 0;
 	}
 
-	if (ctdb_start_freeze(ctdb, priority) != 0) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to start freezing databases with priority %u\n", priority));
-		return -1;
-	}
+	ctdb_start_freeze(ctdb, priority);
 
 	/* add ourselves to list of waiters */
 	if (ctdb->freeze_handles[priority] == NULL) {
@@ -373,10 +364,7 @@ bool ctdb_blocking_freeze(struct ctdb_context *ctdb)
 	int i;
 
 	for (i=1; i<=NUM_DB_PRIORITIES; i++) {
-		if (ctdb_start_freeze(ctdb, i)) {
-			DEBUG(DEBUG_ERR,(__location__ " Failed to freeze databases of prio %u\n", i));
-			continue;
-		}
+		ctdb_start_freeze(ctdb, i);
 
 		/* block until frozen */
 		while (ctdb->freeze_mode[i] == CTDB_FREEZE_PENDING) {
diff --git a/server/ctdb_monitor.c b/server/ctdb_monitor.c
index c735601..283a584 100644
--- a/server/ctdb_monitor.c
+++ b/server/ctdb_monitor.c
@@ -444,7 +444,6 @@ int32_t ctdb_control_modflags(struct ctdb_context *ctdb, TDB_DATA indata)
 	struct ctdb_node_flag_change *c = (struct ctdb_node_flag_change *)indata.dptr;
 	struct ctdb_node *node;
 	uint32_t old_flags;
-	int i;
 
 	if (c->pnn >= ctdb->num_nodes) {
 		DEBUG(DEBUG_ERR,(__location__ " Node %d is invalid, num_nodes :%d\n", c->pnn, ctdb->num_nodes));
@@ -494,22 +493,7 @@ int32_t ctdb_control_modflags(struct ctdb_context *ctdb, TDB_DATA indata)
 
 	/* if we have become banned, we should go into recovery mode */
 	if ((node->flags & NODE_FLAGS_BANNED) && !(c->old_flags & NODE_FLAGS_BANNED) && (node->pnn == ctdb->pnn)) {
-		/* make sure we are frozen */
-		DEBUG(DEBUG_NOTICE,("This node has been banned - forcing freeze and recovery\n"));
-		/* Reset the generation id to 1 to make us ignore any
-		   REQ/REPLY CALL/DMASTER someone sends to us.
-		   We are now banned so we shouldnt service database calls
-		   anymore.
-		*/
-		ctdb->vnn_map->generation = INVALID_GENERATION;
-
-		for (i=1; i<=NUM_DB_PRIORITIES; i++) {
-			if (ctdb_start_freeze(ctdb, i) != 0) {
-				DEBUG(DEBUG_ERR,(__location__ " Failed to freeze db priority %u\n", i));
-			}
-		}
-		ctdb_release_all_ips(ctdb);
-		ctdb->recovery_mode = CTDB_RECOVERY_ACTIVE;
+		ctdb_local_node_got_banned(ctdb);
 	}
 	
 	return 0;
diff --git a/server/ctdb_recoverd.c b/server/ctdb_recoverd.c
index d7a79fe..15d7bbe 100644
--- a/server/ctdb_recoverd.c
+++ b/server/ctdb_recoverd.c
@@ -82,13 +82,13 @@ static void ctdb_ban_node(struct ctdb_recoverd *rec, uint32_t pnn, uint32_t ban_
 	struct ctdb_context *ctdb = rec->ctdb;
 	struct ctdb_ban_time bantime;
        
-	DEBUG(DEBUG_NOTICE,("Banning node %u for %u seconds\n", pnn, ban_time));
-
 	if (!ctdb_validate_pnn(ctdb, pnn)) {
 		DEBUG(DEBUG_ERR,("Bad pnn %u in ctdb_ban_node\n", pnn));
 		return;
 	}
 
+	DEBUG(DEBUG_NOTICE,("Banning node %u for %u seconds\n", pnn, ban_time));
+
 	bantime.pnn  = pnn;
 	bantime.time = ban_time;
 
@@ -143,6 +143,12 @@ static void ctdb_set_culprit_count(struct ctdb_recoverd *rec, uint32_t culprit,
 		return;
 	}
 
+	/* If we are banned or stopped, do not set other nodes as culprits */
+	if (rec->node_flags & NODE_FLAGS_INACTIVE) {
+		DEBUG(DEBUG_NOTICE, ("This node is INACTIVE, cannot set culprit node %d\n", culprit));
+		return;
+	}
+
 	if (ctdb->nodes[culprit]->ban_state == NULL) {
 		ctdb->nodes[culprit]->ban_state = talloc_zero(ctdb->nodes[culprit], struct ctdb_banning_state);
 		CTDB_NO_MEMORY_VOID(ctdb, ctdb->nodes[culprit]->ban_state);
@@ -1087,7 +1093,7 @@ static int update_local_flags(struct ctdb_recoverd *rec, struct ctdb_node_map *n
 			   Since we are the recovery master we can just as
 			   well update the flags on all nodes.
 			*/
-			ret = ctdb_ctrl_modflags(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn, nodemap->nodes[j].flags, ~nodemap->nodes[j].flags);
+			ret = ctdb_ctrl_modflags(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn, remote_nodemap->nodes[j].flags, ~remote_nodemap->nodes[j].flags);
 			if (ret != 0) {
 				DEBUG(DEBUG_ERR, (__location__ " Unable to update nodeflags on remote nodes\n"));
 				return -1;
@@ -1486,6 +1492,36 @@ static void takeover_fail_callback(struct ctdb_context *ctdb, uint32_t node_pnn,
 }
 
 
+static void ban_misbehaving_nodes(struct ctdb_recoverd *rec, bool *self_ban)
+{
+	struct ctdb_context *ctdb = rec->ctdb;
+	int i;
+	struct ctdb_banning_state *ban_state;
+
+	*self_ban = false;
+	for (i=0; i<ctdb->num_nodes; i++) {
+		if (ctdb->nodes[i]->ban_state == NULL) {
+			continue;
+		}
+		ban_state = (struct ctdb_banning_state *)ctdb->nodes[i]->ban_state;
+		if (ban_state->count < 2*ctdb->num_nodes) {
+			continue;
+		}
+
+		DEBUG(DEBUG_NOTICE,("Node %u reached %u banning credits - banning it for %u seconds\n",
+			ctdb->nodes[i]->pnn, ban_state->count,
+			ctdb->tunable.recovery_ban_period));
+		ctdb_ban_node(rec, ctdb->nodes[i]->pnn, ctdb->tunable.recovery_ban_period);
+		ban_state->count = 0;
+
+		/* Banning ourself? */
+		if (ctdb->nodes[i]->pnn == rec->ctdb->pnn) {
+			*self_ban = true;
+		}
+	}
+}
+
+
 /*
   we are the recmaster, and recovery is needed - start a recovery run
  */
@@ -1501,30 +1537,19 @@ static int do_recovery(struct ctdb_recoverd *rec,
 	uint32_t *nodes;
 	struct timeval start_time;
 	uint32_t culprit = (uint32_t)-1;
+	bool self_ban;
 
 	DEBUG(DEBUG_NOTICE, (__location__ " Starting do_recovery\n"));
 
 	/* if recovery fails, force it again */
 	rec->need_recovery = true;
 
-	for (i=0; i<ctdb->num_nodes; i++) {
-		struct ctdb_banning_state *ban_state;
-
-		if (ctdb->nodes[i]->ban_state == NULL) {
-			continue;
-		}
-		ban_state = (struct ctdb_banning_state *)ctdb->nodes[i]->ban_state;
-		if (ban_state->count < 2*ctdb->num_nodes) {
-			continue;
-		}
-		DEBUG(DEBUG_NOTICE,("Node %u has caused %u recoveries recently - banning it for %u seconds\n",
-			ctdb->nodes[i]->pnn, ban_state->count,
-			ctdb->tunable.recovery_ban_period));
-		ctdb_ban_node(rec, ctdb->nodes[i]->pnn, ctdb->tunable.recovery_ban_period);
-		ban_state->count = 0;
+	ban_misbehaving_nodes(rec, &self_ban);
+	if (self_ban) {
+		DEBUG(DEBUG_NOTICE, ("This node was banned, aborting recovery\n"));
+		return -1;
 	}
 
-
         if (ctdb->tunable.verify_recovery_lock != 0) {
 		DEBUG(DEBUG_ERR,("Taking out recovery lock from recovery daemon\n"));
 		start_time = timeval_current();
@@ -1896,12 +1921,12 @@ static bool ctdb_election_win(struct ctdb_recoverd *rec, struct election_message
 	/* we cant win if we are banned */
 	if (rec->node_flags & NODE_FLAGS_BANNED) {
 		return false;
-	}	
+	}
 
 	/* we cant win if we are stopped */
 	if (rec->node_flags & NODE_FLAGS_STOPPED) {
 		return false;
-	}	
+	}
 
 	/* we will automatically win if the other node is banned */
 	if (em->node_flags & NODE_FLAGS_BANNED) {
@@ -3106,7 +3131,7 @@ static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec,
 	struct ctdb_vnn_map *remote_vnnmap=NULL;
 	int32_t debug_level;
 	int i, j, ret;
-
+	bool self_ban;
 
 
 	/* verify that the main daemon is still running */
@@ -3131,28 +3156,6 @@ static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec,
 	}
 	LogLevel = debug_level;
 
-
-	/* We must check if we need to ban a node here but we want to do this
-	   as early as possible so we dont wait until we have pulled the node
-	   map from the local node. thats why we have the hardcoded value 20
-	*/
-	for (i=0; i<ctdb->num_nodes; i++) {
-		struct ctdb_banning_state *ban_state;
-
-		if (ctdb->nodes[i]->ban_state == NULL) {
-			continue;
-		}
-		ban_state = (struct ctdb_banning_state *)ctdb->nodes[i]->ban_state;
-		if (ban_state->count < 20) {
-			continue;
-		}
-		DEBUG(DEBUG_NOTICE,("Node %u has caused %u recoveries recently - banning it for %u seconds\n",
-			ctdb->nodes[i]->pnn, ban_state->count,
-			ctdb->tunable.recovery_ban_period));
-		ctdb_ban_node(rec, ctdb->nodes[i]->pnn, ctdb->tunable.recovery_ban_period);
-		ban_state->count = 0;
-	}
-
 	/* get relevant tunables */
 	ret = ctdb_ctrl_get_all_tunables(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, &ctdb->tunable);
 	if (ret != 0) {
@@ -3203,74 +3206,93 @@ static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec,
 	}
 	nodemap = rec->nodemap;
 
-	/* check which node is the recovery master */
-	ret = ctdb_ctrl_getrecmaster(ctdb, mem_ctx, CONTROL_TIMEOUT(), pnn, &rec->recmaster);
-	if (ret != 0) {
-		DEBUG(DEBUG_ERR, (__location__ " Unable to get recmaster from node %u\n", pnn));
-		return;
-	}
-
-	/* if we are not the recmaster we can safely ignore any ip reallocate requests */
-	if (rec->recmaster != pnn) {
-		if (rec->ip_reallocate_ctx != NULL) {
-			talloc_free(rec->ip_reallocate_ctx);
-			rec->ip_reallocate_ctx = NULL;
-			rec->reallocate_callers = NULL;
-		}
-	}
-	/* if there are takeovers requested, perform it and notify the waiters */
-	if (rec->reallocate_callers) {
-		process_ipreallocate_requests(ctdb, rec);
-	}
+	/* remember our own node flags */
+	rec->node_flags = nodemap->nodes[pnn].flags;
 
-	if (rec->recmaster == (uint32_t)-1) {
-		DEBUG(DEBUG_NOTICE,(__location__ " Initial recovery master set - forcing election\n"));
-		force_election(rec, pnn, nodemap);
+	ban_misbehaving_nodes(rec, &self_ban);
+	if (self_ban) {
+		DEBUG(DEBUG_NOTICE, ("This node was banned, restart main_loop\n"));
 		return;
 	}
 
-
-	/* if the local daemon is STOPPED, we verify that the databases are
-	   also frozen and thet the recmode is set to active 
+	/* if the local daemon is STOPPED or BANNED, we verify that the databases are
+	   also frozen and that the recmode is set to active.
 	*/
-	if (nodemap->nodes[pnn].flags & NODE_FLAGS_STOPPED) {
+	if (rec->node_flags & (NODE_FLAGS_STOPPED | NODE_FLAGS_BANNED)) {
 		ret = ctdb_ctrl_getrecmode(ctdb, mem_ctx, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, &ctdb->recovery_mode);
 		if (ret != 0) {
 			DEBUG(DEBUG_ERR,(__location__ " Failed to read recmode from local node\n"));
 		}
 		if (ctdb->recovery_mode == CTDB_RECOVERY_NORMAL) {
-			DEBUG(DEBUG_ERR,("Node is stopped but recovery mode is not active. Activate recovery mode and lock databases\n"));
+			DEBUG(DEBUG_ERR,("Node is stopped or banned but recovery mode is not active. Activate recovery mode and lock databases\n"));
 
 			ret = ctdb_ctrl_freeze_priority(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, 1);
 			if (ret != 0) {
-				DEBUG(DEBUG_ERR,(__location__ " Failed to freeze node due to node being STOPPED\n"));
+				DEBUG(DEBUG_ERR,(__location__ " Failed to freeze node in STOPPED or BANNED state\n"));
 				return;
 			}
 			ret = ctdb_ctrl_setrecmode(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, CTDB_RECOVERY_ACTIVE);
 			if (ret != 0) {
-				DEBUG(DEBUG_ERR,(__location__ " Failed to activate recovery mode due to node being stopped\n"));
+				DEBUG(DEBUG_ERR,(__location__ " Failed to activate recovery mode in STOPPED or BANNED state\n"));
 
 				return;
 			}
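

For readers skimming the hunks above: both call sites (do_recovery and
main_loop) previously carried their own copy of the ban loop, and both now
call ban_misbehaving_nodes(rec, &self_ban) and bail out when the local node
was banned. The following is only a minimal, self-contained sketch of that
pattern, not the code from this commit. The struct names, the `banned` field
and the `_sketch` suffix are hypothetical stand-ins for the real CTDB types,
and the 2*num_nodes threshold is assumed from the loop removed from
do_recovery; the start of the real helper is not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CTDB structures; not the real definitions. */
struct node_sketch {
	uint32_t pnn;        /* node number */
	uint32_t ban_count;  /* recoveries this node has caused recently */
	bool banned;         /* set where the real code calls ctdb_ban_node() */
};

struct cluster_sketch {
	struct node_sketch *nodes;
	uint32_t num_nodes;
	uint32_t own_pnn;    /* pnn of the node running this code */
};

/*
 * Ban every node whose count reached the threshold and report through
 * *self_ban whether the local node was among them.
 * Assumption: the threshold is 2 * num_nodes, as in the removed loop.
 */
void ban_misbehaving_nodes_sketch(struct cluster_sketch *c, bool *self_ban)
{
	uint32_t i;

	assert(c != NULL && self_ban != NULL);
	*self_ban = false;

	for (i = 0; i < c->num_nodes; i++) {
		struct node_sketch *n = &c->nodes[i];

		if (n->ban_count < 2 * c->num_nodes) {
			continue;
		}
		n->banned = true;
		n->ban_count = 0;

		/* Banning ourself? */
		if (n->pnn == c->own_pnn) {
			*self_ban = true;
		}
	}
}
```

A caller would check the out-parameter immediately after the call and stop
what it is doing if it is set, which is what the new do_recovery (return -1)
and main_loop (return) hunks do.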


-- 
CTDB repository