[SCM] CTDB repository - branch master updated - ctdb-1.0.114-217-g9ca09ee

Ronnie Sahlberg sahlberg at samba.org
Sun Aug 8 19:48:23 MDT 2010


The branch, master has been updated
       via  9ca09ee9129b787428a2ceac9731b12166dc8718 (commit)
       via  36c8244a0f68c7c9bbee40982f230e9d14d3c0ea (commit)
       via  69c95b2a42f55b80cd8d91a90ab55166f964163b (commit)
       via  c2bce140da7c4b118394ee77bb9d0348d27e7e95 (commit)
       via  e9b3f5b1b51d541a911a27eb4348b368f28d185e (commit)
       via  cdcd05662a30b51caaeeab4ac44138cac2474e0a (commit)
       via  b93b60ec96d02ce4f54921e85a5c5554d1fc0c55 (commit)
       via  d33fa4d6557aab1938049f194c2de55f2c395bd2 (commit)
       via  1ef7c8e64c7a39330be09ae4d00b70238133e0b5 (commit)
       via  5d9e4b6ee7d2b5290a74e7be79bdf51a43b72f43 (commit)
       via  ae52cb63756bc60de8d32e01bac5d70975a1c7a0 (commit)
       via  b2a2e301025d7fbfe5eeaac436693cde6d404490 (commit)
       via  bc38c17e4115fae00c89d00537fdcfe621111b37 (commit)
       via  ecb80e2b6be9326708d1fc87ad3028c6836d5858 (commit)
       via  058501b92f602e7d2240d1cb08ed78a807564c48 (commit)
      from  79ef9909dfa0904d789c69eb6b9c80e8908a1100 (commit)

http://gitweb.samba.org/?p=sahlberg/ctdb.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 9ca09ee9129b787428a2ceac9731b12166dc8718
Merge: 79ef9909dfa0904d789c69eb6b9c80e8908a1100 36c8244a0f68c7c9bbee40982f230e9d14d3c0ea
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date:   Mon Aug 9 11:35:38 2010 +1000

    Merge remote branch 'martins/master'

commit 36c8244a0f68c7c9bbee40982f230e9d14d3c0ea
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Aug 6 11:10:56 2010 +1000

    Add some command-line options to ctdb_diagnostics.
    
    When run on heterogeneous or machine-configured clusters,
    ctdb_diagnostics can report too many errors.  In some clusters some
    nodes are expected to be configured differently, and
    machine-generated configuration files can contain comments with
    timestamps.
    
    This adds some command-line options that can be used to reduce the
    number of errors reported:
    
        -n <nodes>  Comma separated list of nodes to operate on
        -c          Ignore comment lines (starting with '#') in file comparisons
        -w          Ignore whitespace in file comparisons
        --no-ads    Do not use commands that assume an Active Directory Server
    
    The -n option simply allows ctdb_diagnostics to operate on a subset of
    nodes, avoiding file comparisons with, and data collection on, nodes
    that are configured differently.  For file comparisons, instead of
    showing each file on the current node and then comparing other nodes
    to that file, the file from the first (available or requested) node
    is shown and then the other nodes are compared to it.  This changes
    the output: ctdb_diagnostics no longer prints messages referencing
    the current node.
    
    -c and -w are used to weaken comparisons between configuration files.
    
    --no-ads can be used to avoid running ADS-specific commands if a
    cluster uses LDAP (or other non-ADS) configuration.
    
    This also fixes a number of bugs in related code:
    
    * A call to onnode was losing the >> NODE ...  << lines because they
      now go to stderr.  This was changed in onnode long ago but
      ctdb_diagnostics was never updated to match.
    
    * ctdb_diagnostics was counting lines in /etc/ctdb/nodes to determine
      what nodes to operate on.  For some time the nodes file has
      supported syntax that makes this invalid.  "ctdb listnodes -Y" is
      now used to list available nodes.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
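
A minimal POSIX sh sketch of how the -n, -c and -w options described above could be mapped onto a node list and onto diff options. This is an illustration only, not the ctdb_diagnostics code: the helper names nodes_from_list and compare_files are hypothetical, and the sketch assumes GNU diff, whose -I option ignores hunks in which every changed line matches a regular expression.

```shell
#!/bin/sh
# Hypothetical sketch: turn "-n 0,2,3" style input into "0 2 3".
nodes_from_list () {
    echo "$1" | sed -e 's@,@ @g'
}

# Hypothetical sketch: compare two files, optionally ignoring comment
# lines (like -c) and/or whitespace (like -w).
# Returns 0 if the files are considered identical, non-zero otherwise.
compare_files () {
    _ignore_comments="$1"   # "true" or "false"
    _ignore_space="$2"      # "true" or "false"
    _a="$3"
    _b="$4"

    # Rebuild the positional parameters as the list of diff options.
    set --
    [ "$_ignore_comments" = true ] && set -- "$@" -I '^#'
    [ "$_ignore_space" = true ] && set -- "$@" -w

    # Assumes GNU diff: exit status 0 when all remaining hunks are ignored.
    diff "$@" "$_a" "$_b" >/dev/null
}
```

Collecting the options with `set --` keeps the regular expression as a single argument, so it is not subject to word splitting or globbing when diff is run.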

commit 69c95b2a42f55b80cd8d91a90ab55166f964163b
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Aug 5 16:03:21 2010 +1000

    Test suite: remove unnecessary verbosity from enable/continue tests.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit c2bce140da7c4b118394ee77bb9d0348d27e7e95
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Aug 5 16:01:23 2010 +1000

    Test suite: Fix typo in continue test.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit e9b3f5b1b51d541a911a27eb4348b368f28d185e
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Aug 5 15:58:56 2010 +1000

    Test suite: weaken ctdb continue/enable tests for non-deterministic IPs.
    
    These tests currently wait for the old IPs to fail back to the test
    node.  This isn't guaranteed with DeterministicIPs disabled.
    
    This changes those tests to wait until the test node gets at least 1
    IP assigned.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
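
The weaker "at least 1 IP" check can be sketched in POSIX sh as below. The helper name node_has_some_ips_in is hypothetical, and it assumes the IP layout has already been reduced to one "<ip> <pnn>" pair per line; the test suite's node_has_some_ips reads such pairs from the output of all_ips_on_node.

```shell
#!/bin/sh
# Hypothetical sketch: succeed if any "<ip> <pnn>" line in $2 is
# hosted by node $1; fail if the node hosts no IPs at all.
node_has_some_ips_in () {
    _node="$1"
    _ips="$2"

    while read _ip _pnn ; do
        if [ "$_pnn" = "$_node" ] ; then
            return 0
        fi
    done <<EOF
$_ips
EOF
    return 1
}
```

The here-document is a POSIX alternative to the bash here-string used in the test suite: it keeps the loop in the current shell, so `return 0` ends the function rather than just a pipeline subshell.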

commit cdcd05662a30b51caaeeab4ac44138cac2474e0a
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Aug 5 15:29:40 2010 +1000

    initscript: wait until we can ping ctdbd before setting tunables.
    
    Currently we do a "sleep 1" after starting and before running
    set_ctdb_variables to set the tunables.  This is too arbitrary and
    might fail if the system is heavily loaded.  This could, for example,
    result in some nodes running with DeterministicIPs and some without,
    in which case a different IP allocation algorithm would run depending
    on which node is the recmaster!
    
    This makes the start function wait until "ctdb ping" succeeds (with a
    10 second timeout) before trying to run set_ctdb_variables.  If a
    timeout occurs then the start function attempts to kill ctdbd before
    exiting with a failure.
    
    It also cleans up the status reporting code for Red Hat and SUSE so
    that the final status code is reported.  Currently there are cases
    where a success status is reported prematurely, before a failure
    occurs.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
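
The wait-then-configure logic described above can be sketched generically. This is an illustrative POSIX sh version, not the init script itself: wait_for_cmd is a hypothetical helper that polls an arbitrary command once per second, whereas the init script's wait_until_ready polls "ctdb ping" specifically.

```shell
#!/bin/sh
# Hypothetical sketch: run "$@" once per second until it succeeds or
# $1 seconds have elapsed.  Returns 0 on success, 1 on timeout.
wait_for_cmd () {
    _timeout="$1"
    shift

    _count=0
    while ! "$@" >/dev/null 2>&1 ; do
        if [ "$_count" -ge "$_timeout" ] ; then
            return 1
        fi
        sleep 1
        _count=$((_count + 1))
    done
    return 0
}
```

Used in the same way as the init script, something like `wait_for_cmd 10 ctdb ping && set_ctdb_variables` only applies the tunables once the daemon answers, and the caller can treat a non-zero status as a failed start.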

commit b93b60ec96d02ce4f54921e85a5c5554d1fc0c55
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Aug 5 13:43:50 2010 +1000

    Test suite - make the ctdb_fetch test cope with "Reqid wrap!" messages.
    
    Recent versions of CTDB notice the wrap and print this message.  The
    test needs to cope with it.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
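
A rough illustration of the effect of the extra alternative. The test hands its pattern to sanity_check_output, whose internals are not shown here; the sketch below uses a hypothetical stand-in, unexpected_lines, that prints every output line not matching the allowed forms. Its pattern is a simplified variant of the test's: "(...)*" replaces "(...|)+", because POSIX leaves an empty alternative in an ERE undefined.

```shell
#!/bin/sh
# Allowed output lines, including the new "Reqid wrap!" warning.
allowed='^(Fetch: [[:digit:]]+(\.[[:digit:]]+)? msgs/sec[[:space:]]?|msg_count=[[:digit:]]+ on node [[:digit:]]|Fetching final record|DATA:|Test data|Waiting for cluster[[:space:]]?|.*: Reqid wrap!)*$'

# Hypothetical helper: print the lines of $1 that match no allowed
# form.  Prints nothing when every line is acceptable.
unexpected_lines () {
    printf '%s\n' "$1" | grep -Ev "$allowed"
}
```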

commit d33fa4d6557aab1938049f194c2de55f2c395bd2
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Aug 5 11:40:05 2010 +1000

    Test suite: remove thaw/freeze tests.
    
    They test debugging commands that no longer operate as expected.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit 1ef7c8e64c7a39330be09ae4d00b70238133e0b5
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Aug 4 16:08:12 2010 +1000

    Test suite - fix addip test.
    
    The test currently checks that all existing IPs plus the newly added
    IP are on the test node after "ctdb addip" is run.  With
    DeterministicIPs enabled, if the new IP is "before" other IPs then the
    other IPs may be shuffled by the deterministic IPs modulo algorithm.
    This will happen on the 1st recovery after the move.  Sometimes this
    recovery happens before we get the list of IPs to check and sometimes
    after, so the test is racy.
    
    The fix is to simply check for the presence of the new IP and not
    worry about the others.  This reduces whatever value this test
    had... but you can't have everything.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
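
The narrower check amounts to testing one specific address. A hypothetical POSIX sh helper, ip_is_on_node, is sketched below; it assumes the IP layout is available as one "<ip> <pnn>" pair per line and is not the test suite's own code.

```shell
#!/bin/sh
# Hypothetical sketch: succeed only if address $1 is hosted by node $2
# in the "<ip> <pnn>" listing $3.  Other addresses, which a recovery
# may have shuffled, are deliberately ignored.
ip_is_on_node () {
    _want_ip="$1"
    _want_pnn="$2"

    while read _ip _pnn ; do
        if [ "$_ip" = "$_want_ip" ] ; then
            # Status of this comparison becomes the function's status.
            [ "$_pnn" = "$_want_pnn" ]
            return
        fi
    done <<EOF
$3
EOF
    return 1
}
```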

commit 5d9e4b6ee7d2b5290a74e7be79bdf51a43b72f43
Merge: ae52cb63756bc60de8d32e01bac5d70975a1c7a0 b2a2e301025d7fbfe5eeaac436693cde6d404490
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Aug 4 16:05:39 2010 +1000

    Merge remote branch 'martins/master'

commit ae52cb63756bc60de8d32e01bac5d70975a1c7a0
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Aug 4 13:16:06 2010 +1000

    Test suite - try to make addip test more reliable and add some debugging.
    
    This test is failing in some situations.  The "ctdb addip" command
    works but the IP never appears in the "ctdb ip" output.
    
    Try restricting the last octet to the range 101-199.  At the moment
    addresses like 10.0.2.1 are being chosen, and these are often the
    address of the host machine in autocluster configurations... so they
    might cause weirdness.
    
    Also add some debugging if checking for the IP address times out.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
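
The restricted candidate range can be sketched as follows. This is a simplified, hypothetical helper (first_free_ip) that only keeps the prefix of an existing address, tries last octets 101-199 and skips addresses in a given in-use list; the real test performs further checks, not shown here, to make sure the address is unused.

```shell
#!/bin/sh
# Hypothetical sketch: print the first address sharing the prefix of
# $1 (everything before the last ".") whose last octet is in 101-199
# and which does not appear in the space-separated in-use list $2.
first_free_ip () {
    _prefix="${1%.*}"
    _in_use=" $2 "

    _j=101
    while [ "$_j" -le 199 ] ; do
        _try="${_prefix}.${_j}"
        _j=$((_j + 1))    # increment before any "continue"
        case "$_in_use" in
            *" $_try "*) continue ;;
        esac
        echo "$_try"
        return 0
    done
    return 1
}
```

Padding the in-use list with spaces means 10.0.2.1 cannot accidentally match 10.0.2.101 or 10.0.2.10.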

commit b2a2e301025d7fbfe5eeaac436693cde6d404490
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Aug 3 11:51:14 2010 +1000

    Testing: IP allocation simulation - add option to change odds of a failure.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit bc38c17e4115fae00c89d00537fdcfe621111b37
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Aug 3 11:41:50 2010 +1000

    Testing: IP allocation simulation - clean up usage message.
    
    Group options better and make the language consistent between options.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit ecb80e2b6be9326708d1fc87ad3028c6836d5858
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Aug 3 11:37:34 2010 +1000

    Testing: IP allocation simulation - print maximum number of unhealthy nodes.
    
    This can imply something about imbalance.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit 058501b92f602e7d2240d1cb08ed78a807564c48
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Aug 3 11:36:33 2010 +1000

    Testing: IP allocation simulation - improve help for options.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

-----------------------------------------------------------------------

Summary of changes:
 config/ctdb.init                       |   36 ++++++-
 tests/scripts/ctdb_test_functions.bash |   24 +++++
 tests/simple/16_ctdb_config_add_ip.sh  |   20 +++--
 tests/simple/18_ctdb_freeze.sh         |   48 ---------
 tests/simple/19_ctdb_thaw.sh           |   55 -----------
 tests/simple/32_ctdb_enable.sh         |   12 +--
 tests/simple/42_ctdb_continue.sh       |   11 +--
 tests/simple/52_ctdb_fetch.sh          |    2 +-
 tests/takeover/ctdb_takeover.py        |   38 +++++---
 tools/ctdb_diagnostics                 |  163 +++++++++++++++++++++++---------
 10 files changed, 214 insertions(+), 195 deletions(-)
 delete mode 100755 tests/simple/18_ctdb_freeze.sh
 delete mode 100755 tests/simple/19_ctdb_thaw.sh


Changeset truncated at 500 lines:

diff --git a/config/ctdb.init b/config/ctdb.init
index 7dfdd26..fc66ab2 100755
--- a/config/ctdb.init
+++ b/config/ctdb.init
@@ -164,6 +164,19 @@ set_retval() {
     return $1
 }
 
+wait_until_ready () {
+    _timeout="${1:-10}" # default is 10 seconds
+
+    _count=0
+    while ! ctdb ping >/dev/null 2>&1 ; do
+	if [ $_count -ge $_timeout ] ; then
+	    return 1
+	fi
+	sleep 1
+	_count=$(($_count + 1))
+    done
+}
+
 ctdbd=${CTDBD:-/usr/sbin/ctdbd}
 
 start() {
@@ -193,14 +206,11 @@ start() {
 	    ;;
 	suse)
 	    eval startproc $ctdbd "$CTDB_OPTIONS"
-	    rc_status -v
 	    RETVAL=$?
 	    ;;
 	redhat)
 	    eval $ctdbd "$CTDB_OPTIONS"
 	    RETVAL=$?
-	    [ $RETVAL -eq 0 ] && success || failure
-	    echo
 	    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/ctdb || RETVAL=1
 	    ;;
 	debian)
@@ -210,9 +220,25 @@ start() {
 	    ;;
     esac
 
-    sleep 1
+    if [ $RETVAL -eq 0 ] ; then
+	if wait_until_ready ; then
+	    set_ctdb_variables
+	else
+	    RETVAL=1
+	    pkill -9 -f $ctdbd >/dev/null 2>&1
+	fi
+    fi
 
-    set_ctdb_variables
+    case $init_style in
+	suse)
+	    set_retval $RETVAL
+	    rc_status -v
+	    ;;
+	redhat)
+	    [ $RETVAL -eq 0 ] && success || failure
+	    echo
+	    ;;
+    esac
 
     return $RETVAL
 }
diff --git a/tests/scripts/ctdb_test_functions.bash b/tests/scripts/ctdb_test_functions.bash
index 4f05888..42053c0 100644
--- a/tests/scripts/ctdb_test_functions.bash
+++ b/tests/scripts/ctdb_test_functions.bash
@@ -482,6 +482,30 @@ wait_until_ips_are_on_nodeglob ()
     wait_until 60 ips_are_on_nodeglob "$@"
 }
 
+node_has_some_ips ()
+{
+    local node="$1"
+
+    local out
+
+    all_ips_on_node 1
+
+    while read ip pnn ; do
+	if [ "$node" = "$pnn" ] ; then
+	    return 0
+	fi
+    done <<<"$out" # bashism to avoid problem setting variable in pipeline.
+
+    return 1
+}
+
+wait_until_node_has_some_ips ()
+{
+    echo "Waiting for node to have some IPs..."
+
+    wait_until 60 node_has_some_ips "$@"
+}
+
 get_src_socket ()
 {
     local proto="$1"
diff --git a/tests/simple/16_ctdb_config_add_ip.sh b/tests/simple/16_ctdb_config_add_ip.sh
index 79849f2..6fee386 100755
--- a/tests/simple/16_ctdb_config_add_ip.sh
+++ b/tests/simple/16_ctdb_config_add_ip.sh
@@ -73,7 +73,7 @@ add_ip=""
 # loop through the possible IP addreses.
 for i in $test_node_ips ; do
     prefix="${i%.*}"
-    for j in $(seq 1 254) ; do
+    for j in $(seq 101 199) ; do
 	try="${prefix}.${j}"
 	# Try to make sure it isn't used anywhere!
 
@@ -102,15 +102,19 @@ for i in $test_node_ips ; do
     done
 done
 
-if [ -n "$add_ip" ] ; then
-    echo "Adding IP: ${add_ip/:/ on interface }"
-    try_command_on_node $test_node $CTDB addip ${add_ip/:/ }
+if [ -z "$add_ip" ] ; then
+    echo "BAD: Unable to find IP address to add."
+    exit 1
+fi
 
-    echo "Waiting for IP to be added..."
-    wait_until 60 ips_are_on_nodeglob $test_node $test_node_ips ${add_ip%/*}
+echo "Adding IP: ${add_ip/:/ on interface }"
+try_command_on_node $test_node $CTDB addip ${add_ip/:/ }
 
+echo "Waiting for IP to be added..."
+if wait_until 60 ips_are_on_nodeglob $test_node ${add_ip%/*} ; then
     echo "That worked!"
 else
-    echo "BAD: Unable to find IP address to add."
-    testfailures=1
+    echo "BAD: IP didn't get added."
+    try_command_on_node $test_node ctdb ip -n all
+    exit 1
 fi
diff --git a/tests/simple/18_ctdb_freeze.sh b/tests/simple/18_ctdb_freeze.sh
deleted file mode 100755
index 792a4ee..0000000
--- a/tests/simple/18_ctdb_freeze.sh
+++ /dev/null
@@ -1,48 +0,0 @@
-#!/bin/bash
-
-test_info()
-{
-    cat <<EOF
-Verify 'ctdb freeze' works correctly.
-
-This is a superficial test that simply checks that 'ctdb statistics'
-reports the node becomes frozen.  No checks are done to ensure that
-client access to databases is blocked.
-
-Prerequisites:
-
-* An active CTDB cluster with at least 2 active nodes.
-
-Steps:
-
-1. Verify that the status on all of the ctdb nodes is 'OK'.
-2. Use 'ctdb freeze -n <node>' to freeze the databases on one of the
-   nodes.
-3. Run 'ctdb statistics' to verify that 'frozen' has the value '1' on
-   the node.
-
-Expected results:
-
-* When the database is frozen, the 'frozen' variable in the
-  'ctdb statistics' output is set to 1 on the node.
-EOF
-}
-
-. ctdb_test_functions.bash
-
-ctdb_test_init "$@"
-
-set -e
-
-cluster_is_healthy
-
-# Reset configuration
-ctdb_restart_when_done
-
-test_node=1
-
-echo "Freezing node $test_node"
-
-try_command_on_node 0 $CTDB freeze -n $test_node
-
-wait_until_node_has_status $test_node frozen
diff --git a/tests/simple/19_ctdb_thaw.sh b/tests/simple/19_ctdb_thaw.sh
deleted file mode 100755
index 7bbf490..0000000
--- a/tests/simple/19_ctdb_thaw.sh
+++ /dev/null
@@ -1,55 +0,0 @@
-#!/bin/bash
-
-test_info()
-{
-    cat <<EOF
-Verify 'ctdb thaw' works correctly.
-
-This is a superficial test that simply checks that 'ctdb statistics'
-reports the node becomes unfrozen.  No checks are done to ensure that
-client access to databases is unblocked.
-
-Prerequisites:
-
-* An active CTDB cluster with at least 2 active nodes.
-
-Steps:
-
-1. Verify that the status on all of the ctdb nodes is 'OK'.
-2. Use 'ctdb freeze -n <node>' to freeze the databases on one of the
-   nodes.
-3. Run 'ctdb statistics' to verify that 'frozen' has the value '1' on
-   the node.
-4, Now run 'ctdb thaw -n <node>' on the same node.
-5. Run 'ctdb statistics' to verify that 'frozen' once again has the
-   value '0' on the node.
-
-
-Expected results:
-
-* 'ctdb thaw' causes a node to 'thaw' and the status change can be
-  seem via 'ctdb statistics'.
-EOF
-}
-
-. ctdb_test_functions.bash
-
-ctdb_test_init "$@"
-
-set -e
-
-cluster_is_healthy
-
-test_node=1
-
-echo "Freezing node $test_node"
-
-try_command_on_node 0 $CTDB freeze -n $test_node
-
-wait_until_node_has_status $test_node frozen
-
-echo "That worked!  Now thawing node $test_node"
-
-try_command_on_node 0 $CTDB thaw -n $test_node
-
-wait_until_node_has_status $test_node unfrozen
diff --git a/tests/simple/32_ctdb_enable.sh b/tests/simple/32_ctdb_enable.sh
index 4c1026e..7bfdc43 100755
--- a/tests/simple/32_ctdb_enable.sh
+++ b/tests/simple/32_ctdb_enable.sh
@@ -23,8 +23,7 @@ Steps:
    failed over to other nodes.
 5. Enable the disabled node using 'ctdb enable -n '<node>'.
 6. Verify that the status changes back to 'OK'.
-7. Verify that the public IP addreses served by the disabled node are
-   failed back to the node.
+7. Verify that some public IP addreses are failed back to the node.
 
 
 Expected results:
@@ -63,11 +62,4 @@ try_command_on_node 1 $CTDB enable -n $test_node
 
 wait_until_node_has_status $test_node enabled
 
-# BUG: this is only guaranteed if DeterministicIPs is 1 and
-#      NoIPFailback is 0.
-if wait_until_ips_are_on_nodeglob "$test_node" $test_node_ips ; then
-    echo "All IPs moved."
-else
-    echo "Some IPs didn't move."
-    testfailures=1
-fi
+wait_until_node_has_some_ips "$test_node"
diff --git a/tests/simple/42_ctdb_continue.sh b/tests/simple/42_ctdb_continue.sh
index b472420..82e1534 100755
--- a/tests/simple/42_ctdb_continue.sh
+++ b/tests/simple/42_ctdb_continue.sh
@@ -23,7 +23,7 @@ Steps:
    the node are failed over to one of the other nodes.
 5. Use 'ctdb continue' to bring the node back online.
 6. Verify that the status of the node changes back to 'OK' and that
-   the public IP addresses move back to the node.
+   some public IP addresses move back to the node.
 
 Expected results:
 
@@ -61,11 +61,4 @@ try_command_on_node 1 $CTDB continue -n $test_node
 
 wait_until_node_has_status $test_node notstopped
 
-# BUG: this is only guaranteed if DeterministicIPs is 1 and
-#      NoIPFailback is 0.
-if wait_until_ips_are_on_nodeglob "$test_node" $ips ; then
-    echo "All IPs moved."
-else
-    echo "Some IPs didn't move."
-    testfailures=1
-fi
+wait_until_node_has_some_ips "$test_node"
diff --git a/tests/simple/52_ctdb_fetch.sh b/tests/simple/52_ctdb_fetch.sh
index 236b697..5936419 100755
--- a/tests/simple/52_ctdb_fetch.sh
+++ b/tests/simple/52_ctdb_fetch.sh
@@ -40,7 +40,7 @@ num_nodes=$(echo "$out" | wc -l)
 echo "Running ctdb_fetch on all $num_nodes nodes."
 try_command_on_node -v -pq all $CTDB_TEST_WRAPPER $VALGRIND ctdb_fetch -n $num_nodes
 
-pat='^(Fetch: [[:digit:]]+(\.[[:digit:]]+)? msgs/sec[[:space:]]?|msg_count=[[:digit:]]+ on node [[:digit:]]|Fetching final record|DATA:|Test data|Waiting for cluster[[:space:]]?|)+$'
+pat='^(Fetch: [[:digit:]]+(\.[[:digit:]]+)? msgs/sec[[:space:]]?|msg_count=[[:digit:]]+ on node [[:digit:]]|Fetching final record|DATA:|Test data|Waiting for cluster[[:space:]]?|.*: Reqid wrap!|)+$'
 sanity_check_output 1 "$pat" "$out"
 
 # Filter out the performance figures:
diff --git a/tests/takeover/ctdb_takeover.py b/tests/takeover/ctdb_takeover.py
index c7341b9..b87e35f 100755
--- a/tests/takeover/ctdb_takeover.py
+++ b/tests/takeover/ctdb_takeover.py
@@ -44,35 +44,38 @@ def process_args(extra_options=[]):
     parser.add_option("--ni",
                       action="store_true", dest="no_ip_failback", default=False,
                       help="turn on no_ip_failback")
-    parser.add_option("-v", "--verbose",
-                      action="store_true", dest="verbose", default=False,
-                      help="print information and actions taken to stdout")
+    parser.add_option("-b", "--balance",
+                      action="store_true", dest="balance", default=False,
+                      help="show (im)balance information after each event")
     parser.add_option("-d", "--diff",
                       action="store_true", dest="diff", default=False,
-                      help="after each recovery show IP address movements")
+                      help="show IP address movements for each event")
     parser.add_option("-n", "--no-print",
                       action="store_false", dest="show", default=True,
-                      help="after each recovery don't print IP address layout")
+                      help="don't show IP address layout after each event")
+    parser.add_option("-v", "--verbose",
+                      action="store_true", dest="verbose", default=False,
+                      help="print information and actions taken to stdout")
     parser.add_option("--hack",
                       action="store", type="int", dest="hack", default=0,
                       help="apply a hack (see the code!!!)")
     parser.add_option("-r", "--retries",
                       action="store", type="int", dest="retries", default=5,
-                      help="number of retry loops for rebalancing")
+                      help="number of retry loops for rebalancing [default: %default]")
     parser.add_option("-i", "--iterations",
                       action="store", type="int", dest="iterations",
                       default=1000,
-                      help="number of iterations to run in test")
+                      help="number of iterations to run in test [default: %default]")
+    parser.add_option("-o", "--odds",
+                      action="store", type="int", dest="odds", default=4,
+                      help="make the chances of a failover 1 in ODDS [default: %default]")
 
     def seed_callback(option, opt, value, parser):
         random.seed(value)
     parser.add_option("-s", "--seed",
                       action="callback", type="int", callback=seed_callback,
-                      help="number of iterations to run in test")
+                      help="initial random number seed for random events")
 
-    parser.add_option("-b", "--balance",
-                      action="store_true", dest="balance", default=False,
-                      help="show (im)balance information")
     parser.add_option("-x", "--exit",
                       action="store_true", dest="exit", default=False,
                       help="exit on the 1st gratuitous IP move")
@@ -124,10 +127,12 @@ class Cluster(object):
         self.no_ip_failback = options.no_ip_failback
         self.all_public_ips = set()
 
+        # Statistics
         self.ip_moves = []
         self.grat_ip_moves = []
         self.imbalance = []
         self.events = -1
+        self.num_unhealthy = []
 
         self.prev = None
 
@@ -146,6 +151,7 @@ class Cluster(object):
         print "Gratuitous IP moves: %6d" % sum(self.grat_ip_moves)
         print "Max imbalance:       %6d" % max(self.imbalance)
         print "Final imbalance:     %6d" % self.imbalance[-1]
+        print "Maximum unhealthy:   %6d" % max(self.num_unhealthy)
         print_end()
 
     def find_pnn_with_ip(self, ip):
@@ -189,8 +195,8 @@ class Cluster(object):
         """Make a random node healthy or unhealthy.
 
         If all nodes are healthy or unhealthy, then invert one of
-        them.  Otherwise, there's a 1/4 chance of making another node
-        unhealthy."""
+        them.  Otherwise, there's a 1 in options.odds chance of making
+        another node unhealthy."""
 
         num_nodes = len(self.nodes)
         healthy_pnns = [i for (i,n) in enumerate(self.nodes) if n.healthy]
@@ -200,7 +206,7 @@ class Cluster(object):
             self.unhealthy(random.randint(0, num_nodes-1))
         elif num_healthy == 0:
             self.healthy(random.randint(0, num_nodes-1))
-        elif random.randint(1, 4) == 1:
+        elif random.randint(1, options.odds) == 1:
             self.unhealthy(random.choice(healthy_pnns))
         else:
             all_pnns = range(num_nodes)
@@ -483,6 +489,10 @@ class Cluster(object):
             print imbalance
             print_end()
 
+        num_unhealthy = len(self.nodes) - \
+            len([n for n in self.nodes if n.healthy])
+        self.num_unhealthy.append(num_unhealthy)
+
         if options.show:
             print_begin("STATE")
             print self
diff --git a/tools/ctdb_diagnostics b/tools/ctdb_diagnostics
index 2cdf3cc..eae5483 100755
--- a/tools/ctdb_diagnostics
+++ b/tools/ctdb_diagnostics
@@ -1,15 +1,60 @@
 #!/bin/sh
 # a script to test the basic setup of a CTDB/Samba install 
 # tridge at samba.org September 2007
+# martin at meltin.net August 2010
+
+usage ()
+{
+    cat >&2 <<EOF
+Usage: ctdb_diagnostics [OPTION] ...
+  options:
+    -n <nodes>  Comma separated list of nodes to operate on
+    -c          Ignore comment lines (starting with '#') in file comparisons
+    -w          Ignore whitespace in file comparisons
+    --no-ads    Do not use commands that assume an Active Directory Server
+EOF
+    exit 1
+
+}
+
+nodes=$(ctdb listnodes -Y | cut -d: -f2)
+diff_opts=
+no_ads=false
+
+parse_options ()
+{
+    temp=$(getopt -n "ctdb_diagnostics" -o "n:cwh" -l no-ads,help -- "$@")
+
+    [ $? != 0 ] && usage
+
+    eval set -- "$temp"
+
+    while true ; do
+	case "$1" in
+	    -n) nodes=$(echo "$2" | sed -e 's@,@ @g') ; shift 2 ;;
+	    -c) diff_opts="${diff_opts} -I ^#.*" ; shift ;;
+	    -w) diff_opts="${diff_opts} -w" ; shift ;;
+	    --no-ads) no_ads=true ; shift ;;
+	    --) shift ; break ;;
+	    -h|--help|*) usage ;;
+	esac
+    done
+
+    [ $# -ne 0 ] && usage
+}
+
+parse_options "$@"
+
+nodes_comma=$(echo $nodes | sed -e 's@[[:space:]]@,@g')
 
 PATH="$PATH:/sbin:/usr/sbin:/usr/lpp/mmfs/bin"
 
 # list of config files that must exist and that we check are the same 
-# on all nodes
+# on the nodes
 CONFIG_FILES_MUST="/etc/krb5.conf /etc/hosts /etc/ctdb/nodes /etc/sysconfig/ctdb /etc/resolv.conf /etc/nsswitch.conf /etc/sysctl.conf /etc/samba/smb.conf /etc/fstab /etc/multipath.conf /etc/pam.d/system-auth /etc/sysconfig/nfs /etc/exports /etc/vsftpd/vsftpd.conf"
 
 # list of config files that may exist and should be checked that they
-# are the same on all nodes
+# are the same on the nodes
 CONFIG_FILES_MAY="/etc/ctdb/public_addresses /etc/ctdb/static-routes"
 
 2>&1
@@ -41,67 +86,83 @@ show_file() {
 }
 
 show_all() {


-- 
CTDB repository

