[SCM] CTDB repository - branch 1.2 updated - ctdb-1.9.1-239-g62179c8

Tue Nov 16 18:06:58 MST 2010

The branch, 1.2 has been updated
       via  62179c80701cd4f773afaba54cde577de234f72d (commit)
       via  2d772b7354699fabd06fa0b5b89ddbd6bdd5826d (commit)
       via  b717cf7e4f40457df0063e08122eeafd9c339168 (commit)
       via  e7a89ac55fc0f6ff91ee9e7ee1c4d42f6baa2b77 (commit)
       via  fc93ba5eb93746a91099c9fd28ff5dfe043415a9 (commit)
       via  c39d7d12a22ce1f8a63618ebc05a7a07ed586675 (commit)
       via  c14359b56f9dde3d7e30e3af5b7fc71a3acefd97 (commit)
       via  9092deb2976f175a12243587d92416c860b42929 (commit)
      from  6b21c95236a07a656af0e110eb3f9d833aea71dd (commit)

http://gitweb.samba.org/?p=sahlberg/ctdb.git;a=shortlog;h=1.2


- Log -----------------------------------------------------------------
commit 62179c80701cd4f773afaba54cde577de234f72d
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Aug 31 17:40:40 2010 +1000

    Eventscripts: make loadconfig() function hookable by the test suite.
    
    Rename loadconfig() to _loadconfig().  Add a new loadconfig() that
    simply calls _loadconfig().
    
    This makes it easy for the test suite to override loadconfig().
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit 2d772b7354699fabd06fa0b5b89ddbd6bdd5826d
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Nov 16 19:42:31 2010 +1100

    Make a time comparison in 60.nfs eventscript more readable.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit b717cf7e4f40457df0063e08122eeafd9c339168
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Nov 16 19:31:18 2010 +1100

    60.nfs only fails or warns after 10 consecutive nfsd/statd failures.
    
    These failures are sometimes the result of slow restarts so we want to
    avoid dirtying the logs or marking a node unhealthy because of them,
    unless they are excessive.
    
    For these 2 cases we use the existing fail counting code but hack a
    temporary service_name in a subshell to allow separate fail counts.
    
    We also update ctdb_check_rpc() so that it captures the error output
    from rpcinfo and we add a message including the service name to the
    beginning.  The error is printed to stdout but is also stored in
    ctdb_check_rpc_out to allow it to be conditionally used by the caller.
    This function also now returns non-zero rather than exiting on
    failure.
    
    Other direct rpcinfo calls are relaced by called to ctdb_check_rpc()
    for consistency.
    
    Option handling code for service restarts is cleaned up so that fits
    in 80 columns.  A more informative restart messageis now used in all
    cases, printing the exact command being used to start a service.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit e7a89ac55fc0f6ff91ee9e7ee1c4d42f6baa2b77
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Oct 12 11:10:38 2010 +1100

    Test suite: fix typo in ctdb ping test grep pattern.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit fc93ba5eb93746a91099c9fd28ff5dfe043415a9
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Oct 6 16:32:22 2010 +1100

    Test suite: match changed output for ctdb ping to disconnected node.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit c39d7d12a22ce1f8a63618ebc05a7a07ed586675
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Oct 15 15:09:08 2010 +1100

    Test suite: make statistics test cope with changes to statistics output.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit c14359b56f9dde3d7e30e3af5b7fc71a3acefd97
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date:   Mon Nov 15 16:47:22 2010 +1100

    New version 1.2.10
    
    * Mon Nov 15 2010 : Version 1.2.10
     - Make sure to initialize the statistics start time to current time
       instead of leaving it to point to start of epoch.
       CQ : S1020838
     - Create a new tunable DisableIPFailover that is used to tell ctdb
       to not check any ip allocation at all and never do any failover
       This can be used to stop/restart individual nodes without causing
       any ip failovers to happen.

commit 9092deb2976f175a12243587d92416c860b42929
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date:   Mon Nov 15 16:30:44 2010 +1100

    initialize the statistics to the current time, not start of epoch
    this makes "ctdb statistics" show correct "start of starts collection"

-----------------------------------------------------------------------

Summary of changes:
 config/events.d/60.nfs             |   77 ++++++++++++++++++++++--------------
 config/functions                   |   18 ++++++--
 packaging/RPM/ctdb.spec.in         |   10 ++++-
 server/ctdb_statistics.c           |    1 +
 tests/simple/09_ctdb_ping.sh       |    2 +-
 tests/simple/14_ctdb_statistics.sh |    2 +-
 6 files changed, 72 insertions(+), 38 deletions(-)


Changeset truncated at 500 lines:

diff --git a/config/events.d/60.nfs b/config/events.d/60.nfs
index 57c81d3..038adbb 100755
--- a/config/events.d/60.nfs
+++ b/config/events.d/60.nfs
@@ -51,24 +51,43 @@ case "$1" in
 
 	# check that statd responds to rpc requests
 	# if statd is not running we try to restart it
-	rpcinfo -u localhost 100024 1 > /dev/null || {
-		RPCSTATDOPTS=""
-		[ -n "$STATD_HOSTNAME" ] && RPCSTATDOPTS="$RPCSTATDOPTS -n $STATD_HOSTNAME"
-		[ -n "$STATD_PORT" ] && RPCSTATDOPTS="$RPCSTATDOPTS -p $STATD_PORT"
-		[ -n "$STATD_OUTGOING_PORT" ] && RPCSTATDOPTS="$RPCSTATDOPTS -o $STATD_OUTGOING_PORT"
-		rpc.statd $RPCSTATDOPTS 
-		echo "ERROR: STATD is not responding. Trying to restart it. [rpc.statd $RPCSTATDOPTS]"
-	}
+	if ctdb_check_rpc "STATD" 100024 1 >/dev/null ; then
+		(service_name="nfs_statd"; ctdb_counter_init)
+	else
+		p="rpc.statd" ; cmd="$p"
+		cmd="${cmd}${STATD_HOSTNAME:+ -n }${STATD_HOSTNAME}"
+		cmd="${cmd}${STATD_PORT:+ -p }${STATD_PORT}"
+		cmd="${cmd}${STATD_OUTGOING_PORT:+ -o }${STATD_OUTGOING_PORT}"
+		(
+			service_name="nfs_statd"
+			ctdb_counter_incr
+			ctdb_check_counter_limit 10 quiet >/dev/null
+		) || {
+			echo "$ctdb_check_rpc_out"
+			echo "Trying to restart STATD [$cmd]"
+		}
+		$cmd
+	fi
 
 
 	# check that NFS responds to rpc requests
 	[ "$CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK" = "yes" ] || {
-	    (ctdb_check_rpc "NFS" 100003 3)
-	    [ $? = "0" ] || {
-		echo "Trying to restart NFS service"
-		startstop_nfs restart
-		exit 1
-	    }
+	    if ctdb_check_rpc "NFS" 100003 3 >/dev/null ; then
+		(service_name="nfs_knfsd"; ctdb_counter_init)
+	    else
+		(
+			service_name="nfs_knfsd"
+			ctdb_counter_incr
+			ctdb_check_counter_limit 10 quiet >/dev/null
+		) || {
+			echo "$ctdb_check_rpc_out"
+			echo "Trying to restart NFS service"
+			startstop_nfs restart
+			exit 1
+		}
+		# we haven't hit the failure limit so restart quietly
+		startstop_nfs restart >/dev/null 2>&1 &
+	    fi
 	}
 
 	# and that its directories are available
@@ -79,8 +98,7 @@ case "$1" in
 	} || exit $?
 
 	# check that lockd responds to rpc requests
-	(ctdb_check_rpc "lockd" 100021 1)
-	[ $? = "0" ] || {
+	ctdb_check_rpc "LOCKD" 100021 1 || {
 		echo "Trying to restart lock manager service"
 		startstop_nfs restart
 		startstop_nfslock restart
@@ -89,31 +107,30 @@ case "$1" in
 
 	# mount needs special handling since it is sometimes not started
 	# correctly on RHEL5
-	rpcinfo -u localhost 100005 1 > /dev/null || {
-		echo "ERROR: MOUNTD is not running. Trying to restart it."
-		RPCMOUNTDOPTS=""
-		[ -n "$MOUNTD_PORT" ] && RPCMOUNTDOPTS="$RPCMOUNTDOPTS -p $MOUNTD_PORT"
-		killall -q -9 rpc.mountd
-		rpc.mountd $RPCMOUNTDOPTS &
+	ctdb_check_rpc "MOUNTD" 100005 1 || {
+		p="rpc.mountd"
+		cmd="${p}${MOUNTD_PORT:+ -p }${MOUNTD_PORT}"
+		echo "Trying to restart MOUNTD [${cmd}]"
+		killall -q -9 $p
+		$cmd &
 		exit 1
 	}
 	# rquotad needs special handling since it is sometimes not started
 	# correctly on RHEL5
 	# this is not a critical service so we dont flag the node as unhealthy
-	rpcinfo -u localhost 100011 1 > /dev/null || {
-		echo "ERROR: RQUOTAD is not running. Trying to restart it."
-		RPCRQUOTADOPTS=""
-		[ -n "$RQUOTAD_PORT" ] && RPCRQUOTADOPTS="$RPCRQUOTADOPTS -p $RQUOTAD_PORT"
-		killall -q -9 rpc.rquotad
-		rpc.rquotad $RPCRQUOTADOPTS &
+	ctdb_check_rpc "RQUOTAD" 100011 1 || {
+		p="rpc.rquotad"
+		cmd="${p}${RQUOTAD_PORT:+ -p }${RQUOTAD_PORT}"
+		echo "Trying to restart RQUOTAD [${cmd}]"
+		killall -q -9 $p
+		$cmd &
 	}
 
 	# once every 60 seconds, update the statd state database for which
 	# clients need notifications
 	LAST_UPDATE=`stat --printf="%Y" $CTDB_VARDIR/state/statd/update-trigger`
 	CURRENT_TIME=`date +"%s"`
-	expr "$CURRENT_TIME" ">" "(" "$LAST_UPDATE" "+" "60" ")" >/dev/null 2>/dev/null
-	[ $? = "0" ] && {
+	[ $CURRENT_TIME -ge $(($LAST_UPDATE + 60)) ] && {
 	    mkdir -p $CTDB_VARDIR/state/statd
 	    touch $CTDB_VARDIR/state/statd/update-trigger
 	    $CTDB_BASE/statd-callout updatelocal &
diff --git a/config/functions b/config/functions
index 9659d48..610085b 100755
--- a/config/functions
+++ b/config/functions
@@ -4,7 +4,7 @@ PATH=/bin:/usr/bin:/usr/sbin:/sbin:$PATH
 
 #######################################
 # pull in a system config file, if any
-loadconfig() {
+_loadconfig() {
 
     if [ -z "$1" ] ; then
 	foo="${service_config:-${service_name}}"
@@ -25,6 +25,10 @@ loadconfig() {
     fi
 }
 
+loadconfig () {
+    _loadconfig "$@"
+}
+
 ##############################################################
 # determine on what type of system (init style) we are running
 detect_init_style() {
@@ -144,10 +148,14 @@ ctdb_check_rpc() {
     progname="$1"
     prognum="$2"
     version="$3"
-    rpcinfo -u localhost $prognum $version > /dev/null || {
-	    echo "ERROR: $progname not responding to rpc requests"
-	    exit 1
-    }
+
+    ctdb_check_rpc_out=$(rpcinfo -u localhost $prognum $version 2>&1)
+    if [ $? -ne 0 ] ; then
+	ctdb_check_rpc_out="ERROR: $progname failed RPC check:
+$ctdb_check_rpc_out"
+	echo "$ctdb_check_rpc_out"
+	return 1
+    fi
 }
 
 ######################################################
diff --git a/packaging/RPM/ctdb.spec.in b/packaging/RPM/ctdb.spec.in
index 8c61ec3..e4af617 100644
--- a/packaging/RPM/ctdb.spec.in
+++ b/packaging/RPM/ctdb.spec.in
@@ -3,7 +3,7 @@ Name: ctdb
 Summary: Clustered TDB
 Vendor: Samba Team
 Packager: Samba Team <samba at samba.org>
-Version: 1.2.9
+Version: 1.2.10
 Release: 1GITHASH
 Epoch: 0
 License: GNU GPL version 3
@@ -142,6 +142,14 @@ development libraries for ctdb
 %{_libdir}/libctdb.a
 
 %changelog
+* Mon Nov 15 2010 : Version 1.2.10
+ - Make sure to initialize the statistics start time to current time
+   instead of leaving it to point to start of epoch.
+   CQ : S1020838
+ - Create a new tunable DisableIPFailover that is used to tell ctdb
+   to not check any ip allocation at all and never do any failover
+   This can be used to stop/restart individual nodes without causing
+   any ip failovers to happen.
 * Tue Nov 2 2010 : Version 1.2.9
  - Drop loglevels on several items and remove spam from the messages file
  - Both nfs and nfslock can fail so restart both if there is a problem
diff --git a/server/ctdb_statistics.c b/server/ctdb_statistics.c
index aec17ad..29e6d6a 100644
--- a/server/ctdb_statistics.c
+++ b/server/ctdb_statistics.c
@@ -42,6 +42,7 @@ static void ctdb_statistics_update(struct event_context *ev, struct timed_event
 int ctdb_statistics_init(struct ctdb_context *ctdb)
 {
 	bzero(&ctdb->statistics, sizeof(struct ctdb_statistics));
+	ctdb->statistics.statistics_start_time = timeval_current();
 
 	bzero(&ctdb->statistics_current, sizeof(struct ctdb_statistics));
 	ctdb->statistics_current.statistics_start_time = timeval_current();
diff --git a/tests/simple/09_ctdb_ping.sh b/tests/simple/09_ctdb_ping.sh
index 6ca50d4..6610431 100755
--- a/tests/simple/09_ctdb_ping.sh
+++ b/tests/simple/09_ctdb_ping.sh
@@ -53,5 +53,5 @@ try_command_on_node -v 0 "! $CTDB ping -n 1"
 
 sanity_check_output \
     1 \
-    "(: ctdb_control error: 'ctdb_control to disconnected node'|Unable to get ping response from node 1|Node 1 is DISCONNECTED)" \
+    "(: ctdb_control error: ('ctdb_control to disconnected node'|'node is disconnected')|Unable to get ping response from node 1|Node 1 is DISCONNECTED|ctdb_control for getpnn failed|: Can not access node. Node is not operational\.)" \
     "$out"
diff --git a/tests/simple/14_ctdb_statistics.sh b/tests/simple/14_ctdb_statistics.sh
index 9a95a83..e9ecce5 100755
--- a/tests/simple/14_ctdb_statistics.sh
+++ b/tests/simple/14_ctdb_statistics.sh
@@ -33,7 +33,7 @@ set -e
 
 cluster_is_healthy
 
-pattern='^(CTDB version 1|Current time of statistics[[:space:]]*:.*|Statistics collected since[[:space:]]*:.*|Gathered statistics for [[:digit:]]+ nodes|[[:space:]]+[[:alpha:]_]+[[:space:]]+[[:digit:]]+|[[:space:]]+(node|client|timeouts)|[[:space:]]+([[:alpha:]_]+_latency|max_reclock_[[:alpha:]]+)[[:space:]]+[[:digit:]-]+\.[[:digit:]]+[[:space:]]sec)$'
+pattern='^(CTDB version 1|Current time of statistics[[:space:]]*:.*|Statistics collected since[[:space:]]*:.*|Gathered statistics for [[:digit:]]+ nodes|[[:space:]]+[[:alpha:]_]+[[:space:]]+[[:digit:]]+|[[:space:]]+(node|client|timeouts)|[[:space:]]+([[:alpha:]_]+_latency|max_reclock_[[:alpha:]]+)[[:space:]]+[[:digit:]-]+\.[[:digit:]]+[[:space:]]sec|[[:space:]]*(reclock_ctdbd|reclock_recd|call_latency|lockwait_latency|childwrite_latency)[[:space:]]+MIN/AVG/MAX[[:space:]]+[-.[:digit:]]+/[-.[:digit:]]+/[-.[:digit:]]+ sec out of [[:digit:]]+)$'
 
 try_command_on_node -v 1 "$CTDB statistics"
 


-- 
CTDB repository