[SCM] CTDB repository - branch 2.5 updated - ctdb-2.5.3-65-gb3df794
Amitay Isaacs
amitay at samba.org
Sun Jun 22 20:58:57 MDT 2014
The branch, 2.5 has been updated
via b3df79485915d692cf685c812067f55ebf0b5ea1 (commit)
via 4921fa00ac43d2766609e75f5cc9ac29d9c41a6b (commit)
via d4d60ede26b478e9ffd315b338c6ece005296a33 (commit)
from 7f3613c510c4549e381a78291ca87c76ece91710 (commit)
http://gitweb.samba.org/?p=ctdb.git;a=shortlog;h=2.5
- Log -----------------------------------------------------------------
commit b3df79485915d692cf685c812067f55ebf0b5ea1
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jun 16 10:59:20 2014 +1000
eventscripts: Ensure $GANRECDIR points to configured subdirectory
Check that the $GANRECDIR symlink points to the location specified by
$CTDB_GANESHA_REC_SUBDIR and replace it if incorrect. This handles
reconfiguration and filesystem changes.
While touching this code:
* Create the $GANRECDIR link as a separate step if it doesn't exist.
This means there is only one place where the link is created.
* Change some variable names to the style used for local function
variables.
* Remove some "ln failed" error messages. ln failures will be logged
anyway.
* Add -v to various mkdir/rm/ln commands so that these actions are
logged when they actually do something.
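The link-maintenance logic this commit introduces can be sketched as a
standalone helper (the function name ensure_symlink is hypothetical, not
from the commit; the commit's actual code is in the diff below):

```shell
# Hypothetical helper sketching the commit's link maintenance:
# make sure $2 is a symlink to $1, replacing anything else in the way.
ensure_symlink ()
{
    _target="$1"
    _link="$2"

    if [ -e "$_link" ] ; then
        if [ ! -L "$_link" ] ; then
            # A plain file or directory is in the way: remove it
            rm -rf "$_link"
        elif [ "$(readlink "$_link")" != "$_target" ] ; then
            # Existing symlink points somewhere else: remove it
            rm -f "$_link"
        fi
    fi
    # This is not an "else": it also re-creates a link removed above
    if [ ! -e "$_link" ] ; then
        ln -s "$_target" "$_link"
    fi
}
```

Note the final test is deliberately not an else branch, so a single
ln call covers both the "missing" and "just removed" cases.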
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay at samba.org>
Autobuild-Date(master): Fri Jun 20 05:40:16 CEST 2014 on sn-devel-104
(Imported from commit aac607d7271eb50e776423329f2446a1e33a2641)
commit 4921fa00ac43d2766609e75f5cc9ac29d9c41a6b
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Mar 5 16:21:45 2014 +1100
daemon: Debugging for tickle updates
This was useful for debugging the race fixed by commit
4f79fa6c7c843502fcdaa2dead534ea3719b9f69. It might be useful again.
Also fix a nearby comment typo.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay at samba.org>
Autobuild-Date(master): Fri Jun 20 02:07:48 CEST 2014 on sn-devel-104
(Imported from commit 6f43896e1258c4cf43401cbfeba24a50de3c3140)
commit d4d60ede26b478e9ffd315b338c6ece005296a33
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jun 10 15:16:44 2014 +1000
tests: Try harder to avoid failures due to repeated recoveries
About a year ago a check was added to _cluster_is_healthy() to make
sure that node 0 isn't in recovery. This was to avoid unexpected
recoveries causing tests to fail. However, it was misguided because
each test initially calls cluster_is_healthy() and will now fail if an
unexpected recovery occurs.
Instead, have cluster_is_healthy() warn if the cluster is in recovery.
Also:
* Rename wait_until_healthy() to wait_until_ready() because it waits
until the cluster is both healthy and out of recovery.
* Change the post-recovery sleep in restart_ctdb() to 2 seconds and
add a loop to wait (for 2 seconds at a time) if the cluster is back
in recovery. The logic here is that the re-recovery timeout has
been set to 1 second, so sleeping for just 1 second might race
against the next recovery.
* Use reverse logic in node_has_status() so that it works for "all".
* Tweak wait_until() so that it can handle timeouts with a
recheck-interval specified.
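The T/I timeout format mentioned in the last point can be sketched in
isolation (the helper name parse_timeout is hypothetical; the commit
folds this parsing directly into wait_until(), as the diff below shows):

```shell
# Hypothetical sketch of the "T/I" timeout parsing described above:
# "30/2" means a 30s timeout rechecked every 2s; a plain "30" keeps
# the default 1s recheck interval.
parse_timeout ()
{
    _timeout="$1"
    _interval=1
    case "$_timeout" in
    */*)
        _interval="${_timeout#*/}"   # text after the slash
        _timeout="${_timeout%/*}"    # text before the slash
        ;;
    esac
    echo "$_timeout $_interval"
}
```

With the values split this way, the wait loop can sleep for the recheck
interval on each iteration and subtract it from the remaining time,
instead of always sleeping for 1 second.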
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
(Imported from commit 6a552f1a12ebe43f946bbbee2a3846b5a640ae4f)
-----------------------------------------------------------------------
Summary of changes:
config/events.d/60.ganesha | 32 ++++++++++--------
server/ctdb_takeover.c | 11 +++++-
tests/complex/34_nfs_tickle_restart.sh | 2 +-
tests/scripts/integration.bash | 57 ++++++++++++++++++++++++--------
4 files changed, 72 insertions(+), 30 deletions(-)
Changeset truncated at 500 lines:
diff --git a/config/events.d/60.ganesha b/config/events.d/60.ganesha
index d348a4f..5640b74 100755
--- a/config/events.d/60.ganesha
+++ b/config/events.d/60.ganesha
@@ -84,25 +84,29 @@ create_ganesha_recdirs ()
{
[ -n "$CTDB_GANESHA_REC_SUBDIR" ] || CTDB_GANESHA_REC_SUBDIR=".ganesha"
- MOUNTS=$(mount -t $CTDB_CLUSTER_FILESYSTEM_TYPE)
- if [ -z "$MOUNTS" ]; then
+ _mounts=$(mount -t $CTDB_CLUSTER_FILESYSTEM_TYPE)
+ if [ -z "$_mounts" ]; then
echo "startup $CTDB_CLUSTER_FILESYSTEM_TYPE not ready"
exit 0
fi
- MNTPT=$(echo "$MOUNTS" | sort | awk 'NR == 1 {print $3}')
- mkdir -p $MNTPT/$CTDB_GANESHA_REC_SUBDIR
- if [ -e $GANRECDIR ]; then
- if [ ! -L $GANRECDIR ] ; then
- rm -rf $GANRECDIR
- if ! ln -s $MNTPT/$CTDB_GANESHA_REC_SUBDIR $GANRECDIR ; then
- echo "ln failed"
- fi
- fi
- else
- if ! ln -sf $MNTPT/$CTDB_GANESHA_REC_SUBDIR $GANRECDIR ; then
- echo "ln failed"
+ _mntpt=$(echo "$_mounts" | sort | awk 'NR == 1 {print $3}')
+ _link_dst="${_mntpt}/${CTDB_GANESHA_REC_SUBDIR}"
+ mkdir -vp "$_link_dst"
+ if [ -e "$GANRECDIR" ]; then
+ if [ ! -L "$GANRECDIR" ] ; then
+ rm -vrf "$GANRECDIR"
+ else
+ _t=$(readlink "$GANRECDIR")
+ if [ "$_t" != "$_link_dst" ] ; then
+ rm -v "$GANRECDIR"
+ fi
fi
fi
+ # This is not an "else". It also re-creates the link if it was
+ # removed above!
+ if [ ! -e "$GANRECDIR" ]; then
+ ln -sv "$_link_dst" "$GANRECDIR"
+ fi
mkdir -p $GANRECDIR2
mkdir -p $GANRECDIR3
diff --git a/server/ctdb_takeover.c b/server/ctdb_takeover.c
index f8a26f0..aaf243a 100644
--- a/server/ctdb_takeover.c
+++ b/server/ctdb_takeover.c
@@ -3234,7 +3234,7 @@ int32_t ctdb_control_tcp_remove(struct ctdb_context *ctdb, TDB_DATA indata)
/*
- Called when another daemon starts - caises all tickles for all
+ Called when another daemon starts - causes all tickles for all
public addresses we are serving to be sent to the new node on the
next check. This actually causes the next scheduled call to
tdb_update_tcp_tickles() to update all nodes. This is simple and
@@ -3244,6 +3244,9 @@ int32_t ctdb_control_startup(struct ctdb_context *ctdb, uint32_t pnn)
{
struct ctdb_vnn *vnn;
+ DEBUG(DEBUG_INFO, ("Received startup control from node %lu\n",
+ (unsigned long) pnn));
+
for (vnn = ctdb->vnn; vnn != NULL; vnn = vnn->next) {
vnn->tcp_update_needed = true;
}
@@ -3912,6 +3915,9 @@ int32_t ctdb_control_set_tcp_tickle_list(struct ctdb_context *ctdb, TDB_DATA ind
return -1;
}
+ DEBUG(DEBUG_INFO, ("Received tickle update for public address %s\n",
+ ctdb_addr_to_str(&list->addr)));
+
vnn = find_public_ip_vnn(ctdb, &list->addr);
if (vnn == NULL) {
DEBUG(DEBUG_INFO,(__location__ " Could not set tcp tickle list, '%s' is not a public address\n",
@@ -4060,6 +4066,9 @@ static void ctdb_update_tcp_tickles(struct event_context *ev,
DEBUG(DEBUG_ERR,("Failed to send the tickle update for public address %s\n",
ctdb_addr_to_str(&vnn->public_address)));
} else {
+ DEBUG(DEBUG_INFO,
+ ("Sent tickle update for public address %s\n",
+ ctdb_addr_to_str(&vnn->public_address)));
vnn->tcp_update_needed = false;
}
}
diff --git a/tests/complex/34_nfs_tickle_restart.sh b/tests/complex/34_nfs_tickle_restart.sh
index 93587e2..b7eea4c 100755
--- a/tests/complex/34_nfs_tickle_restart.sh
+++ b/tests/complex/34_nfs_tickle_restart.sh
@@ -79,7 +79,7 @@ try_command_on_node $rn $CTDB_TEST_WRAPPER restart_ctdb_1
echo "Setting NoIPTakeover on node ${rn}"
try_command_on_node $rn $CTDB setvar NoIPTakeover 1
-wait_until_healthy
+wait_until_ready
echo "Getting TickleUpdateInterval..."
try_command_on_node $test_node $CTDB getvar TickleUpdateInterval
diff --git a/tests/scripts/integration.bash b/tests/scripts/integration.bash
index 4a1f091..60f72b6 100644
--- a/tests/scripts/integration.bash
+++ b/tests/scripts/integration.bash
@@ -258,11 +258,19 @@ select_test_node_and_ips ()
#######################################
# Wait until either timeout expires or command succeeds. The command
-# will be tried once per second.
+# will be tried once per second, unless timeout has format T/I, where
+# I is the recheck interval.
wait_until ()
{
local timeout="$1" ; shift # "$@" is the command...
+ local interval=1
+ case "$timeout" in
+ */*)
+ interval="${timeout#*/}"
+ timeout="${timeout%/*}"
+ esac
+
local negate=false
if [ "$1" = "!" ] ; then
negate=true
@@ -280,9 +288,12 @@ wait_until ()
echo "OK"
return 0
fi
- echo -n .
- t=$(($t - 1))
- sleep 1
+ local i
+ for i in $(seq 1 $interval) ; do
+ echo -n .
+ done
+ t=$(($t - $interval))
+ sleep $interval
done
echo "*TIMEOUT*"
@@ -302,14 +313,26 @@ sleep_for ()
_cluster_is_healthy ()
{
- $CTDB nodestatus all >/dev/null && \
- node_has_status 0 recovered
+ $CTDB nodestatus all >/dev/null
+}
+
+_cluster_is_recovered ()
+{
+ node_has_status all recovered
+}
+
+_cluster_is_ready ()
+{
+ _cluster_is_healthy && _cluster_is_recovered
}
cluster_is_healthy ()
{
if onnode 0 $CTDB_TEST_WRAPPER _cluster_is_healthy ; then
echo "Cluster is HEALTHY"
+ if ! onnode 0 $CTDB_TEST_WRAPPER _cluster_is_recovered ; then
+ echo "WARNING: cluster in recovery mode!"
+ fi
return 0
else
echo "Cluster is UNHEALTHY"
@@ -325,13 +348,13 @@ cluster_is_healthy ()
fi
}
-wait_until_healthy ()
+wait_until_ready ()
{
local timeout="${1:-120}"
- echo "Waiting for cluster to become healthy..."
+ echo "Waiting for cluster to become ready..."
- wait_until $timeout onnode -q any $CTDB_TEST_WRAPPER _cluster_is_healthy
+ wait_until $timeout onnode -q any $CTDB_TEST_WRAPPER _cluster_is_ready
}
# This function is becoming nicely overloaded. Soon it will collapse! :-)
@@ -356,7 +379,7 @@ node_has_status ()
(unfrozen) fpat='^[[:space:]]+frozen[[:space:]]+0$' ;;
(monon) mpat='^Monitoring mode:ACTIVE \(0\)$' ;;
(monoff) mpat='^Monitoring mode:DISABLED \(1\)$' ;;
- (recovered) rpat='^Recovery mode:NORMAL \(0\)$' ;;
+ (recovered) rpat='^Recovery mode:RECOVERY \(1\)$' ;;
*)
echo "node_has_status: unknown status \"$status\""
return 1
@@ -382,7 +405,7 @@ node_has_status ()
elif [ -n "$mpat" ] ; then
$CTDB getmonmode -n "$pnn" | egrep -q "$mpat"
elif [ -n "$rpat" ] ; then
- $CTDB status -n "$pnn" | egrep -q "$rpat"
+ ! $CTDB status -n "$pnn" | egrep -q "$rpat"
else
echo 'node_has_status: unknown mode, neither $bits nor $fpat is set'
return 1
@@ -532,8 +555,8 @@ restart_ctdb ()
continue
}
- wait_until_healthy || {
- echo "Cluster didn't become healthy. Restarting..."
+ wait_until_ready || {
+ echo "Cluster didn't become ready. Restarting..."
continue
}
@@ -545,7 +568,13 @@ restart_ctdb ()
# help the cluster to stabilise before a subsequent test.
echo "Forcing a recovery..."
onnode -q 0 $CTDB recover
- sleep_for 1
+ sleep_for 2
+
+ if ! onnode -q any $CTDB_TEST_WRAPPER _cluster_is_recovered ; then
+ echo "Cluster has gone into recovery again, waiting..."
+ wait_until 30/2 onnode -q any $CTDB_TEST_WRAPPER _cluster_is_recovered
+ fi
+
# Cluster is still healthy. Good, we're done!
if ! onnode 0 $CTDB_TEST_WRAPPER _cluster_is_healthy ; then