[SCM] CTDB repository - branch master updated - ctdb-2.3-20-g57aa2df

Mon Jul 29 00:01:56 MDT 2013

The branch, master has been updated
       via  57aa2dffea60abd73a95233f8b761cc676adebb6 (commit)
       via  37ccc7c6cc43a80aaa92291aea7a438f4225488a (commit)
       via  782814288bb560099ee44b607bf35f3eddf37f82 (commit)
       via  a20d94717d2e4ab866d8a002cdf39c0669b74c6a (commit)
       via  af5aa369c266430fe912df0c26116b68bac3572e (commit)
       via  a69e03a5e4671e998d45b4fef8611a421bbdb3e1 (commit)
       via  bf4a7c1ad87e0e848296d15d63eb8cd901ca5335 (commit)
       via  1b016b2dfc5d7d3f2a42ce4dfe569608e90eb714 (commit)
       via  e0f3fa1020e13b84bdd672538168d148f1847d57 (commit)
       via  29e98017221326bdc9b1c4f7c05b3b495c1de29b (commit)
       via  9d6e1c147bd036d832b98c155f405ee2a5d6f57f (commit)
       via  ae3c03d80264e997b7da9f3279d7810e18b8a1df (commit)
       via  90d792cf28d6a823141e4c417b6978f02a9cf596 (commit)
       via  3dd5b925dcf0e9a5b877638e471c5ecf36b46c58 (commit)
       via  53e4eca74429f76adc81d98e3d11d1bd61194d71 (commit)
       via  501f19b16fd6d67fbb754248868c38ee5bcf79ef (commit)
       via  c6ab0f9405d5fa5b0b1693bc92e59da0d555a9d7 (commit)
       via  57ef5d3827ea3417a32703e259a53ce6fd10ac45 (commit)
      from  5740155cc5de1a223412e8529aa1a383a5412514 (commit)

http://gitweb.samba.org/?p=ctdb.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 57aa2dffea60abd73a95233f8b761cc676adebb6
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 26 15:09:24 2013 +1000

    doc: Update XML files to use standard DocBook DTD
    
    This simplifies building since we don't use any of the Samba
    extensions.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit 37ccc7c6cc43a80aaa92291aea7a438f4225488a
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 26 11:20:47 2013 +1000

    initscript: The wrapper script should export CTDB_SOCKET
    
    This ensures that any invocation of the ctdb tool (within the wrapper)
    gets the desired value.  This at least ensures that ctdbd will be
    started.
    
    If a non-standard value is set for CTDB_SOCKET then command-line users
    will still need the variable in their environment.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
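
A minimal C sketch of why the export matters: a child process only sees variables that are in the environment it inherits. Here setenv() plays the role of the shell's "export" and popen() starts a child shell. The helper name child_sees_socket() and the socket path used with it are illustrative assumptions, not CTDB code or a CTDB default.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Start a child shell and copy the value of CTDB_SOCKET that it sees
 * into out (empty if the variable is not in its environment).
 * Returns 1 if the child ran successfully, 0 otherwise. */
static int child_sees_socket(char *out, size_t outlen)
{
    FILE *p;
    size_t n;

    assert(outlen > 0);
    p = popen("printf '%s' \"$CTDB_SOCKET\"", "r");
    if (p == NULL) {
        return 0;
    }
    n = fread(out, 1, outlen - 1, p);
    out[n] = '\0';
    return pclose(p) == 0;
}
```

With setenv("CTDB_SOCKET", path, 1) in effect the child reports path; after unsetenv() it reports nothing, which is the position a non-exported shell variable leaves child commands in.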

commit 782814288bb560099ee44b607bf35f3eddf37f82
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 25 16:17:07 2013 +1000

    ctdbd: Kill client process without checking for tracked child
    
    Commit f73a4b1495830bcdd094a93732a89dd53b3c2f78 added a safety check
    to ensure that CTDB never kills unrelated processes.  However, client
    processes are unrelated: they are not children tracked by ctdbd, so
    the check prevented them from being killed.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
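
A minimal C sketch of the problem. The hypothetical kill_tracked_only() stands in for the tracked-child check in ctdb_kill() and is not its actual implementation: a guard that only signals PIDs it recorded refuses any other process, so such a process has to be signalled with kill(2) directly, as the patch now does. The forked process below is only a test stand-in; real clients are not children of ctdbd.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical "tracked child" guard: only signal a PID that this
 * process recorded when it forked it.  Anything else is refused. */
static int kill_tracked_only(const pid_t *tracked, size_t n,
                             pid_t pid, int sig)
{
    for (size_t i = 0; i < n; i++) {
        if (tracked[i] == pid) {
            return kill(pid, sig);
        }
    }
    return -1;   /* not one of ours: refuse */
}

/* Stand-in for a client process: something this code never
 * recorded in its tracked list.  It waits until it is killed. */
static pid_t spawn_untracked(void)
{
    pid_t pid;

    fflush(NULL);   /* avoid duplicating buffered stdio output */
    pid = fork();
    if (pid == 0) {
        for (;;) {
            pause();
        }
    }
    return pid;   /* -1 if fork() failed */
}
```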

commit a20d94717d2e4ab866d8a002cdf39c0669b74c6a
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 25 13:40:43 2013 +1000

    eventscripts: kill_tcp_connections() should send connections to stdin
    
    This avoids issuing a separate "ctdb killtcp" command for each tcp
    connection to be terminated, which considerably reduces the time
    taken when there is a large number of tcp connections.  It also
    makes it possible to avoid calling "ctdb killtcp" at all when there
    are no connections.
    
    Add a couple of unit tests for killtcp and update the eventscript
    unit test infrastructure to support them.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
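
The batching can be sketched in C. add_connection() is a hypothetical helper that mirrors the shell loop in kill_tcp_connections(): it appends "src dst" lines to a single buffer and adds the reverse direction unless the connection is one-way, which is always the case for the CIFS ports 139 and 445. The buffer would then be piped to one "ctdb killtcp". Unlike the shell code, which prepends a newline to each entry, this sketch terminates each line with one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the port after the last ':' in "ip:port", or -1. */
static int endpoint_port(const char *ep)
{
    const char *colon = strrchr(ep, ':');
    return colon ? atoi(colon + 1) : -1;
}

/* Append the line(s) for one connection to buf, which must already
 * hold a NUL-terminated string.  Truncation is not handled; the
 * caller is assumed to provide enough space.  Returns lines added. */
static int add_connection(char *buf, size_t len, const char *dst,
                          const char *src, bool oneway)
{
    int port = endpoint_port(dst);
    bool one = oneway || port == 139 || port == 445;
    size_t used = strlen(buf);

    assert(used < len);
    snprintf(buf + used, len - used, "%s %s\n", src, dst);
    if (one) {
        return 1;
    }
    used = strlen(buf);
    snprintf(buf + used, len - used, "%s %s\n", dst, src);
    return 2;
}
```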

commit af5aa369c266430fe912df0c26116b68bac3572e
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jul 25 13:28:26 2013 +1000

    tools/ctdb: Allow killtcp to read connections from standard input
    
    This allows eventscripts to send information about multiple tcp
    connections to a single "ctdb killtcp" command, saving the overhead of
    setting up a client connection per tcp connection.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
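
A minimal sketch of the reading side, assuming one "srcip:port dstip:port" pair per line as the updated man page describes. struct conn_pair and read_conn_pairs() are hypothetical and are not the code added to tools/ctdb.c; they only show one way to parse such input while skipping blank lines such as the leading newline the eventscript emits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct conn_pair {
    char src[64];   /* "ip:port" of the endpoint that gets the RST */
    char dst[64];
};

/* Read up to max "src dst" pairs, one per line.  Blank lines are
 * skipped; returns the number of pairs read, or -1 on a line that
 * does not contain exactly two fields. */
static int read_conn_pairs(FILE *in, struct conn_pair *pairs, int max)
{
    char line[256];
    int n = 0;

    assert(max >= 0);
    while (n < max && fgets(line, sizeof line, in) != NULL) {
        char extra[2];
        int fields;

        if (strspn(line, " \t\r\n") == strlen(line)) {
            continue;   /* blank line */
        }
        fields = sscanf(line, "%63s %63s %1s",
                        pairs[n].src, pairs[n].dst, extra);
        if (fields != 2) {
            return -1;
        }
        n++;
    }
    return n;
}
```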

commit a69e03a5e4671e998d45b4fef8611a421bbdb3e1
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jul 22 20:11:58 2013 +1000

    tests: Always tally the number of passed/failed tests
    
    Regardless of whether a summary is being printed!
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit bf4a7c1ad87e0e848296d15d63eb8cd901ca5335
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jul 22 16:39:46 2013 +1000

    recoverd: Call takeover fail callback only once per node
    
    Currently the fail callback is called once per (takeip/releaseip) control
    failure.  This is overkill and can get a node banned much too quickly.
    
    Instead, keep track of control failures per node and only call the
    fail callback once per failed node.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
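
The per-node bookkeeping reduces to one flag per node map entry, as in takeover_run_fail_callback() in the diff below. This C sketch uses hypothetical names (struct once_per_node, report_failure()) and a plain array of PNNs instead of struct ctdb_node_map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef void (*fail_fn)(uint32_t pnn, void *private_data);

struct once_per_node {
    const uint32_t *pnns;   /* PNNs from the node map */
    bool *failed;           /* one flag per node map entry */
    int num;
    fail_fn fn;             /* the real fail callback */
    void *private_data;
};

/* Forward a control failure to fn, but only the first time each node
 * fails.  Unknown PNNs are ignored.  Returns true if fn was called. */
static bool report_failure(struct once_per_node *s, uint32_t pnn)
{
    for (int i = 0; i < s->num; i++) {
        if (s->pnns[i] == pnn) {
            if (s->failed[i]) {
                return false;
            }
            s->failed[i] = true;
            s->fn(pnn, s->private_data);
            return true;
        }
    }
    return false;
}

/* Example callback: count how often it is invoked. */
static void count_calls(uint32_t pnn, void *private_data)
{
    (void)pnn;
    (*(int *)private_data)++;
}
```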

commit 1b016b2dfc5d7d3f2a42ce4dfe569608e90eb714
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jul 22 15:08:32 2013 +1000

    scripts: Run scriptstatus for hung event
    
    The timeout information printed by ctdbd is less than useful because
    it refers to the cumulative time taken by the eventscripts run so far.
    Adding the scriptstatus output shows where the time was actually spent.
    
    Since there is now quite a bit of output, serialise the calls to this
    script using flock.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
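
"flock --wait 2" can be approximated in C with non-blocking flock(2) calls in a retry loop. lock_with_wait() is a hypothetical helper and the 100ms polling interval is an arbitrary choice; the script itself simply relies on flock(1). Note that flock(2) locks belong to open file descriptions, so two separate open() calls conflict even within one process.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <time.h>
#include <unistd.h>

/* Take an exclusive flock(2) on path, retrying for up to wait_secs
 * seconds, roughly like "flock --wait N".  Returns an fd that holds
 * the lock until it is closed, or -1 on timeout or error. */
static int lock_with_wait(const char *path, int wait_secs)
{
    const struct timespec pause_100ms = { 0, 100 * 1000 * 1000L };
    time_t deadline = time(NULL) + wait_secs;
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd == -1) {
        return -1;
    }
    for (;;) {
        if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
            return fd;
        }
        if ((errno != EWOULDBLOCK && errno != EINTR) ||
            time(NULL) >= deadline) {
            close(fd);
            return -1;
        }
        nanosleep(&pause_100ms, NULL);
    }
}
```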

commit e0f3fa1020e13b84bdd672538168d148f1847d57
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jul 22 15:06:52 2013 +1000

    ctdbd: Pass event name to hung script debugger
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>

commit 29e98017221326bdc9b1c4f7c05b3b495c1de29b
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jul 22 14:32:13 2013 +1000

    tests/complex: Fix NFS tests to work with root_squash
    
    Refactor the NFS test setup/cleanup code into new common functions.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>

commit 9d6e1c147bd036d832b98c155f405ee2a5d6f57f
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 19 19:59:43 2013 +1000

    tests: Fix exit status of run_tests when a single test is run with -H
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit ae3c03d80264e997b7da9f3279d7810e18b8a1df
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 19 15:33:38 2013 +1000

    tests/simple: Add -p in onnode test to help show groups of connections
    
    Change the command from "true" to "hostname" since the former won't
    produce any output when used in combination with "onnode -p".  This
    could just be changed to "echo" but the hostname might actually be
    useful.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit 90d792cf28d6a823141e4c417b6978f02a9cf596
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Jul 17 11:14:37 2013 +1000

    ctdbd: Sleep at exit to allow time for log messages to flush
    
    Register print_exit_message() earlier so that it covers most of the
    early exits.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
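
The effect of registering the handler earlier can be shown with atexit(3): exit() only runs handlers that are already registered, and runs them in reverse order of registration. In this sketch a forked child stands in for ctdbd and handler_a stands in for print_exit_message(); all names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;   /* pipe back to the parent */

static void report(const char *tag)
{
    ssize_t r = write(report_fd, tag, strlen(tag));
    (void)r;   /* best effort; nothing useful to do on failure */
}

static void handler_a(void) { report("A"); }
static void handler_b(void) { report("B"); }

/* Fork a child that registers handler_a, optionally exits early,
 * then registers handler_b and exits.  The parent collects the
 * handler output, in order, into order[] (NUL-terminated). */
static void run_child(int early_exit, char *order, size_t len)
{
    int fds[2];
    size_t used = 0;
    ssize_t n;
    pid_t pid;

    assert(len > 0);
    order[0] = '\0';
    if (pipe(fds) != 0) {
        return;
    }
    fflush(NULL);              /* don't duplicate buffered output */
    pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        close(fds[0]);
        report_fd = fds[1];
        atexit(handler_a);
        if (early_exit) {
            exit(10);          /* only handler_a is registered yet */
        }
        atexit(handler_b);
        exit(0);               /* runs handler_b, then handler_a */
    }
    close(fds[1]);
    while (used < len - 1 &&
           (n = read(fds[0], order + used, len - 1 - used)) > 0) {
        used += (size_t)n;
    }
    order[used] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
}
```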

commit 3dd5b925dcf0e9a5b877638e471c5ecf36b46c58
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 19 15:36:29 2013 +1000

    ctdbd: Exit if something is already listening on CTDB socket
    
    Don't blindly remove the socket.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
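
The check can be sketched in C: probe the path with connect(), and only if nobody answers treat the socket file as stale, unlink it and bind. listen_unless_in_use() is a hypothetical helper that uses a separate probe socket for clarity, whereas the patch below reuses the daemon's socket descriptor. A small window remains between the probe and bind() in which another process could claim the path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Returns a listening fd, or -1 if another process is already
 * accepting connections on path (or on any other error). */
static int listen_unless_in_use(const char *path)
{
    struct sockaddr_un addr;
    int probe, sd;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    assert(strlen(path) < sizeof(addr.sun_path));
    strcpy(addr.sun_path, path);

    /* A successful connect() means a live server owns the socket. */
    probe = socket(AF_UNIX, SOCK_STREAM, 0);
    if (probe == -1) {
        return -1;
    }
    if (connect(probe, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        close(probe);
        return -1;
    }
    close(probe);

    /* Nobody answered: any file left at path is stale, remove it. */
    unlink(path);

    sd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sd == -1) {
        return -1;
    }
    if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(sd, 100) == -1) {
        close(sd);
        return -1;
    }
    return sd;
}
```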

commit 53e4eca74429f76adc81d98e3d11d1bd61194d71
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Jul 16 19:57:18 2013 +1000

    tests/eventscripts: Add tests for monitoring of missing interfaces
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit 501f19b16fd6d67fbb754248868c38ee5bcf79ef
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 12 12:48:34 2013 +1000

    eventscripts: A missing interface should cause monitoring to fail
    
    A missing interface is at least as bad as an interface with a link
    that is down, so it should have a similar effect.
    
    This couldn't be done previously because orphaned interfaces used to
    be listed for monitoring.  This was worked around in 10.interface in
    commit 49b2d1bd9554461ed8edbfc21e777c0eca9e1443 and fixed in ctdbd in
    commit cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.
    
    If $CTDB_PARTIALLY_ONLINE_INTERFACES="yes" then monitoring won't
    actually fail but the interface is still marked as down.
    
    While we're touching this code, use "ip link" instead of "ip addr".
    It is marginally cheaper but not enough for a separate patch.  ;-)
    
    This effectively reverts d67955b42f7627be9dae995230c8fcbb8a948ec2.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
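
The monitoring decision described above can be modelled in C. monitor_ok() is a hypothetical simplification: if_nametoindex(3) returning 0 stands in for "ip link show" failing, a counter stands in for mark_down, and any other checks that 10.interface performs, such as link status, are ignored.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <net/if.h>
#include <stdbool.h>
#include <stddef.h>

/* Every configured interface that does not exist is "marked down";
 * monitoring fails if any was marked down, unless partially-online
 * mode (CTDB_PARTIALLY_ONLINE_INTERFACES="yes") is enabled. */
static bool monitor_ok(const char *const *ifaces, size_t n,
                       bool partially_online, size_t *marked_down)
{
    *marked_down = 0;
    for (size_t i = 0; i < n; i++) {
        if (if_nametoindex(ifaces[i]) == 0) {
            (*marked_down)++;   /* stand-in for "mark_down $iface" */
        }
    }
    return *marked_down == 0 || partially_online;
}
```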

commit c6ab0f9405d5fa5b0b1693bc92e59da0d555a9d7
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 12 12:33:36 2013 +1000

    eventscripts: Get list of configured interfaces using "ctdb ifaces"
    
    This was previously changed because ctdbd didn't garbage collect
    orphaned interfaces.  This was fixed in commit
    cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
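
The new pipeline drops the header line (sed '1d') and keeps the first ':'-separated field of each remaining line. A C equivalent is sketched below with hypothetical helpers; it assumes the machine-readable output is a header line followed by lines such as ":eth0:1:2:", which should be checked against the real tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the first ':'-separated field of a line such as ":eth0:1:2:"
 * into out.  Mirrors sed -e 's@^:@@' -e 's@:.*@@'. */
static void first_field(const char *line, char *out, size_t len)
{
    size_t n;

    assert(len > 0);
    if (*line == ':') {
        line++;
    }
    n = strcspn(line, ":\n");
    if (n >= len) {
        n = len - 1;
    }
    memcpy(out, line, n);
    out[n] = '\0';
}

/* Skip the header line, then collect up to max interface names. */
static int collect_ifaces(const char *text, char names[][16], int max)
{
    const char *line = strchr(text, '\n');
    int n = 0;

    while (line != NULL && n < max) {
        line++;                 /* start of the next line */
        if (*line == '\0') {
            break;
        }
        first_field(line, names[n], sizeof names[n]);
        n++;
        line = strchr(line, '\n');
    }
    return n;
}
```

For example, the line ":eth0:1:2:" yields "eth0".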

commit 57ef5d3827ea3417a32703e259a53ce6fd10ac45
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jun 24 15:49:48 2013 +1000

    ctdbd: Allow extra recovery to repair persistent DBs during first recovery
    
    Commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28 introduced a potential
    regression because a node may not have completed the "recovered" event
    (so might still be in CTDB_RUNSTATE_FIRST_RECOVERY) when another node
    becomes healthy.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
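
The change in server/ctdb_monitor.c from "==" to "<=" relies on the ordering of the run states. This sketch assumes an ordering modelled on enum ctdb_runstate, with FIRST_RECOVERY sorting before STARTUP; the enum and both functions below are local illustrations, not the header definition or the daemon code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ordering; "<=" only covers first recovery if
 * RUNSTATE_FIRST_RECOVERY sorts before RUNSTATE_STARTUP. */
enum runstate {
    RUNSTATE_UNKNOWN,
    RUNSTATE_INIT,
    RUNSTATE_SETUP,
    RUNSTATE_FIRST_RECOVERY,
    RUNSTATE_STARTUP,
    RUNSTATE_RUNNING,
    RUNSTATE_SHUTDOWN,
};

/* Before the patch: only a node in STARTUP forced a recovery. */
static bool force_recovery_old(uint32_t flags, enum runstate rs)
{
    return flags == 0 && rs == RUNSTATE_STARTUP;
}

/* After the patch: a node still in first recovery also does. */
static bool force_recovery_new(uint32_t flags, enum runstate rs)
{
    return flags == 0 && rs <= RUNSTATE_STARTUP;
}
```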

-----------------------------------------------------------------------

Summary of changes:
 config/ctdbd_wrapper                             |    2 +
 config/debug-hung-script.sh                      |   19 +++-
 config/events.d/10.interface                     |   11 +-
 config/functions                                 |   25 +++--
 doc/ctdb.1.xml                                   |   14 ++-
 doc/ctdbd.1.xml                                  |    4 +-
 doc/ltdbtool.1.xml                               |    4 +-
 doc/onnode.1.xml                                 |    4 +-
 doc/ping_pong.1.xml                              |    4 +-
 server/ctdb_daemon.c                             |   40 +++++---
 server/ctdb_monitor.c                            |    2 +-
 server/ctdb_takeover.c                           |   51 ++++++++-
 server/eventscript.c                             |    5 +-
 tests/complex/43_failover_nfs_basic.sh           |   19 +---
 tests/complex/44_failover_nfs_oneway.sh          |   25 +----
 tests/complex/45_failover_nfs_kill.sh            |   19 +---
 tests/eventscripts/10.interface.monitor.015.sh   |   16 +++
 tests/eventscripts/10.interface.monitor.016.sh   |   18 +++
 tests/eventscripts/10.interface.releaseip.010.sh |   32 ++++++
 tests/eventscripts/10.interface.releaseip.011.sh |   38 +++++++
 tests/eventscripts/scripts/local.sh              |    5 +
 tests/eventscripts/stubs/ctdb                    |    8 ++
 tests/eventscripts/stubs/ip                      |  126 ++++++++++++---------
 tests/eventscripts/stubs/netstat                 |    3 +
 tests/scripts/integration.bash                   |   36 ++++++
 tests/scripts/run_tests                          |   19 ++--
 tests/simple/00_ctdb_onnode.sh                   |    2 +-
 tools/ctdb.c                                     |  112 +++++++++++++++++++-
 28 files changed, 504 insertions(+), 159 deletions(-)
 mode change 100644 => 100755 config/debug-hung-script.sh
 create mode 100755 tests/eventscripts/10.interface.monitor.015.sh
 create mode 100755 tests/eventscripts/10.interface.monitor.016.sh
 create mode 100755 tests/eventscripts/10.interface.releaseip.010.sh
 create mode 100755 tests/eventscripts/10.interface.releaseip.011.sh


Changeset truncated at 500 lines:

diff --git a/config/ctdbd_wrapper b/config/ctdbd_wrapper
index 33bef06..fbc76cf 100755
--- a/config/ctdbd_wrapper
+++ b/config/ctdbd_wrapper
@@ -20,6 +20,8 @@ action="$2"
 . "${CTDB_BASE}/functions"
 loadconfig "ctdb"
 
+export CTDB_SOCKET
+
 ctdbd="${CTDBD:-/usr/sbin/ctdbd}"
 
 ############################################################
diff --git a/config/debug-hung-script.sh b/config/debug-hung-script.sh
old mode 100644
new mode 100755
index dcf68ba..32dbd5f
--- a/config/debug-hung-script.sh
+++ b/config/debug-hung-script.sh
@@ -1,4 +1,19 @@
 #!/bin/sh
 
-echo "Pstree output for the hung script:"
-pstree -p -a $1
+(
+    flock --wait 2 9 || exit 1
+
+    echo "===== Start of hung script debug for PID=\"$1\", event\"$2\" ====="
+
+    echo "pstree -p -a ${1}:"
+    pstree -p -a $1
+
+    echo "ctdb scriptstatus ${2}:"
+    # No use running several of these in parallel if, say, "releaseip"
+    # event hangs for multiple IPs.  In that case the output would be
+    # interleaved in the log and would just be confusing.
+    ctdb scriptstatus "$2"
+
+    echo "===== End of hung script debug for PID=\"$1\", event\"$2\" ====="
+
+) 9>"${CTDB_VARDIR}/debug-hung-script.lock"
diff --git a/config/events.d/10.interface b/config/events.d/10.interface
index caff9fa..5379ea8 100755
--- a/config/events.d/10.interface
+++ b/config/events.d/10.interface
@@ -44,9 +44,9 @@ get_all_interfaces ()
     [ "$CTDB_PUBLIC_INTERFACE" ] && all_interfaces="$CTDB_PUBLIC_INTERFACE $all_interfaces"
     [ "$CTDB_NATGW_PUBLIC_IFACE" ] && all_interfaces="$CTDB_NATGW_PUBLIC_IFACE $all_interfaces"
 
-    # Get the configured interfaces for each IP.  That is, for all but
-    # the 1st line, get the last field with commas changed to spaces.
-    ctdb_ifaces=$(ctdb -Y ip -v | sed -e '1d' -e 's/:$//' -e 's/^.*://' -e 's/,/ /g')
+    # Get the interfaces for which CTDB has public IPs configured.
+    # That is, for all but the 1st line, get the 1st field.
+    ctdb_ifaces=$(ctdb -Y ifaces | sed -e '1d' -e 's@^:@@' -e 's@:.*@@')
 
     # Add $ctdb_interfaces and uniquify
     all_interfaces=$(echo $all_interfaces $ctdb_ifaces | tr ' ' '\n' | sort -u)
@@ -65,8 +65,9 @@ monitor_interfaces()
 	# problem with an interface then set fail=true and continue.
 	for iface in $all_interfaces ; do
 
-	    ip addr show $iface 2>/dev/null >/dev/null || {
-		echo "WARNING: Interface $iface does not exist but it is used by public addresses."
+	    ip link show $iface 2>/dev/null >/dev/null || {
+		echo "ERROR: Interface $iface does not exist but it is used by public addresses."
+		mark_down $iface
 		continue
 	    }
 
diff --git a/config/functions b/config/functions
index 0679938..eabc940 100755
--- a/config/functions
+++ b/config/functions
@@ -648,32 +648,35 @@ kill_tcp_connections ()
 
     get_tcp_connections_for_ip "$_ip" | {
 	_killcount=0
-	_failed=false
-
-	while read dest src; do
-	    echo "Killing TCP connection $src $dest"
-	    ctdb killtcp $src $dest >/dev/null 2>&1 || _failed=true
-	    _destport="${dest##*:}"
+	_connections=""
+	_nl="
+"
+	while read _dst _src; do
+	    _destport="${_dst##*:}"
 	    __oneway=$_oneway
 	    case $_destport in
 		# we only do one-way killtcp for CIFS
 		139|445) __oneway=true ;;
 	    esac
+
+	    echo "Killing TCP connection $_src $_dst"
+	    _connections="${_connections}${_nl}${_src} ${_dst}"
 	    if ! $__oneway ; then
-		ctdb killtcp $dest $src >/dev/null 2>&1 || _failed=true
+		_connections="${_connections}${_nl}${_dst} ${_src}"
 	    fi
 
 	    _killcount=$(($_killcount + 1))
 	done
 
-	if $_failed ; then
-	    echo "Failed to send killtcp control"
-	    return
-	fi
 	if [ $_killcount -eq 0 ] ; then
 	    return
 	fi
 
+	echo "$_connections" | ctdb killtcp || {
+	    echo "Failed to send killtcp control"
+	    return
+	}
+
 	_count=0
 	while : ; do
 	    if [ -z "$(get_tcp_connections_for_ip $_ip)" ] ; then
diff --git a/doc/ctdb.1.xml b/doc/ctdb.1.xml
index c854619..ebb9c8e 100644
--- a/doc/ctdb.1.xml
+++ b/doc/ctdb.1.xml
@@ -1,5 +1,7 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!DOCTYPE refentry PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
+<!DOCTYPE refentry
+	PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+	"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
 <refentry id="ctdb.1">
 
 <refmeta>
@@ -1014,11 +1016,13 @@ Reclock file:/gpfs/.ctdb/shared
       </para>
     </refsect2>
 
-    <refsect2><title>killtcp <srcip:port> <dstip:port></title>
+    <refsect2><title>killtcp [<srcip:port> <dstip:port>]</title>
       <para>
-        This command will kill the specified TCP connection by issuing a
-        TCP RST to the srcip:port endpoint. This is a command used by the 
-	ctdb eventscripts.
+        This command will kill the specified TCP connections by
+        issuing a TCP RST to the srcip:port endpoint.  A single
+        connection can be specified on the command-line, otherwise
+        connections are read one-per-line from standard input.  This
+        is a command used by the ctdb eventscripts.
       </para>
     </refsect2>
 
diff --git a/doc/ctdbd.1.xml b/doc/ctdbd.1.xml
index 1053d9b..111a8f4 100644
--- a/doc/ctdbd.1.xml
+++ b/doc/ctdbd.1.xml
@@ -1,5 +1,7 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!DOCTYPE refentry PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
+<!DOCTYPE refentry
+	PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+	"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
 <refentry id="ctdbd.1">
 
 <refmeta>
diff --git a/doc/ltdbtool.1.xml b/doc/ltdbtool.1.xml
index a0379a6..fe9e3e8 100644
--- a/doc/ltdbtool.1.xml
+++ b/doc/ltdbtool.1.xml
@@ -1,5 +1,7 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!DOCTYPE refentry PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
+<!DOCTYPE refentry
+	PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+	"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
 <refentry id="ltdbtool.1">
 
 <refmeta>
diff --git a/doc/onnode.1.xml b/doc/onnode.1.xml
index 1b97c2f..65b1792 100644
--- a/doc/onnode.1.xml
+++ b/doc/onnode.1.xml
@@ -1,5 +1,7 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!DOCTYPE refentry PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
+<!DOCTYPE refentry
+	PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+	"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
 <refentry id="onnode.1">
 
 <refmeta>
diff --git a/doc/ping_pong.1.xml b/doc/ping_pong.1.xml
index f4148ae..2e4b016 100644
--- a/doc/ping_pong.1.xml
+++ b/doc/ping_pong.1.xml
@@ -1,5 +1,7 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!DOCTYPE refentry PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
+<!DOCTYPE refentry
+	PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+	"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
 <refentry id="ping_pong.1">
 
 <refmeta>
diff --git a/server/ctdb_daemon.c b/server/ctdb_daemon.c
index 0932157..644b5ed 100644
--- a/server/ctdb_daemon.c
+++ b/server/ctdb_daemon.c
@@ -47,6 +47,9 @@ static void print_exit_message(void)
 		DEBUG(DEBUG_NOTICE,("CTDB %s shutting down\n", debug_extra));
 	} else {
 		DEBUG(DEBUG_NOTICE,("CTDB daemon shutting down\n"));
+
+		/* Wait a second to allow pending log messages to be flushed */
+		sleep(1);
 	}
 }
 
@@ -976,23 +979,35 @@ static int ux_socket_bind(struct ctdb_context *ctdb)
 		return -1;
 	}
 
-	set_close_on_exec(ctdb->daemon.sd);
-	set_nonblocking(ctdb->daemon.sd);
-
 	memset(&addr, 0, sizeof(addr));
 	addr.sun_family = AF_UNIX;
 	strncpy(addr.sun_path, ctdb->daemon.name, sizeof(addr.sun_path));
 
+	/* First check if an old ctdbd might be running */
+	if (connect(ctdb->daemon.sd,
+		    (struct sockaddr *)&addr, sizeof(addr)) == 0) {
+		DEBUG(DEBUG_CRIT,
+		      ("Something is already listening on ctdb socket '%s'\n",
+		       ctdb->daemon.name));
+		goto failed;
+	}
+
+	/* Remove any old socket */
+	unlink(ctdb->daemon.name);
+
+	set_close_on_exec(ctdb->daemon.sd);
+	set_nonblocking(ctdb->daemon.sd);
+
 	if (bind(ctdb->daemon.sd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
 		DEBUG(DEBUG_CRIT,("Unable to bind on ctdb socket '%s'\n", ctdb->daemon.name));
 		goto failed;
-	}	
+	}
 
 	if (chown(ctdb->daemon.name, geteuid(), getegid()) != 0 ||
 	    chmod(ctdb->daemon.name, 0700) != 0) {
 		DEBUG(DEBUG_CRIT,("Unable to secure ctdb socket '%s', ctdb->daemon.name\n", ctdb->daemon.name));
 		goto failed;
-	} 
+	}
 
 
 	if (listen(ctdb->daemon.sd, 100) != 0) {
@@ -1139,13 +1154,10 @@ int ctdb_start_daemon(struct ctdb_context *ctdb, bool do_fork, bool use_syslog,
 	struct fd_event *fde;
 	const char *domain_socket_name;
 
-	/* get rid of any old sockets */
-	unlink(ctdb->daemon.name);
-
 	/* create a unix domain stream socket to listen to */
 	res = ux_socket_bind(ctdb);
 	if (res!=0) {
-		DEBUG(DEBUG_ALERT,(__location__ " Failed to open CTDB unix domain socket\n"));
+		DEBUG(DEBUG_ALERT,("Cannot continue.  Exiting!\n"));
 		exit(10);
 	}
 
@@ -1171,6 +1183,12 @@ int ctdb_start_daemon(struct ctdb_context *ctdb, bool do_fork, bool use_syslog,
 			  CTDB_VERSION_STRING, ctdbd_pid));
 	ctdb_create_pidfile(ctdb->ctdbd_pid);
 
+	/* Make sure we log something when the daemon terminates.
+	 * This must be the first exit handler to run (so the last to
+	 * be registered.
+	 */
+	atexit(print_exit_message);
+
 	if (ctdb->do_setsched) {
 		/* try to set us up as realtime */
 		ctdb_set_scheduler(ctdb);
@@ -1283,10 +1301,6 @@ int ctdb_start_daemon(struct ctdb_context *ctdb, bool do_fork, bool use_syslog,
 		ctdb_release_all_ips(ctdb);
 	}
 
-
-	/* Make sure we log something when the daemon terminates */
-	atexit(print_exit_message);
-
 	/* Start the transport */
 	if (ctdb->methods->start(ctdb) != 0) {
 		DEBUG(DEBUG_ALERT,("transport failed to start!\n"));
diff --git a/server/ctdb_monitor.c b/server/ctdb_monitor.c
index 63eb9df..c23477d 100644
--- a/server/ctdb_monitor.c
+++ b/server/ctdb_monitor.c
@@ -480,7 +480,7 @@ int32_t ctdb_control_modflags(struct ctdb_context *ctdb, TDB_DATA indata)
 
 	DEBUG(DEBUG_INFO, ("Control modflags on node %u - flags now 0x%x\n", c->pnn, node->flags));
 
-	if (node->flags == 0 && ctdb->runstate == CTDB_RUNSTATE_STARTUP) {
+	if (node->flags == 0 && ctdb->runstate <= CTDB_RUNSTATE_STARTUP) {
 		DEBUG(DEBUG_ERR, (__location__ " Node %u became healthy - force recovery for startup\n",
 				  c->pnn));
 		ctdb->recovery_mode = CTDB_RECOVERY_ACTIVE;
diff --git a/server/ctdb_takeover.c b/server/ctdb_takeover.c
index be49b3f..82fecfc 100644
--- a/server/ctdb_takeover.c
+++ b/server/ctdb_takeover.c
@@ -861,7 +861,7 @@ static void release_kill_clients(struct ctdb_context *ctdb, ctdb_sock_addr *addr
 					(unsigned)client->pid,
 					ctdb_addr_to_str(addr),
 					ip->client_id));
-				ctdb_kill(ctdb, client->pid, SIGKILL);
+				kill(client->pid, SIGKILL);
 			}
 		}
 	}
@@ -2622,6 +2622,40 @@ static void iprealloc_fail_callback(struct ctdb_context *ctdb, uint32_t pnn,
 	}
 }
 
+struct takeover_callback_data {
+	bool *node_failed;
+	client_async_callback fail_callback;
+	void *fail_callback_data;
+	struct ctdb_node_map *nodemap;
+};
+
+static void takeover_run_fail_callback(struct ctdb_context *ctdb,
+				       uint32_t node_pnn, int32_t res,
+				       TDB_DATA outdata, void *callback_data)
+{
+	struct takeover_callback_data *cd =
+		talloc_get_type_abort(callback_data,
+				      struct takeover_callback_data);
+	int i;
+
+	for (i = 0; i < cd->nodemap->num; i++) {
+		if (node_pnn == cd->nodemap->nodes[i].pnn) {
+			break;
+		}
+	}
+
+	if (i == cd->nodemap->num) {
+		DEBUG(DEBUG_ERR, (__location__ " invalid PNN %u\n", node_pnn));
+		return;
+	}
+
+	if (!cd->node_failed[i]) {
+		cd->node_failed[i] = true;
+		cd->fail_callback(ctdb, node_pnn, res, outdata,
+				  cd->fail_callback_data);
+	}
+}
+
 /*
   make any IP alias changes for public addresses that are necessary 
  */
@@ -2640,6 +2674,7 @@ int ctdb_takeover_run(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap,
 	TALLOC_CTX *tmp_ctx = talloc_new(ctdb);
 	uint32_t disable_timeout;
 	struct ctdb_ipflags *ipflags;
+	struct takeover_callback_data *takeover_data;
 	struct iprealloc_callback_data iprealloc_data;
 	bool *retry_data;
 
@@ -2683,11 +2718,21 @@ int ctdb_takeover_run(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap,
 	/* now tell all nodes to delete any alias that they should not
 	   have.  This will be a NOOP on nodes that don't currently
 	   hold the given alias */
+	takeover_data = talloc_zero(tmp_ctx, struct takeover_callback_data);
+	CTDB_NO_MEMORY_FATAL(ctdb, takeover_data);
+
+	takeover_data->node_failed = talloc_zero_array(tmp_ctx,
+						       bool, nodemap->num);
+	CTDB_NO_MEMORY_FATAL(ctdb, takeover_data->node_failed);
+	takeover_data->fail_callback = fail_callback;
+	takeover_data->fail_callback_data = callback_data;
+	takeover_data->nodemap = nodemap;
+
 	async_data = talloc_zero(tmp_ctx, struct client_async_data);
 	CTDB_NO_MEMORY_FATAL(ctdb, async_data);
 
-	async_data->fail_callback = fail_callback;
-	async_data->callback_data = callback_data;
+	async_data->fail_callback = takeover_run_fail_callback;
+	async_data->callback_data = takeover_data;
 
 	for (i=0;i<nodemap->num;i++) {
 		/* don't talk to unconnected nodes, but do talk to banned nodes */
diff --git a/server/eventscript.c b/server/eventscript.c
index 10d426f..c255e17 100644
--- a/server/eventscript.c
+++ b/server/eventscript.c
@@ -548,8 +548,9 @@ static void ctdb_run_debug_hung_script(struct ctdb_context *ctdb, struct ctdb_ev
 			debug_hung_script = getenv("CTDB_DEBUG_HUNG_SCRIPT");
 		}
 
-		buf = talloc_asprintf(NULL, "%s %d",
-				      debug_hung_script, state->child);
+		buf = talloc_asprintf(NULL, "%s %d %s",
+				      debug_hung_script, state->child,
+				      ctdb_eventscript_call_names[state->call]);
 		system(buf);
 		talloc_free(buf);
 
diff --git a/tests/complex/43_failover_nfs_basic.sh b/tests/complex/43_failover_nfs_basic.sh
index 71a8229..a68f7db 100755
--- a/tests/complex/43_failover_nfs_basic.sh
+++ b/tests/complex/43_failover_nfs_basic.sh
@@ -49,22 +49,11 @@ cluster_is_healthy
 # Reset configuration
 ctdb_restart_when_done
 
-select_test_node_and_ips
-
-first_export=$(showmount -e $test_ip | sed -n -e '2s/ .*//p')
-mnt_d=$(mktemp -d)
-test_file="${mnt_d}/$RANDOM"
-
-ctdb_test_exit_hook_add rm -f "$test_file"
-ctdb_test_exit_hook_add umount -f "$mnt_d"
-ctdb_test_exit_hook_add rmdir "$mnt_d"
-
-echo "Mounting ${test_ip}:${first_export} on ${mnt_d} ..."
-mount -o timeo=1,hard,intr,vers=3 ${test_ip}:${first_export} ${mnt_d}
+nfs_test_setup
 
 echo "Create file containing random data..."
-dd if=/dev/urandom of=$test_file bs=1k count=1
-original_sum=$(sum $test_file)
+dd if=/dev/urandom of=$nfs_local_file bs=1k count=1
+original_sum=$(sum $nfs_local_file)
 [ $? -eq 0 ]
 
 gratarp_sniff_start
@@ -75,7 +64,7 @@ wait_until_node_has_status $test_node disabled
 
 gratarp_sniff_wait_show
 
-new_sum=$(sum $test_file)
+new_sum=$(sum $nfs_local_file)
 [ $? -eq 0 ]
 
 if [ "$original_md5" = "$new_md5" ] ; then
diff --git a/tests/complex/44_failover_nfs_oneway.sh b/tests/complex/44_failover_nfs_oneway.sh
index 7da8d01..aaec2ed 100755
--- a/tests/complex/44_failover_nfs_oneway.sh
+++ b/tests/complex/44_failover_nfs_oneway.sh
@@ -51,31 +51,18 @@ cluster_is_healthy
 # Reset configuration
 ctdb_restart_when_done
 
-select_test_node_and_ips
+nfs_test_setup
 
-first_export=$(showmount -e $test_ip | sed -n -e '2s/ .*//p')
+echo "Create file containing random data..."
 local_f=$(mktemp)
-mnt_d=$(mktemp -d)
-nfs_f="${mnt_d}/$RANDOM"
-remote_f="${test_ip}:${first_export}/$(basename $nfs_f)"
-
 ctdb_test_exit_hook_add rm -f "$local_f"
-ctdb_test_exit_hook_add rm -f "$nfs_f"
-ctdb_test_exit_hook_add umount -f "$mnt_d"
-ctdb_test_exit_hook_add rmdir "$mnt_d"
-
-echo "Create file containing random data..."
 dd if=/dev/urandom of=$local_f bs=1k count=1
-chmod 644 "$local_f" # needed for *_squash?
 local_sum=$(sum $local_f)
-[ $? -eq 0 ]
-
-scp -p "$local_f" "$remote_f"
 
-echo "Mounting ${test_ip}:${first_export} on ${mnt_d} ..."
-mount -o timeo=1,hard,intr,vers=3 ${test_ip}:${first_export} ${mnt_d}
+scp -p "$local_f" "${test_ip}:${nfs_remote_file}"
+try_command_on_node $test_node "chmod 644 $nfs_remote_file"
 
-nfs_sum=$(sum $nfs_f)
+nfs_sum=$(sum $nfs_local_file)
 
 if [ "$local_sum" = "$nfs_sum" ] ; then
     echo "GOOD: file contents read correctly via NFS"
@@ -94,7 +81,7 @@ wait_until_node_has_status $test_node disabled
 
 gratarp_sniff_wait_show
 
-new_sum=$(sum $nfs_f)
+new_sum=$(sum $nfs_local_file)
 [ $? -eq 0 ]


-- 
CTDB repository