[SCM] CTDB repository - branch 2.5 updated - ctdb-2.5.4-19-gdfed4f3

Thu Jan 29 23:23:02 MST 2015

The branch, 2.5 has been updated
       via  dfed4f36662df4b0fccb45ef390eea36382a3738 (commit)
       via  e35ded87829ec6858fb5e045bc428c3f7454ee45 (commit)
       via  567359a8d8c6b355054eb1132c765516f5cf7249 (commit)
       via  8c168b37d2fec274bace439e504e1d32b4a3357a (commit)
       via  952040ace9cc34dcaab96f238341cd42eb6dc4f0 (commit)
       via  36f3f2c2f2e40ce9df69907a71c19940df9a5864 (commit)
       via  3f46d376f3019ed579951be474f11ac5e1744ea1 (commit)
       via  64ccd71ba19a2cb7d0bc5a7259d80d0520ab69d0 (commit)
       via  bd03bb7370edea2d4d74ce3f91eb109acf776d8f (commit)
       via  1eb332804e66b0a9d57045e1e6f15a22eb89425e (commit)
       via  08763a59fc56eba28dcb652f1fc5ba97bef42647 (commit)
       via  d59ebfb00a44b23400a3ecc602ab4542af06018f (commit)
       via  9c125995fec927b49ae228d2e94ffb69f32f2b69 (commit)
       via  a5d07817c00efcbd434f2e10696a8fdbbab641c9 (commit)
       via  d532f63178c17bc5faea3e688b9d2e026a617b9d (commit)
       via  6796f0d6c95755c3270ef3deea6da10c8d8473f7 (commit)
       via  08832d8b2398f4f3af73e781c805feeaffdc0469 (commit)
       via  eb37e2108d10257dfafe2bc7a719690dd2d466d5 (commit)
       via  645f15c98d572b703cecabcc2af2abb05e9b6e67 (commit)
      from  70c7ef023730d8344ca4afde2c94634dd541101f (commit)

https://git.samba.org/?p=ctdb.git;a=shortlog;h=2.5


- Log -----------------------------------------------------------------
commit dfed4f36662df4b0fccb45ef390eea36382a3738
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Aug 8 11:42:51 2014 +1000

    logging: Rename ctdb_log_handler() to ctdb_child_log_handler()
    
    Now it is obvious that it has something to do with child processes.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    (Imported from commit 7d391b746695d7a262e4f939f057ee1d1685e12b)

commit e35ded87829ec6858fb5e045bc428c3f7454ee45
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Oct 8 14:22:53 2014 +1100

    logging: Remove debug levels DEBUG_ALERT and DEBUG_CRIT
    
    Internally map them to DEBUG_ERR to limit code churn.
    
    This reduces the unwieldy number of debug levels used by CTDB.  ALERT
    and CRIT aren't of much use as separate errors, since everything from
    ERR up should always be logged.  In future just ERR can be used.
    
    This also improves compatibility with Samba's debug.c system priority
    mapping.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    (Imported from commit f4fc9a153c533968905b8c7945c6615dcd9253d1)
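
Below is a minimal C sketch of the "map ALERT and CRIT to ERR" approach described above. It is an illustration, not CTDB's code, and all sketch_* names are hypothetical. ERR, WARNING and NOTICE use the numbering listed in doc/ctdb.7.xml (0, 1, 2). INFO = 3 and DEBUG = 4 are assumed. The priorities are the standard <syslog.h> constants.

```c
#include <syslog.h>

/* Illustrative levels: ERR/WARNING/NOTICE follow doc/ctdb.7.xml,
 * INFO and DEBUG values are assumptions. */
enum sketch_debug_level {
	SKETCH_DEBUG_ERR = 0,
	SKETCH_DEBUG_WARNING = 1,
	SKETCH_DEBUG_NOTICE = 2,
	SKETCH_DEBUG_INFO = 3,
	SKETCH_DEBUG_DEBUG = 4,
};

/* Hypothetical compatibility aliases: the former ALERT and CRIT levels
 * collapse onto ERR, so existing callers keep compiling unchanged. */
#define SKETCH_DEBUG_ALERT SKETCH_DEBUG_ERR
#define SKETCH_DEBUG_CRIT  SKETCH_DEBUG_ERR

/* Map a debug level to a syslog(3) priority.  Anything at ERR or more
 * severe is reported as LOG_ERR; nothing maps to CRIT/ALERT/EMERG. */
int sketch_level_to_syslog(int level)
{
	if (level <= SKETCH_DEBUG_ERR) {
		return LOG_ERR;
	}
	switch (level) {
	case SKETCH_DEBUG_WARNING:
		return LOG_WARNING;
	case SKETCH_DEBUG_NOTICE:
		return LOG_NOTICE;
	case SKETCH_DEBUG_INFO:
		return LOG_INFO;
	default:
		return LOG_DEBUG;
	}
}
```

The aliases are the sketch's way of limiting churn. Call sites that still say CRIT simply end up logging at LOG_ERR.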

commit 567359a8d8c6b355054eb1132c765516f5cf7249
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Oct 8 14:19:22 2014 +1100

    logging: Remove DEBUG_EMERG
    
    It isn't used and shouldn't be.  CTDB can't make the system unusable.
    
    Update the associated test to ensure that EMERG isn't attempted.
    Actually test all remaining debug levels and modernise the test a bit.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    (Imported from commit 0eabbb8c2b91b61a23f20e04605fdbd653c5cbcb)
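
After these two commits, the accepted level names are the ones left in the trimmed debug_levels[] table in common/ctdb_logging.c. The following is a hedged sketch of a name lookup against such a table. The sketch_* names are hypothetical, and the INFO (3) and DEBUG (4) values are assumed. The sketch compares names exactly. Whether CTDB itself ignores case is not shown here.

```c
#include <stddef.h>
#include <string.h>

/* A name/value table in the style of debug_levels[]. */
struct sketch_level_name {
	int level;
	const char *name;
};

static const struct sketch_level_name sketch_levels[] = {
	{ 0, "ERR" },
	{ 1, "WARNING" },
	{ 2, "NOTICE" },
	{ 3, "INFO" },
	{ 4, "DEBUG" },
};

/* Return the numeric level for `name`, or -1 if the name is not in the
 * table (e.g. the removed "EMERG", "ALERT" and "CRIT"). */
int sketch_parse_level(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_levels) / sizeof(sketch_levels[0]); i++) {
		if (strcmp(name, sketch_levels[i].name) == 0) {
			return sketch_levels[i].level;
		}
	}
	return -1;
}
```

The updated test iterates over every remaining level, which corresponds to walking this table from top to bottom.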

commit 8c168b37d2fec274bace439e504e1d32b4a3357a
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Tue Oct 14 17:52:55 2014 +1100

    tools: Fix heap-use-after-free problem
    
    Found by address sanitizer.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    Reviewed-by: Martin Schwenke <martin at meltin.net>
    
    Autobuild-User(master): Martin Schwenke <martins at samba.org>
    Autobuild-Date(master): Fri Oct 17 12:56:02 CEST 2014 on sn-devel-104
    
    (Imported from commit 470af881479d1a1588dc23ef40622b4d8f006b61)
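
The diff for tools/ctdb.c is not included in this truncated email, so the exact bug is not shown. The sketch below illustrates the general heap-use-after-free pattern that AddressSanitizer flags, along with the usual fix: copy out what you need before freeing. The struct and function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reply object standing in for whatever tools/ctdb.c read. */
struct sketch_reply {
	int status;
	char *message;
};

struct sketch_reply *sketch_make_reply(int status, const char *msg)
{
	struct sketch_reply *r = malloc(sizeof(*r));
	size_t len = strlen(msg) + 1;

	if (r == NULL) {
		return NULL;
	}
	r->message = malloc(len);
	if (r->message == NULL) {
		free(r);
		return NULL;
	}
	memcpy(r->message, msg, len);
	r->status = status;
	return r;
}

/* Release a reply and return its status.  The status is copied into a
 * local before free(); returning reply->status after free(reply) would
 * be a heap-use-after-free. */
int sketch_consume_reply(struct sketch_reply *reply)
{
	int status = reply->status;

	free(reply->message);
	free(reply);
	return status;
}
```

To reproduce this class of report, build with `-fsanitize=address -g` (GCC or Clang). Then move the read of `reply->status` below the `free()` calls.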

commit 952040ace9cc34dcaab96f238341cd42eb6dc4f0
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Wed Apr 23 18:02:39 2014 +1000

    recoverd: Process all the records for vacuum fetch in a loop
    
    Processing one migration request at a time is very slow and processing
    a batch of records can take longer than VacuumInterval.  This causes
    subsequent vacuum fetch requests to be dropped.  The dropped records
    can accumulate quickly and will cause the vacuum database traverse to
    be quite expensive.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    Reviewed-by: Martin Schwenke <martin at meltin.net>
    
    Autobuild-User(master): Amitay Isaacs <amitay at samba.org>
    Autobuild-Date(master): Fri Dec  5 17:06:58 CET 2014 on sn-devel-104
    
    (Imported from commit 959b9ea0ef85c57ffc84d66a6e5e855868943391)
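
A back-of-the-envelope C model of why the loop helps. In the old flow, each request's callback started the next fetch, so only one migration was ever in flight. In the new flow, the callback only frees the call state, so all requests are issued together. This model is idealised: it ignores daemon processing time. The per-request round trip and the interval values in the test are made-up example numbers, not CTDB defaults.

```c
#include <stdbool.h>

/* Idealised time (ms) to migrate a batch of records.
 * chained:   one request in flight; each waits for the previous reply.
 * pipelined: all requests issued in a loop, so their round trips overlap. */
double sketch_batch_time_ms(unsigned nrecords, double rtt_ms, bool pipelined)
{
	if (nrecords == 0) {
		return 0.0;
	}
	return pipelined ? rtt_ms : nrecords * rtt_ms;
}

/* True if the batch is still running when the next vacuum fetch request
 * arrives (after VacuumInterval), i.e. when that request would be dropped. */
bool sketch_batch_overruns(unsigned nrecords, double rtt_ms,
			   double vacuum_interval_s, bool pipelined)
{
	return sketch_batch_time_ms(nrecords, rtt_ms, pipelined) >
	       vacuum_interval_s * 1000.0;
}
```

With 1000 records at 20 ms each, the chained flow takes 20 s and overruns a 10 s interval. The looped flow finishes well inside it.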

commit 36f3f2c2f2e40ce9df69907a71c19940df9a5864
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Mon Apr 14 14:53:25 2014 +1000

    vacuum: Do not delete VACUUM MIGRATED records immediately
    
    Such records should be processed by the local vacuuming daemon to ensure
    that all the remote copies have been deleted first.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    Reviewed-by: Martin Schwenke <martin at meltin.net>
    
    (Imported from commit 257311e337065f089df688cbf261d2577949203d)
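
The server/ctdb_ltdb_server.c hunk further down adds a case for empty records flagged VACUUM MIGRATED on the node that is their dmaster. Previously that case fell through with keep unset, so the record was deleted at once. Now it is kept and scheduled for deletion. The C sketch below models only the conditions visible in that hunk. The flag value and the sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value; CTDB defines its own CTDB_REC_FLAG_VACUUM_MIGRATED. */
#define SKETCH_REC_FLAG_VACUUM_MIGRATED 0x1u

struct sketch_decision {
	bool keep;
	bool schedule_for_deletion;
};

/* Decide what happens when an empty record is stored locally. */
struct sketch_decision sketch_store_empty(uint32_t flags, uint32_t dmaster,
					  uint32_t local_pnn)
{
	struct sketch_decision d = { false, false };
	bool migrated = (flags & SKETCH_REC_FLAG_VACUUM_MIGRATED) != 0;

	if (migrated && local_pnn == dmaster) {
		/* New case: keep the record and let the local vacuumer delete
		 * it once all remote copies are known to be gone. */
		d.keep = true;
		d.schedule_for_deletion = true;
	}
	if (!migrated) {
		d.keep = true;
	} else if (local_pnn != dmaster) {
		d.keep = true;
	}
	return d;
}
```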

commit 3f46d376f3019ed579951be474f11ac5e1744ea1
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Thu Nov 6 09:33:50 2014 +1100

    vacuum: Use non-blocking lock when traversing delete tree
    
    This stops vacuuming from getting in the way of the ctdb daemon
    processing record requests.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    Reviewed-by: Martin Schwenke <martin at meltin.net>
    
    (Imported from commit dbb1958284657f26a868705e5f9612bc377fd5e0)

commit 64ccd71ba19a2cb7d0bc5a7259d80d0520ab69d0
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Mon Apr 14 13:18:41 2014 +1000

    vacuum: Use non-blocking lock when traversing delete queue
    
    This stops vacuuming from getting in the way of the ctdb daemon
    processing record requests.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    Reviewed-by: Martin Schwenke <martin at meltin.net>
    
    (Imported from commit d35f512cd972ac1f732fe998b2179242d042082d)
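
Both commits apply the same idea: try the chain lock, and if it is busy, skip the record rather than wait. The C sketch below substitutes pthread_mutex_trylock() for tdb_chainlock_nonblock() purely to illustrate the technique. The counter struct is loosely modelled on vdata->count and is hypothetical. Since contention is now expected, the hunks also drop the DEBUG_ERR message on lock failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Hypothetical counters, loosely modelled on vdata->count.delete_queue. */
struct sketch_counts {
	unsigned total;
	unsigned processed;
	unsigned skipped;
};

/* Visit one record guarded by `lock`.  If someone else (e.g. the daemon
 * serving a client) holds the lock, skip the record instead of blocking;
 * a later vacuum run can pick it up. */
void sketch_visit_record(pthread_mutex_t *lock, struct sketch_counts *c)
{
	c->total++;
	if (pthread_mutex_trylock(lock) != 0) {
		c->skipped++;
		return;
	}
	/* ... examine or delete the record while holding the lock ... */
	c->processed++;
	pthread_mutex_unlock(lock);
}
```

Compile with `-pthread` on older glibc versions.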

commit bd03bb7370edea2d4d74ce3f91eb109acf776d8f
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Fri Feb 21 14:58:00 2014 +1100

    vacuum: Stagger vacuuming child processes
    
    This prevents multiple child processes from being forked at the same
    time to vacuum TDBs.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    Reviewed-by: Martin Schwenke <martin at meltin.net>
    
    (Imported from commit e4597f8771f42cf315bd163c18b2f27147d3de5f)
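
The ctdb_vacuum.c hunk re-arms the vacuum event 500 ms later (timeval_current_ofs(0, 500*1000)) whenever ctdb->vacuumers is non-empty. The C simulation below shows the effect when several databases become due at the same moment. It assumes that every event fires at t=0, that waiting events re-check on the same 500 ms grid, and that databases are served in index order. The names are hypothetical.

```c
/* Compute start times (ms) for n databases whose vacuum events all fire
 * at t=0, when each vacuuming run takes run_ms.  A busy check re-arms the
 * event 500 ms later, so at most one child is active at any time. */
void sketch_stagger(int n, long run_ms, long *start_ms)
{
	long busy_until = 0;	/* end time of the active child */
	long now = 0;
	int next = 0;		/* lowest-index database not yet started */

	while (next < n) {
		if (now >= busy_until) {
			start_ms[next] = now;
			busy_until = now + run_ms;
			next++;
		} else {
			now += 500;	/* waiting events re-check later */
		}
	}
}
```

With 700 ms runs, three databases start at 0, 1000 and 2000 ms rather than all forking together at 0.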

commit 1eb332804e66b0a9d57045e1e6f15a22eb89425e
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Tue Feb 11 14:23:28 2014 +1100

    vacuum: Track time for vacuuming in database statistics
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    Reviewed-by: Martin Schwenke <martin at meltin.net>
    
    (Imported from commit a0628e317df76c7c38a7cca9c3090077fa352899)
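
The diff adds a `vacuum` member holding a `struct latency_counter` to `struct ctdb_db_statistics`. It is updated from vacuum_child_destructor() via CTDB_UPDATE_DB_LATENCY. A minimal C sketch of a min/max/average accumulator of that kind follows. The field names are assumptions; this is not a copy of CTDB's struct or macro.

```c
/* Illustrative latency accumulator (field names assumed). */
struct sketch_latency {
	unsigned num;
	double min;
	double max;
	double total;
};

/* Record one vacuuming run that took `seconds`. */
void sketch_latency_update(struct sketch_latency *c, double seconds)
{
	if (c->num == 0 || seconds < c->min) {
		c->min = seconds;
	}
	if (c->num == 0 || seconds > c->max) {
		c->max = seconds;
	}
	c->total += seconds;
	c->num++;
}

/* Average run time, or 0 if nothing has been recorded yet. */
double sketch_latency_avg(const struct sketch_latency *c)
{
	return c->num ? c->total / c->num : 0.0;
}
```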

commit 08763a59fc56eba28dcb652f1fc5ba97bef42647
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Nov 17 14:15:14 2014 +1100

    scripts: Fix stack dumping when debugging hung scripts
    
    Missing parentheses stop the default pattern from matching commands
    with trailing garbage (e.g. "exportfs.orig").
    
    A careful check of POSIX (and running GNU sed with --posix) suggests
    that "\|" isn't a supported way of specifying alternation in a regular
    expression.  Therefore, it is clearer to switch to extended regular
    expressions so that this has a chance of being portable (even though
    the point is to print /proc/<pid>/stack, which only works on Linux).
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    Autobuild-User(master): Amitay Isaacs <amitay at samba.org>
    Autobuild-Date(master): Tue Nov 18 06:37:45 CET 2014 on sn-devel-104
    
    (Imported from commit 7f377cf26ecec10cd77f28c1993c48337279892d)
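
The new sed expression can be exercised with POSIX regcomp()/regexec() and REG_EXTENDED. In the C sketch below, group 1 is the process name, group 2 is the user pattern, and group 3 is the PID, matching the sed replacement "\3 \1". The numbering also shows why REGEXP must not contain parentheses: any group inside it would shift the PID group. The function name and the sample pstree line are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Apply ".*-(.*(PAT).*),([0-9]*).*" to one pstree-style line and write
 * "PID NAME" into out.  Returns 1 on a match, 0 otherwise. */
int sketch_match_line(const char *pat, const char *line,
		      char *out, size_t outlen)
{
	char re_src[256];
	regex_t re;
	regmatch_t m[4];
	int ok;

	snprintf(re_src, sizeof(re_src), ".*-(.*(%s).*),([0-9]*).*", pat);
	if (regcomp(&re, re_src, REG_EXTENDED) != 0) {
		return 0;
	}
	ok = regexec(&re, line, 4, m, 0) == 0;
	if (ok) {
		snprintf(out, outlen, "%.*s %.*s",
			 (int)(m[3].rm_eo - m[3].rm_so), line + m[3].rm_so,
			 (int)(m[1].rm_eo - m[1].rm_so), line + m[1].rm_so);
	}
	regfree(&re);
	return ok;
}
```

With the default pattern "exportfs|rpcinfo", the line "  `-exportfs.orig,1234" yields "1234 exportfs.orig". The old BRE form would have parsed as ".*exportfs" OR "rpcinfo.*", so a name with a trailing suffix would not match.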

commit d59ebfb00a44b23400a3ecc602ab4542af06018f
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Nov 14 16:42:01 2014 +1100

    scripts: Try to restart statd after every 10 failures
    
    Also add and update tests for statd stack dumps.  Update the existing
    60.ganesha statd test to do more iterations.  Duplicate the result as
    a new test for 60.nfs.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    (Imported from commit 4cd5be87daf531cb8a67f31b91cceeaf2c488127)
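
The new first line of 10.statd.check is "%  10 verbose restart:b unhealthy". The C sketch below assumes two things that this email does not confirm: that "%" means "the failure count is a multiple of N", and that rules are tried in order with the first match winning. The second assumption would explain why the "%" rule is placed before "-ge 6". It also assumes the rules are only consulted after a failure, since a count of 0 would match "%". All names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One rule line: operator, operand and action string. */
struct sketch_rule {
	const char *op;		/* "%", "-ge" or "-eq" */
	int val;
	const char *action;
};

/* Rules as listed in config/nfs-rpc-checks.d/10.statd.check. */
static const struct sketch_rule sketch_statd_rules[] = {
	{ "%",   10, "verbose restart:b unhealthy" },
	{ "-ge", 6,  "verbose unhealthy" },
	{ "-eq", 4,  "verbose restart" },
	{ "-eq", 2,  "restart:b" },
};

/* Return the action of the first rule matching numfails, or NULL. */
const char *sketch_statd_action(int numfails)
{
	size_t n = sizeof(sketch_statd_rules) / sizeof(sketch_statd_rules[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		const struct sketch_rule *r = &sketch_statd_rules[i];
		int hit;

		if (strcmp(r->op, "%") == 0) {
			hit = (numfails % r->val) == 0;
		} else if (strcmp(r->op, "-ge") == 0) {
			hit = numfails >= r->val;
		} else {
			hit = numfails == r->val;
		}
		if (hit) {
			return r->action;
		}
	}
	return NULL;
}
```

Under these assumptions, statd stays unhealthy from the 6th failure onwards, and a restart is attempted at failures 10, 20, 30, and so on.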

commit 9c125995fec927b49ae228d2e94ffb69f32f2b69
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Nov 14 16:39:07 2014 +1100

    scripts: Add rpc.statd stack dumping to Ganesha restart
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    (Imported from commit f51672f5149110025088ef6d1fc59fe7208d2aae)

commit a5d07817c00efcbd434f2e10696a8fdbbab641c9
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Nov 14 13:59:16 2014 +1100

    scripts: Dump stack traces for hung mountd, rquotad, statd processes
    
    Add a corresponding new unit test for statd.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    (Imported from commit 968401ccdc217d0addb6235739b84dbb9d23e651)

commit d532f63178c17bc5faea3e688b9d2e026a617b9d
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Nov 14 13:48:16 2014 +1100

    scripts: Add optional program name argument to nfs_dump_some_threads()
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    (Imported from commit 1f49e1ab5b317812c0ad482404fb224368726846)

commit 6796f0d6c95755c3270ef3deea6da10c8d8473f7
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Nov 14 13:31:03 2014 +1100

    scripts: Factor out new function program_stack_traces()
    
    In the process, fix a bug where an extra trace would be printed.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    (Imported from commit 2ebc305be64cd59ad8cb4ccb6beb6ec6e66bf07a)
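
The "extra trace" came from the loop bounds. The old nfs_dump_some_threads() started `_count` at 0 and kept going while `$_count -le $max`, whereas program_stack_traces() starts at 1. The C transliteration below counts how many traces each version prints. It assumes every process has a readable stack, and the function name is hypothetical.

```c
/* Mirror of the shell loop:
 *   _count=START
 *   for _pid in ... ; do [ $_count -le $_max ] || break ; print ; _count++ */
int sketch_traces_printed(int start, int max, int available)
{
	int count = start;
	int printed = 0;
	int i;

	for (i = 0; i < available; i++) {
		if (!(count <= max)) {
			break;
		}
		printed++;	/* "Stack trace for prog[pid]:" */
		count++;
	}
	return printed;
}
```

With a limit of 5, starting at 0 prints 6 traces, while starting at 1 prints exactly 5.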

commit 08832d8b2398f4f3af73e781c805feeaffdc0469
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Thu Nov 13 11:02:26 2014 +1100

    daemon: Improve error handling for running event scripts
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    Reviewed-by: Martin Schwenke <martin at meltin.net>
    
    Autobuild-User(master): Martin Schwenke <martins at samba.org>
    Autobuild-Date(master): Fri Nov 14 03:06:12 CET 2014 on sn-devel-104
    
    (Imported from commit d04bfc6ec6ad7a4749ebfee2284253c4a91a81aa)
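
In the reworked server/ctdb_event_helper.c, ctdbd is sent a negated exit code for a normal exit and -EINTR when the script is killed by a signal. On the ctdbd side, eventscript.c now also treats a zero-byte read (the helper wrote nothing) as -EINTR. The C sketch below reproduces only the status-encoding step with a real fork()/waitpid(). The sketch_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Encode a waitpid() status: normal exit -> -exit_code (0 is success),
 * killed by a signal -> -EINTR.  Returns 0 on success, 1 otherwise. */
int sketch_encode_status(int status, int *out)
{
	if (WIFEXITED(status)) {
		*out = -WEXITSTATUS(status);
		return 0;
	}
	if (WIFSIGNALED(status)) {
		*out = -EINTR;
		return 0;
	}
	return 1;
}

/* Fork a child that exits with `code`, or kills itself with SIGKILL if
 * code < 0, then store the encoded status in *encoded.
 * Returns 0 on success, -1 on fork/waitpid/encoding failure. */
int sketch_run_child(int code, int *encoded)
{
	int status;
	pid_t pid = fork();

	if (pid < 0) {
		return -1;
	}
	if (pid == 0) {
		if (code < 0) {
			raise(SIGKILL);
		}
		_exit(code);
	}
	if (waitpid(pid, &status, 0) == -1) {
		return -1;
	}
	return sketch_encode_status(status, encoded) == 0 ? 0 : -1;
}
```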

commit eb37e2108d10257dfafe2bc7a719690dd2d466d5
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Tue May 13 23:13:13 2014 +1000

    build: Move internal include files into a separate directory
    
    This will allow clustered Samba to be built with the built-in CTDB
    tree rather than needing CTDB to be installed first.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    
    (Imported from commit a0db87ed1edcd199af352e457e35ac018157d646)

commit 645f15c98d572b703cecabcc2af2abb05e9b6e67
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Tue May 13 22:33:03 2014 +1000

    build: Fix dependencies on ctdb_version.h
    
    This makes sure that a parallel build compiles everything correctly.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    
    (Imported from commit a065e693ee5801f12f356b7baa823e6a34271dbc)

-----------------------------------------------------------------------

Summary of changes:
 Makefile.in                                        |  6 ++-
 common/ctdb_logging.c                              |  3 --
 config/debug-hung-script.sh                        |  9 ++--
 config/events.d/60.ganesha                         |  1 +
 config/functions                                   | 47 +++++++++++++--------
 config/nfs-rpc-checks.d/10.statd.check             |  1 +
 doc/ctdb.1.xml                                     |  4 +-
 doc/ctdb.7.xml                                     |  3 --
 doc/ctdbd.conf.5.xml                               |  8 ++--
 include/ctdb_protocol.h                            |  3 ++
 include/{ => internal}/cmdline.h                   |  0
 include/{ => internal}/idtree.h                    |  0
 include/{ => internal}/includes.h                  |  0
 server/ctdb_event_helper.c                         | 48 ++++++++++++++--------
 server/ctdb_logging.c                              | 18 +++-----
 server/ctdb_ltdb_server.c                          |  5 +++
 server/ctdb_recoverd.c                             |  5 +--
 server/ctdb_vacuum.c                               | 24 ++++++-----
 server/eventscript.c                               | 10 ++++-
 tests/complex/90_debug_hung_script.sh              |  2 +-
 tests/eventscripts/60.ganesha.monitor.141.sh       | 18 +++++++-
 tests/eventscripts/60.nfs.monitor.143.sh           | 15 +++++++
 ...anesha.monitor.141.sh => 60.nfs.monitor.144.sh} | 20 ++++++++-
 tests/eventscripts/scripts/local.sh                | 12 +++++-
 tests/eventscripts/stubs/pidof                     |  3 ++
 tests/simple/13_ctdb_setdebug.sh                   | 42 +++++++------------
 tools/ctdb.c                                       | 16 +++++++-
 27 files changed, 210 insertions(+), 113 deletions(-)
 rename include/{ => internal}/cmdline.h (100%)
 rename include/{ => internal}/idtree.h (100%)
 rename include/{ => internal}/includes.h (100%)
 create mode 100755 tests/eventscripts/60.nfs.monitor.143.sh
 copy tests/eventscripts/{60.ganesha.monitor.141.sh => 60.nfs.monitor.144.sh} (57%)


Changeset truncated at 500 lines:

diff --git a/Makefile.in b/Makefile.in
index 118d80a..925ea25 100755
--- a/Makefile.in
+++ b/Makefile.in
@@ -62,7 +62,8 @@ ifeq ($(CC),gcc)
 EXTRA_CFLAGS=-Wno-format-zero-length -Wno-deprecated-declarations -fPIC
 endif
 
-CFLAGS=@CPPFLAGS@ -g -I$(srcdir)/include -Iinclude -Ilib -Ilib/util -I$(srcdir) \
+CFLAGS=@CPPFLAGS@ -g -I$(srcdir)/include -I$(srcdir)/include/internal \
+       -Iinclude -Ilib -Ilib/util -I$(srcdir) \
       $(TALLOC_CFLAGS) $(TEVENT_CFLAGS) $(TDB_CFLAGS) -I@libreplacedir@ \
 	-DVARDIR=\"$(localstatedir)\" -DETCDIR=\"$(etcdir)\" \
 	-DCTDB_VARDIR=\"$(localstatedir)/lib/ctdb\" \
@@ -160,6 +161,9 @@ $(CTDB_VERSION_H):
 	@echo Generating $@
 	$(WRAPPER) ./packaging/mkversion.sh
 
+server/ctdb_daemon.o: $(CTDB_VERSION_H)
+tools/ctdb.o: $(CTDB_VERSION_H)
+
 bin/ctdbd: $(CTDB_SERVER_OBJ)
 	@echo Linking $@
 	$(WRAPPER) $(CC) $(CFLAGS) -o $@ $(CTDB_SERVER_OBJ) $(LIB_FLAGS)
diff --git a/common/ctdb_logging.c b/common/ctdb_logging.c
index 6dd1a38..bb80fcd 100644
--- a/common/ctdb_logging.c
+++ b/common/ctdb_logging.c
@@ -176,9 +176,6 @@ int32_t ctdb_control_clear_log(struct ctdb_context *ctdb)
 }
 
 struct debug_levels debug_levels[] = {
-	{DEBUG_EMERG,	"EMERG"},
-	{DEBUG_ALERT,	"ALERT"},
-	{DEBUG_CRIT,	"CRIT"},
 	{DEBUG_ERR,	"ERR"},
 	{DEBUG_WARNING,	"WARNING"},
 	{DEBUG_NOTICE,	"NOTICE"},
diff --git a/config/debug-hung-script.sh b/config/debug-hung-script.sh
index 34e957c..3f800fc 100755
--- a/config/debug-hung-script.sh
+++ b/config/debug-hung-script.sh
@@ -1,5 +1,8 @@
 #!/bin/sh
 
+# This script only works on Linux.  Please modify (and submit patches)
+# for other operating systems.
+
 [ -n "$CTDB_BASE" ] || \
     export CTDB_BASE=$(cd -P $(dirname "$0") ; echo "$PWD")
 
@@ -28,12 +31,12 @@ fi
     # Check for processes matching a regular expression and print
     # stack staces.  This could help confirm that certain processes
     # are stuck in certain places such as the cluster filesystem.  The
-    # regexp should separate items with "\|" and should not contain
+    # regexp must separate items with "|" and must not contain
     # parentheses.  The default pattern can be replaced for testing.
-    default_pat='exportfs\|rpcinfo'
+    default_pat='exportfs|rpcinfo'
     pat="${CTDB_DEBUG_HUNG_SCRIPT_STACKPAT:-${default_pat}}"
     echo "$out" |
-    sed -n "s@.*-\(.*${pat}.*\),\([0-9]*\).*@\2 \1@p" |
+    sed -r -n "s@.*-(.*(${pat}).*),([0-9]*).*@\3 \1@p" |
     while read pid name ; do
 	trace=$(cat "/proc/${pid}/stack" 2>/dev/null)
 	if [ $? -eq 0 ] ; then
diff --git a/config/events.d/60.ganesha b/config/events.d/60.ganesha
index 5640b74..be77e1d 100755
--- a/config/events.d/60.ganesha
+++ b/config/events.d/60.ganesha
@@ -230,6 +230,7 @@ case "$1" in
 	p="rpc.statd"
 	which $p >/dev/null 2>/dev/null && \
 	    nfs_check_rpc_service "statd" \
+		%  10 "verbose restart:b unhealthy" \
 		-ge 6 "verbose unhealthy" \
 		-eq 4 "verbose restart" \
 		-eq 2 "restart:b"
diff --git a/config/functions b/config/functions
index 9617047..021f2ad 100755
--- a/config/functions
+++ b/config/functions
@@ -201,6 +201,27 @@ get_proc ()
 }
 
 ######################################################
+# Print up to $_max kernel stack traces for processes named $_program
+program_stack_traces ()
+{
+    _prog="$1"
+    _max="${2:-1}"
+
+    _count=1
+    for _pid in $(pidof "$_prog") ; do
+	[ $_count -le $_max ] || break
+
+	# Do this first to avoid racing with process exit
+	_stack=$(get_proc "${_pid}/stack" 2>/dev/null)
+	if [ -n "$_stack" ] ; then
+	    echo "Stack trace for ${_prog}[${_pid}]:"
+	    echo "$_stack"
+	    _count=$(($_count + 1))
+	fi
+    done
+}
+
+######################################################
 # Check that an RPC service is healthy -
 # this includes allowing a certain number of failures
 # before marking the NFS service unhealthy.
@@ -371,11 +392,13 @@ _nfs_restart_rpc_service ()
 	mountd)
 	    echo "Trying to restart $_prog_name [${_p}]"
 	    killall -q -9 "$_p"
+	    nfs_dump_some_threads "$_p"
 	    $_maybe_background $_p ${MOUNTD_PORT:+-p} $MOUNTD_PORT
 	    ;;
 	rquotad)
 	    echo "Trying to restart $_prog_name [${_p}]"
 	    killall -q -9 "$_p"
+	    nfs_dump_some_threads "$_p"
 	    $_maybe_background $_p ${RQUOTAD_PORT:+-p} $RQUOTAD_PORT
 	    ;;
 	lockd)
@@ -385,6 +408,7 @@ _nfs_restart_rpc_service ()
 	statd)
 	    echo "Trying to restart $_prog_name [${_p}]"
 	    killall -q -9 "$_p"
+	    nfs_dump_some_threads "$_p"
 	    $_maybe_background $_p \
 		${STATD_HOSTNAME:+-n} $STATD_HOSTNAME \
 		${STATD_PORT:+-p} $STATD_PORT \
@@ -668,7 +692,9 @@ startstop_ganesha()
 	    service "$_service_name" stop
 	    ;;
 	restart)
-	    service "$_service_name" restart
+	    service "$_service_name" stop
+	    nfs_dump_some_threads "rpc.statd"
+	    service "$_service_name" start
 	    ;;
     esac
 }
@@ -735,23 +761,12 @@ startstop_nfs() {
 # Dump up to the configured number of nfsd thread backtraces.
 nfs_dump_some_threads ()
 {
-    [ -n "$CTDB_NFS_DUMP_STUCK_THREADS" ] || CTDB_NFS_DUMP_STUCK_THREADS=5
+    _prog="${1:-nfsd}"
 
-    # Optimisation to avoid running an unnecessary pidof
-    [ $CTDB_NFS_DUMP_STUCK_THREADS -gt 0 ] || return 0
+    _num="${CTDB_NFS_DUMP_STUCK_THREADS:-5}"
+    [ $_num -gt 0 ] || return 0
 
-    _count=0
-    for _pid in $(pidof nfsd) ; do
-	[ $_count -le $CTDB_NFS_DUMP_STUCK_THREADS ] || break
-
-	# Do this first to avoid racing with thread exit
-	_stack=$(get_proc "${_pid}/stack" 2>/dev/null)
-	if [ -n "$_stack" ] ; then
-	    echo "Stack trace for stuck nfsd thread [${_pid}]:"
-	    echo "$_stack"
-	    _count=$(($_count + 1))
-	fi
-    done
+    program_stack_traces "$_prog" $_num
 }
 
 ########################################################
diff --git a/config/nfs-rpc-checks.d/10.statd.check b/config/nfs-rpc-checks.d/10.statd.check
index d738a32..526e238 100644
--- a/config/nfs-rpc-checks.d/10.statd.check
+++ b/config/nfs-rpc-checks.d/10.statd.check
@@ -1,3 +1,4 @@
+%  10 verbose restart:b unhealthy
 -ge 6 verbose unhealthy
 -eq 4 verbose restart
 -eq 2 restart:b
diff --git a/doc/ctdb.1.xml b/doc/ctdb.1.xml
index 054948d..a62d425 100644
--- a/doc/ctdb.1.xml
+++ b/doc/ctdb.1.xml
@@ -902,7 +902,7 @@ DB Statistics: locking.tdb
 	The list of debug levels from highest to lowest are :
       </para>
       <para>
-	EMERG ALERT CRIT ERR WARNING NOTICE INFO DEBUG
+	ERR WARNING NOTICE INFO DEBUG
       </para>
     </refsect2>
 
@@ -912,7 +912,7 @@ DB Statistics: locking.tdb
 	Set the debug level of a node. This controls what information will be logged.
       </para>
       <para>
-	The debuglevel is one of EMERG ALERT CRIT ERR WARNING NOTICE INFO DEBUG
+	The debuglevel is one of ERR WARNING NOTICE INFO DEBUG
       </para>
     </refsect2>
 
diff --git a/doc/ctdb.7.xml b/doc/ctdb.7.xml
index a94b62f..b54fa42 100644
--- a/doc/ctdb.7.xml
+++ b/doc/ctdb.7.xml
@@ -883,9 +883,6 @@ CTDB_NATGW_DEFAULT_GATEWAY=10.0.0.1
     </para>
 
     <simplelist>
-      <member>EMERG (-3)</member>
-      <member>ALERT (-2)</member>
-      <member>CRIT (-1)</member>
       <member>ERR (0)</member>
       <member>WARNING (1)</member>
       <member>NOTICE (2)</member>
diff --git a/doc/ctdbd.conf.5.xml b/doc/ctdbd.conf.5.xml
index 149aa62..803c232 100644
--- a/doc/ctdbd.conf.5.xml
+++ b/doc/ctdbd.conf.5.xml
@@ -1469,11 +1469,13 @@ CTDB_SET_MonitorInterval=20
 	  <para>
 	    REGEXP specifies interesting processes for which stack
 	    traces should be logged when debugging hung eventscripts
-	    and those processes are matched in pstree output.  See
-	    also <citetitle>CTDB_DEBUG_HUNG_SCRIPT</citetitle>.
+	    and those processes are matched in pstree output.  REGEXP
+	    is an extended regexp so choices are separated by pipes
+	    ('|').  However, REGEXP should not contain parentheses.
+	    See also <citetitle>CTDB_DEBUG_HUNG_SCRIPT</citetitle>.
 	  </para>
 	  <para>
-	    Default is "exportfs\|rpcinfo".
+	    Default is "exportfs|rpcinfo".
 	  </para>
 	</listitem>
       </varlistentry>
diff --git a/include/ctdb_protocol.h b/include/ctdb_protocol.h
index 629c91c..1068132 100644
--- a/include/ctdb_protocol.h
+++ b/include/ctdb_protocol.h
@@ -717,6 +717,9 @@ struct ctdb_db_statistics {
 		struct latency_counter latency;
 		uint32_t buckets[MAX_COUNT_BUCKETS];
 	} locks;
+	struct {
+		struct latency_counter latency;
+	} vacuum;
 	uint32_t db_ro_delegations;
 	uint32_t db_ro_revokes;
 	uint32_t hop_count_bucket[MAX_COUNT_BUCKETS];
diff --git a/include/cmdline.h b/include/internal/cmdline.h
similarity index 100%
rename from include/cmdline.h
rename to include/internal/cmdline.h
diff --git a/include/idtree.h b/include/internal/idtree.h
similarity index 100%
rename from include/idtree.h
rename to include/internal/idtree.h
diff --git a/include/includes.h b/include/internal/includes.h
similarity index 100%
rename from include/includes.h
rename to include/internal/includes.h
diff --git a/server/ctdb_event_helper.c b/server/ctdb_event_helper.c
index 9ff763c..f14e336 100644
--- a/server/ctdb_event_helper.c
+++ b/server/ctdb_event_helper.c
@@ -67,7 +67,7 @@ int main(int argc, char *argv[])
 {
 	int log_fd, write_fd;
 	pid_t pid;
-	int status, output;
+	int status, output, ret;
 
 	progname = argv[0];
 
@@ -99,33 +99,47 @@ int main(int argc, char *argv[])
 
 	pid = fork();
 	if (pid < 0) {
+		int save_errno = errno;
 		fprintf(stderr, "Failed to fork - %s\n", strerror(errno));
-		exit(errno);
+		sys_write(write_fd, &save_errno, sizeof(save_errno));
+		exit(1);
 	}
 
 	if (pid == 0) {
-		int save_errno;
-
-		execv(argv[3], &argv[3]);
-		if (errno == EACCES) {
-			save_errno = check_executable(argv[3]);
-		} else {
-			save_errno = errno;
+		ret = check_executable(argv[3]);
+		if (ret != 0) {
+			_exit(ret);
+		}
+		ret = execv(argv[3], &argv[3]);
+		if (ret != 0) {
+			int save_errno = errno;
 			fprintf(stderr, "Error executing '%s' - %s\n",
-				argv[3], strerror(errno));
+				argv[3], strerror(save_errno));
 		}
-		_exit(save_errno);
+		/* This should never happen */
+		_exit(ENOEXEC);
 	}
 
-	waitpid(pid, &status, 0);
+	ret = waitpid(pid, &status, 0);
+	if (ret == -1) {
+		output = -errno;
+		fprintf(stderr, "waitpid() failed - %s\n", strerror(errno));
+		sys_write(write_fd, &output, sizeof(output));
+		exit(1);
+	}
 	if (WIFEXITED(status)) {
-		output = WEXITSTATUS(status);
-		if (output == ENOENT || output == ENOEXEC) {
-			output = -output;
-		}
+		output = -WEXITSTATUS(status);
+		sys_write(write_fd, &output, sizeof(output));
+		exit(0);
+	}
+	if (WIFSIGNALED(status)) {
+		output = -EINTR;
+		fprintf(stderr, "Process terminated with signal - %d\n",
+			WTERMSIG(status));
 		sys_write(write_fd, &output, sizeof(output));
-		exit(output);
+		exit(0);
 	}
 
+	fprintf(stderr, "waitpid() status=%d\n", status);
 	exit(1);
 }
diff --git a/server/ctdb_logging.c b/server/ctdb_logging.c
index 9f6f3b5..eb743ca 100644
--- a/server/ctdb_logging.c
+++ b/server/ctdb_logging.c
@@ -223,15 +223,6 @@ static void ctdb_syslog_log(const char *format, va_list ap)
 	}
 
 	switch (this_log_level) {
-	case DEBUG_EMERG: 
-		level = LOG_EMERG; 
-		break;
-	case DEBUG_ALERT: 
-		level = LOG_ALERT; 
-		break;
-	case DEBUG_CRIT: 
-		level = LOG_CRIT; 
-		break;
 	case DEBUG_ERR: 
 		level = LOG_ERR; 
 		break;
@@ -413,8 +404,9 @@ static void write_to_log(struct ctdb_log_state *log,
 /*
   called when log data comes in from a child process
  */
-static void ctdb_log_handler(struct event_context *ev, struct fd_event *fde, 
-			     uint16_t flags, void *private)
+static void ctdb_child_log_handler(struct event_context *ev,
+				   struct fd_event *fde,
+				   uint16_t flags, void *private)
 {
 	struct ctdb_log_state *log = talloc_get_type(private, struct ctdb_log_state);
 	char *p;
@@ -535,7 +527,7 @@ struct ctdb_log_state *ctdb_vfork_with_logging(TALLOC_CTX *mem_ctx,
 	set_close_on_exec(log->pfd);
 	talloc_set_destructor(log, log_context_destructor);
 	fde = tevent_add_fd(ctdb->ev, log, log->pfd, EVENT_FD_READ,
-			    ctdb_log_handler, log);
+			    ctdb_child_log_handler, log);
 	tevent_fd_set_auto_close(fde);
 
 	return log;
@@ -592,7 +584,7 @@ int ctdb_set_child_logging(struct ctdb_context *ctdb)
 	close(old_stderr);
 
 	fde = event_add_fd(ctdb->ev, ctdb->log, p[0],
-			   EVENT_FD_READ, ctdb_log_handler, ctdb->log);
+			   EVENT_FD_READ, ctdb_child_log_handler, ctdb->log);
 	tevent_fd_set_auto_close(fde);
 
 	ctdb->log->pfd = p[0];
diff --git a/server/ctdb_ltdb_server.c b/server/ctdb_ltdb_server.c
index fb4bb0a..24ad255 100644
--- a/server/ctdb_ltdb_server.c
+++ b/server/ctdb_ltdb_server.c
@@ -115,6 +115,11 @@ static int ctdb_ltdb_store_server(struct ctdb_db_context *ctdb_db,
 		 * fails. So storing the empty record makes sure that we do not
 		 * need to change the client code.
 		 */
+		if ((header->flags & CTDB_REC_FLAG_VACUUM_MIGRATED) &&
+		    (ctdb_db->ctdb->pnn == header->dmaster)) {
+			keep = true;
+			schedule_for_deletion = true;
+		}
 		if (!(header->flags & CTDB_REC_FLAG_VACUUM_MIGRATED)) {
 			keep = true;
 		} else if (ctdb_db->ctdb->pnn != header->dmaster) {
diff --git a/server/ctdb_recoverd.c b/server/ctdb_recoverd.c
index d3c06b4..39e833c 100644
--- a/server/ctdb_recoverd.c
+++ b/server/ctdb_recoverd.c
@@ -910,9 +910,7 @@ static void vacuum_fetch_next(struct vacuum_info *v);
  */
 static void vacuum_fetch_callback(struct ctdb_client_call_state *state)
 {
-	struct vacuum_info *v = talloc_get_type(state->async.private_data, struct vacuum_info);
 	talloc_free(state);
-	vacuum_fetch_next(v);
 }
 
 
@@ -977,8 +975,7 @@ static void vacuum_fetch_next(struct vacuum_info *v)
 			return;
 		}
 		state->async.fn = vacuum_fetch_callback;
-		state->async.private_data = v;
-		return;
+		state->async.private_data = NULL;
 	}
 
 	talloc_free(v);
diff --git a/server/ctdb_vacuum.c b/server/ctdb_vacuum.c
index 5013339..85ce91d 100644
--- a/server/ctdb_vacuum.c
+++ b/server/ctdb_vacuum.c
@@ -317,12 +317,8 @@ static int delete_marshall_traverse_first(void *param, void *data)
 	uint32_t hash = ctdb_hash(&(dd->key));
 	int res;
 
-	res = tdb_chainlock(ctdb_db->ltdb->tdb, dd->key);
+	res = tdb_chainlock_nonblock(ctdb_db->ltdb->tdb, dd->key);
 	if (res != 0) {
-		DEBUG(DEBUG_ERR,
-		      (__location__ " Error getting chainlock on record with "
-		       "key hash [0x%08x] on database db[%s].\n",
-		       hash, ctdb_db->db_name));
 		recs->vdata->count.delete_list.skipped++;
 		recs->vdata->count.delete_list.left--;
 		talloc_free(dd);
@@ -446,12 +442,8 @@ static int delete_queue_traverse(void *param, void *data)
 
 	vdata->count.delete_queue.total++;
 
-	res = tdb_chainlock(ctdb_db->ltdb->tdb, dd->key);
+	res = tdb_chainlock_nonblock(ctdb_db->ltdb->tdb, dd->key);
 	if (res != 0) {
-		DEBUG(DEBUG_ERR,
-		      (__location__ " Error getting chainlock on record with "
-		       "key hash [0x%08x] on database db[%s].\n",
-		       hash, ctdb_db->db_name));
 		vdata->count.delete_queue.error++;
 		return 0;
 	}
@@ -1364,6 +1356,7 @@ static int vacuum_child_destructor(struct ctdb_vacuum_child_context *child_ctx)
 	struct ctdb_db_context *ctdb_db = child_ctx->vacuum_handle->ctdb_db;
 	struct ctdb_context *ctdb = ctdb_db->ctdb;
 
+	CTDB_UPDATE_DB_LATENCY(ctdb_db, "vacuum", vacuum.latency, l);
 	DEBUG(DEBUG_INFO,("Vacuuming took %.3f seconds for database %s\n", l, ctdb_db->db_name));
 
 	if (child_ctx->child_pid != -1) {
@@ -1450,6 +1443,17 @@ ctdb_vacuum_event(struct event_context *ev, struct timed_event *te,
 		return;
 	}
 
+	/* Do not allow multiple vacuuming child processes to be active at the
+	 * same time.  If there is vacuuming child process active, delay
+	 * new vacuuming event to stagger vacuuming events.
+	 */
+	if (ctdb->vacuumers != NULL) {
+		event_add_timed(ctdb->ev, vacuum_handle,
+				timeval_current_ofs(0, 500*1000),
+				ctdb_vacuum_event, vacuum_handle);
+		return;
+	}
+
 	child_ctx = talloc(vacuum_handle, struct ctdb_vacuum_child_context);
 	if (child_ctx == NULL) {
 		DEBUG(DEBUG_CRIT, (__location__ " Failed to allocate child context for vacuuming of %s\n", ctdb_db->db_name));
diff --git a/server/eventscript.c b/server/eventscript.c
index ff05617..84dcf68 100644
--- a/server/eventscript.c
+++ b/server/eventscript.c
@@ -367,6 +367,8 @@ static void ctdb_event_script_handler(struct event_context *ev, struct fd_event
 	r = sys_read(state->fd[0], &current->status, sizeof(current->status));
 	if (r < 0) {
 		current->status = -errno;
+	} else if (r == 0) {
+		current->status = -EINTR;
 	} else if (r != sizeof(current->status)) {


-- 
CTDB repository