[SCM] CTDB repository - branch 1.0.112 updated - ctdb-1.0.111-118-gc19b5fc

Sun Jul 25 19:16:16 MDT 2010

The branch, 1.0.112 has been updated
       via  c19b5fcd1689149e750ccc0a7ac1934045b46e1c (commit)
       via  e3e2f06fb2f044ce1c8caf2dcec133e42d8285f5 (commit)
       via  66c1f85b6df24f5d0ae79ca29b72d4078634c6fd (commit)
       via  09d5dc94930a1349bb74b5557a4e71144ad525a4 (commit)
       via  517f05e42f17766b1e8db8f1f4789cbad968e304 (commit)
       via  2e771b8073ab845872dafe484475a6eb7050a295 (commit)
       via  b5ff69c2f7437ddb66697688c935d6f7332b858d (commit)
       via  d54bf6bf7ccbf601a059632f0696fce54c13add3 (commit)
       via  7912d1eb55dc5870b1adbe78de3308c1e3f15cb9 (commit)
       via  cacdfb43e1c034ec20d6ea63ec44d0e2336754e5 (commit)
       via  3ef5c07bb358d31ccce611c289060398dfa1fe14 (commit)
       via  9783c0127e8148e1dd92ee98187facf135a6e369 (commit)
       via  d658264c2b507a09e54fc296488f892e6be78d84 (commit)
       via  3cd449cf06faba7ba38939f4e5727723b7a0141a (commit)
       via  9b4884e0bad3b23a8cf32ff19dc9bb8b26436e2d (commit)
       via  7d4658d3fc09560ccf16b304ffdb5391a2b48f72 (commit)
      from  18bf912059cd01a8854ad52c664eb0b3fc957dee (commit)

http://gitweb.samba.org/?p=sahlberg/ctdb.git;a=shortlog;h=1.0.112


- Log -----------------------------------------------------------------
commit c19b5fcd1689149e750ccc0a7ac1934045b46e1c
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Wed Jul 21 17:21:09 2010 +0930

    New version 1.0.112-28

commit e3e2f06fb2f044ce1c8caf2dcec133e42d8285f5
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Wed Jul 21 17:19:34 2010 +0930

    Revert "Make deterministic ips off by the default."
    
    This reverts commit 09d5dc94930a1349bb74b5557a4e71144ad525a4.
    
    We decided more review is needed, and we should not change this
    for 1.0.112.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>

commit 66c1f85b6df24f5d0ae79ca29b72d4078634c6fd
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Wed Jul 21 12:33:17 2010 +0930

    New version 1.0.112-27

commit 09d5dc94930a1349bb74b5557a4e71144ad525a4
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Wed Jul 21 12:39:04 2010 +0930

    Make deterministic ips off by the default.
    
    The git log makes it clear that it's mainly useful for debugging; we
    should turn it off in production to minimize IP address movement.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>

commit 517f05e42f17766b1e8db8f1f4789cbad968e304
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Wed Jul 21 12:29:55 2010 +0930

    freeze: abort vacuuming when we're going to freeze.
    
    There are some reports of freeze timeouts, and it looks like vacuuming might
    be the culprit.  So we add code to tell the vacuuming children to abort when
    a freeze is going on.
    
    CQ:S1018154 & S1018349
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
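
The abort signalling described above can be pictured as a pipe that the
child polls between records. The following is a minimal sketch, not the
committed code: the helper names are hypothetical, and making the read
end non-blocking is an assumption, since the truncated diff does not show
how abortfd is set up.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: create a pipe whose read end never blocks.
 * Returns 0 on success, -1 on error. */
static int abort_pipe_open(int fd[2])
{
	int flags;

	if (pipe(fd) != 0) {
		return -1;
	}
	flags = fcntl(fd[0], F_GETFL);
	if (flags == -1 || fcntl(fd[0], F_SETFL, flags | O_NONBLOCK) == -1) {
		close(fd[0]);
		close(fd[1]);
		return -1;
	}
	return 0;
}

/* Parent side: ask the worker to stop by writing a single byte. */
static int abort_request(int writefd)
{
	char c = 0;

	return write(writefd, &c, 1) == 1 ? 0 : -1;
}

/* Worker side: poll once.  True only if an abort byte was pending.
 * The byte is consumed, so the caller should remember the result. */
static bool abort_requested(int readfd)
{
	char c;

	return read(readfd, &c, 1) == 1;
}
```

Because the poll consumes the byte, a worker would normally latch the
result in a flag, much as the diff below sets vdata->abort before
returning from the traverse callback.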

commit 2e771b8073ab845872dafe484475a6eb7050a295
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Wed Jul 21 12:28:04 2010 +0930

    vacuum: disabling vacuuming during a freeze
    
    We shouldn't even think about vacuuming when we've frozen the database
    (which is earlier than when we set CTDB_RECOVERY_ACTIVE)
    
    CQ:S1018154 & S1018349
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>

commit b5ff69c2f7437ddb66697688c935d6f7332b858d
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Mon Jul 19 19:29:09 2010 +0930

    logging: give a unique logging name to each forked child.
    
    This means we can distinguish which child is logging, esp. via syslog where we have no pid.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
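
As a rough illustration of the idea, the sketch below formats a log line
with an optional per-child tag in front of the pid, following the
"[%s%5u]: %s" layout used in the diff. The names log_prefix and
format_log_line are hypothetical stand-ins, not CTDB functions.

```c
#include <stdio.h>

/* Hypothetical per-process tag.  Empty in the parent; a forked child
 * would set it to something like "recoverd:" right after fork(). */
static const char *log_prefix = "";

/* Format one log line into buf.  Returns what snprintf() returns. */
static int format_log_line(char *buf, size_t len, unsigned pid, const char *msg)
{
	return snprintf(buf, len, "[%s%5u]: %s", log_prefix, pid, msg);
}
```

With an empty tag the output matches the old layout, so only children
that set a tag change what appears in the log.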

commit d54bf6bf7ccbf601a059632f0696fce54c13add3
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Thu Jul 15 17:43:46 2010 +0930

    New version 1.0.112-26

commit 7912d1eb55dc5870b1adbe78de3308c1e3f15cb9
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Mon Jul 12 15:11:42 2010 +0930

    config: wrap iptables in flock to avoid concurrency.
    
    When doing a releaseip event, we do them in parallel for all the separate
    IPs.  This creates a problem for iptables, which isn't reentrant, giving
    the strange message:
    	iptables encountered unknown error "18446744073709551615" while initializing table "filter"
    
    The worst possible symptom of this is that releaseip won't remove the rule
    which prevents us listening to clients during releaseip, and the node will be
    healthy but non-responsive.
    
    The simple workaround is to flock-wrap iptables.  Better would be to rework
    the code so we didn't need to use iptables in these paths.
    
    CQ:S1018353
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>

commit cacdfb43e1c034ec20d6ea63ec44d0e2336754e5
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Tue Jul 6 17:33:00 2010 +0930

    New version 1.0.112-25
    * Tue Jul 6 2010 : Version 1.0.112-25
     - natgw firewall fix
       BZ62613

commit 3ef5c07bb358d31ccce611c289060398dfa1fe14
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Jul 6 17:54:43 2010 +1000

    Move NAT gateway firewall rules to recovered|updatenatgw events.
    
    The existing code wasn't working as designed in the start event.  It
    should work here.
    
    BZ: 62613
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit 9783c0127e8148e1dd92ee98187facf135a6e369
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Mon Jul 5 12:14:06 2010 +0930

    New version 1.0.112-24
    
    * Mon Jul 5 2010 : Version 1.0.112-24
     - Extra logging on tdb_chainunlock failures.
     - Extra sanity check in ctdb_become_dmaster
       BZ65158
     - More robustness against IDR wrap
       BZ65158
     - Recovery failure under stress fix
       BZ:65158
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>

commit d658264c2b507a09e54fc296488f892e6be78d84
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Thu Jul 1 21:46:55 2010 +1000

    ctdb_freeze: extend db priority hack to cover serverid.tdb deadlock.
    
    We discovered that recent smbd locks the serverid tdb while
    holding a lock on another tdb (locking.tdb):
      7: POSIX  ADVISORY  WRITE smbd-2224318 locking.tdb.0 10600 10600
      22: -> POSIX  ADVISORY  READ  smbd-2224318 serverid.tdb.0 26580 26580
    
    The result is a deadlock against the ctdb_freeze code called for
    recovery.  We extend the "notify" workaround to this case, too.
    
    BZ:65158
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
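
The workaround amounts to a two-pass lock order: every other database
first, then the ones smbd may take while already holding another lock.
Here is a small sketch of that ordering, assuming the name matching shown
in the later_db() hunk below; lock_later and lock_order are illustrative
names, and the real code locks databases rather than filling an array.

```c
#include <stdbool.h>
#include <string.h>

/* Databases that must be locked after all the others. */
static bool lock_later(const char *name)
{
	return strstr(name, "notify") != NULL ||
	       strstr(name, "serverid") != NULL;
}

/* Fill order[] (n entries) with indices into names[] in locking order:
 * first pass takes the ordinary databases, second pass the late ones. */
static void lock_order(const char *const names[], int n, int order[])
{
	int k = 0;

	for (int i = 0; i < n; i++) {
		if (!lock_later(names[i])) {
			order[k++] = i;
		}
	}
	for (int i = 0; i < n; i++) {
		if (lock_later(names[i])) {
			order[k++] = i;
		}
	}
}
```

Locking in the same order as smbd (locking.tdb before serverid.tdb)
removes the lock-order inversion behind the deadlock.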

commit 3cd449cf06faba7ba38939f4e5727723b7a0141a
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date:   Thu Jun 10 14:30:38 2010 +1000

    Wrap the IDR early, but not too early.
    
    We don't want it to wrap almost immediately, because then basically all
    "ctdb ..." commands would log the "Reqid wrap" warning.

commit 9b4884e0bad3b23a8cf32ff19dc9bb8b26436e2d
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Thu Jun 10 08:58:55 2010 +0930

    Delay reusing ids to make protocol more robust
    
    Ronnie and I tracked down a bug which seems to be caused by a node
    running so slowly that we timed out the request and reused the request
    id before it responded.
    
    The result was that we unlocked the wrong record, leading to the
    following:
    
    	ctdbd: tdb_unlock: count is 0
    	ctdbd: tdb_chainunlock failed
    	smbd[1630912]: [2010/06/08 15:32:28.251716,  0] lib/util_sock.c:1491(get_peer_addr_internal)
    	ctdbd: Could not find idr:43
    	ctdbd: server/ctdb_call.c:492 reqid 43 not found
    
    This exact problem is now detected, but in general we want to delay
    id reuse as long as possible to make our system more robust.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
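
To illustrate why searching upward from the last id delays reuse, here
is a toy allocator. It is only a sketch: the real code uses the idtree
with idr_get_new_above() and ctdb->lastid, as the ctdb_util.c hunk below
shows, while the fixed-size table and every toy_* name here are
hypothetical.

```c
#include <stdbool.h>

#define TOY_MAX_IDS 8	/* hypothetical, tiny so the wrap is easy to see */

struct toy_ids {
	bool used[TOY_MAX_IDS];
	int lastid;	/* last id handed out, -1 before the first one */
};

static void toy_ids_init(struct toy_ids *t)
{
	for (int i = 0; i < TOY_MAX_IDS; i++) {
		t->used[i] = false;
	}
	t->lastid = -1;
}

/* Search upward from lastid+1, wrapping at most once.
 * Returns the new id, or -1 if every id is in use. */
static int toy_id_new(struct toy_ids *t)
{
	for (int n = 1; n <= TOY_MAX_IDS; n++) {
		int id = (t->lastid + n) % TOY_MAX_IDS;

		if (!t->used[id]) {
			t->used[id] = true;
			t->lastid = id;
			return id;
		}
	}
	return -1;
}

static void toy_id_free(struct toy_ids *t, int id)
{
	if (id >= 0 && id < TOY_MAX_IDS) {
		t->used[id] = false;
	}
}
```

A freed id is handed out again only after the search has passed every
higher id and wrapped, so a late reply to a timed-out request is far
less likely to match a live one.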

commit 7d4658d3fc09560ccf16b304ffdb5391a2b48f72
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Thu Jun 10 08:55:56 2010 +0930

    idtree: fix handling of large ids (eg INT_MAX)
    
    Since idtree assigns sequentially, it rarely reaches high numbers.
    But such numbers can be forced with idr_get_new_above(), and that
    reveals two bugs:
    1) Crash in sub_remove() caused by pa array being too short.
    2) Shift by more than 32 in _idr_find(), which is undefined, causing
       the "outside the current tree" optimization to misfire and return NULL.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
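
The second bug comes down to a C rule: shifting by a count greater than
or equal to the width of the promoted operand is undefined. Below is a
minimal sketch of the guarded test, written as a standalone helper; the
name has_bits_above is hypothetical, and the real fix is the
"n + IDR_BITS < 31" check in the idtree.c hunk below.

```c
#include <limits.h>
#include <stdbool.h>

/* True if the non-negative id has any bit set at position >= bits.
 * Negative arguments are outside this sketch's contract and yield false. */
static bool has_bits_above(int id, int bits)
{
	/* Number of value bits in an int (31 when int is 32 bits wide). */
	const int value_bits = (int)(sizeof(int) * CHAR_BIT) - 1;

	if (id < 0 || bits < 0) {
		return false;
	}
	if (bits >= value_bits) {
		/* Nothing can lie above the top value bit, and shifting
		 * by this much could be undefined, so do not shift. */
		return false;
	}
	return (id >> bits) != 0;
}
```

Checking the shift count first means the "outside the current tree"
test can never reject a valid id such as INT_MAX by accident.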

-----------------------------------------------------------------------

Summary of changes:
 client/ctdb_client.c       |   10 +++++-
 common/ctdb_logging.c      |    2 +-
 common/ctdb_util.c         |    8 ++++-
 config/events.d/11.natgw   |   12 +++---
 config/functions           |    6 +++
 include/ctdb.h             |    2 +-
 include/ctdb_private.h     |    5 ++-
 lib/util/debug.c           |    4 ++-
 lib/util/debug.h           |    1 +
 lib/util/idtree.c          |    6 ++-
 packaging/RPM/ctdb.spec.in |   30 ++++++++++++++++-
 server/ctdb_freeze.c       |   12 +++++-
 server/ctdb_lockwait.c     |    1 +
 server/ctdb_logging.c      |   13 ++++---
 server/ctdb_monitor.c      |    1 +
 server/ctdb_persistent.c   |    1 +
 server/ctdb_recover.c      |    1 +
 server/ctdb_recoverd.c     |    3 +-
 server/ctdb_traverse.c     |    2 +
 server/ctdb_vacuum.c       |   80 ++++++++++++++++++++++++++++++++++++++++----
 server/eventscript.c       |    3 ++
 21 files changed, 173 insertions(+), 30 deletions(-)
 mode change 100644 => 100755 config/events.d/11.natgw


Changeset truncated at 500 lines:

diff --git a/client/ctdb_client.c b/client/ctdb_client.c
index 4a307c4..7caa5cb 100644
--- a/client/ctdb_client.c
+++ b/client/ctdb_client.c
@@ -2748,6 +2748,8 @@ struct ctdb_context *ctdb_init(struct event_context *ev)
 	}
 	ctdb->ev  = ev;
 	ctdb->idr = idr_init(ctdb);
+	/* Wrap early to exercise code. */
+	ctdb->lastid = INT_MAX-200;
 	CTDB_NO_MEMORY_NULL(ctdb, ctdb->idr);
 
 	ret = ctdb_set_socketname(ctdb, CTDB_PATH);
@@ -3652,9 +3654,15 @@ int ctdb_ctrl_recd_ping(struct ctdb_context *ctdb)
  * to the daemon as a client process, this function can be used to change
  * the ctdb context from daemon into client mode
  */
-int switch_from_server_to_client(struct ctdb_context *ctdb)
+int switch_from_server_to_client(struct ctdb_context *ctdb, const char *fmt, ...)
 {
 	int ret;
+	va_list ap;
+
+	/* Add extra information so we can identify this in the logs */
+	va_start(ap, fmt);
+	debug_extra = talloc_append_string(NULL, talloc_vasprintf(NULL, fmt, ap), ":");
+	va_end(ap);
 
 	/* shutdown the transport */
 	if (ctdb->methods) {
diff --git a/common/ctdb_logging.c b/common/ctdb_logging.c
index 64507f4..1b011b6 100644
--- a/common/ctdb_logging.c
+++ b/common/ctdb_logging.c
@@ -151,7 +151,7 @@ int32_t ctdb_control_get_log(struct ctdb_context *ctdb, TDB_DATA addr)
 	}
 
 	if (child == 0) {
-		if (switch_from_server_to_client(ctdb) != 0) {
+		if (switch_from_server_to_client(ctdb, "log-collector") != 0) {
 			DEBUG(DEBUG_CRIT, (__location__ "ERROR: failed to switch log collector child into client mode.\n"));
 			_exit(1);
 		}
diff --git a/common/ctdb_util.c b/common/ctdb_util.c
index 87f61e7..9dc6d7a 100644
--- a/common/ctdb_util.c
+++ b/common/ctdb_util.c
@@ -159,7 +159,13 @@ void ctdb_reclock_latency(struct ctdb_context *ctdb, const char *name, double *l
 
 uint32_t ctdb_reqid_new(struct ctdb_context *ctdb, void *state)
 {
-	return idr_get_new(ctdb->idr, state, INT_MAX);
+	int id = idr_get_new_above(ctdb->idr, state, ctdb->lastid+1, INT_MAX);
+	if (id < 0) {
+		DEBUG(DEBUG_NOTICE, ("Reqid wrap!\n"));
+		id = idr_get_new(ctdb->idr, state, INT_MAX);
+	}
+	ctdb->lastid = id;
+	return id;
 }
 
 void *_ctdb_reqid_find(struct ctdb_context *ctdb, uint32_t reqid, const char *type, const char *location)
diff --git a/config/events.d/11.natgw b/config/events.d/11.natgw
old mode 100644
new mode 100755
index 7ae9c98..9147dca
--- a/config/events.d/11.natgw
+++ b/config/events.d/11.natgw
@@ -37,12 +37,6 @@ case "$1" in
 		exit 1
 	}
 
-	# block all incoming connections to the natgw address
-	CTDB_NATGW_PUBLIC_IP_HOST=`echo $CTDB_NATGW_PUBLIC_IP | sed -e "s/\/.*/\/32/"`
-	iptables -D INPUT -p tcp --syn -d $CTDB_NATGW_PUBLIC_IP_HOST -j REJECT 2>/dev/null
-	iptables -I INPUT -p tcp --syn -d $CTDB_NATGW_PUBLIC_IP_HOST -j REJECT 2>/dev/null
-
-
 	# do not respond to ARPs that are for ip addresses with scope 'host'
 	echo 3 > /proc/sys/net/ipv4/conf/all/arp_ignore
 	# do not send out arp requests from loopback addresses
@@ -68,6 +62,12 @@ case "$1" in
 		# This is the first node, set it up as the NAT GW
 		echo 1 >/proc/sys/net/ipv4/ip_forward
 		iptables -A POSTROUTING -t nat -s $CTDB_NATGW_PRIVATE_NETWORK -d ! $CTDB_NATGW_PRIVATE_NETWORK -j MASQUERADE
+
+		# block all incoming connections to the natgw address
+		CTDB_NATGW_PUBLIC_IP_HOST=`echo $CTDB_NATGW_PUBLIC_IP | sed -e "s/\/.*/\/32/"`
+		iptables -D INPUT -p tcp --syn -d $CTDB_NATGW_PUBLIC_IP_HOST -j REJECT 2>/dev/null
+		iptables -I INPUT -p tcp --syn -d $CTDB_NATGW_PUBLIC_IP_HOST -j REJECT 2>/dev/null
+
 		ip addr add $CTDB_NATGW_PUBLIC_IP dev $CTDB_NATGW_PUBLIC_IFACE
 		ip route add 0.0.0.0/0 via $CTDB_NATGW_DEFAULT_GATEWAY >/dev/null 2>/dev/null
 	else
diff --git a/config/functions b/config/functions
index 5c15422..e0d0735 100755
--- a/config/functions
+++ b/config/functions
@@ -601,6 +601,12 @@ ctdb_standard_event_handler ()
     esac
 }
 
+# iptables doesn't like being re-entered, so flock-wrap it.
+iptables()
+{
+	flock -w 30 /var/ctdb/iptables-ctdb.flock /sbin/iptables "$@"
+}
+
 ########################################################
 # load a site local config file
 ########################################################
diff --git a/include/ctdb.h b/include/ctdb.h
index c380c3d..cc83495 100644
--- a/include/ctdb.h
+++ b/include/ctdb.h
@@ -663,7 +663,7 @@ int ctdb_transaction_commit(struct ctdb_transaction_handle *h);
 
 int ctdb_ctrl_recd_ping(struct ctdb_context *ctdb);
 
-int switch_from_server_to_client(struct ctdb_context *ctdb);
+int switch_from_server_to_client(struct ctdb_context *ctdb, const char *fmt, ...);
 
 #define MONITOR_SCRIPT_OK      0
 #define MONITOR_SCRIPT_TIMEOUT 1
diff --git a/include/ctdb_private.h b/include/ctdb_private.h
index 099182a..41a761a 100644
--- a/include/ctdb_private.h
+++ b/include/ctdb_private.h
@@ -424,7 +424,7 @@ struct ctdb_context {
 	unsigned flags;
 	uint32_t capabilities;
 	struct idr_context *idr;
-	uint16_t idr_cnt;
+	int lastid;
 	struct ctdb_node **nodes; /* array of nodes in the cluster - indexed by vnn */
 	struct ctdb_vnn *vnn; /* list of public ip addresses and interfaces */
 	struct ctdb_vnn *single_ip_vnn; /* a structure for the single ip */
@@ -467,6 +467,8 @@ struct ctdb_context {
 
 	TALLOC_CTX *banning_ctx;
 
+	struct ctdb_vacuum_child_context *vacuumers;
+
 	/* mapping from pid to ctdb_client * */
 	struct ctdb_client_pid_list *client_pids;
 
@@ -1523,6 +1525,7 @@ int ctdb_ctrl_report_recd_lock_latency(struct ctdb_context *ctdb, struct timeval
 int32_t ctdb_control_stop_node(struct ctdb_context *ctdb, struct ctdb_req_control *c, bool *async_reply);
 int32_t ctdb_control_continue_node(struct ctdb_context *ctdb);
 
+void ctdb_stop_vacuuming(struct ctdb_context *ctdb);
 int ctdb_vacuum_init(struct ctdb_db_context *ctdb_db);
 
 int32_t ctdb_control_enable_script(struct ctdb_context *ctdb, TDB_DATA indata);
diff --git a/lib/util/debug.c b/lib/util/debug.c
index d4d3bd6..a0c3fe9 100644
--- a/lib/util/debug.c
+++ b/lib/util/debug.c
@@ -42,13 +42,15 @@ static void _do_debug_v(const char *format, va_list ap)
 
 	strftime(tbuf,sizeof(tbuf)-1,"%Y/%m/%d %H:%M:%S", tm);
 
-	fprintf(stderr, "%s.%06u [%5u]: %s", tbuf, (unsigned)t.tv_usec, (unsigned)getpid(), s);
+	fprintf(stderr, "%s.%06u [%s%5u]: %s", tbuf, (unsigned)t.tv_usec,
+		debug_extra, (unsigned)getpid(), s);
 	fflush(stderr);
 	free(s);
 }
 
 /* default logging function */
 void (*do_debug_v)(const char *, va_list ap) = _do_debug_v;
+const char *debug_extra = "";
 
 void do_debug(const char *format, ...)
 {
diff --git a/lib/util/debug.h b/lib/util/debug.h
index d91f43b..b81a097 100644
--- a/lib/util/debug.h
+++ b/lib/util/debug.h
@@ -18,6 +18,7 @@
 */
 
 void (*do_debug_v)(const char *, va_list ap);
+const char *debug_extra;
 void (*do_debug_add_v)(const char *, va_list ap);
 void log_ringbuffer(const char *format, ...);
 void do_debug(const char *format, ...) PRINTF_ATTRIBUTE(1, 2);
diff --git a/lib/util/idtree.c b/lib/util/idtree.c
index 06544e1..ef6d21f 100644
--- a/lib/util/idtree.c
+++ b/lib/util/idtree.c
@@ -240,7 +240,7 @@ build_up:
 static int sub_remove(struct idr_context *idp, int shift, int id)
 {
 	struct idr_layer *p = idp->top;
-	struct idr_layer **pa[MAX_LEVEL];
+	struct idr_layer **pa[1+MAX_LEVEL];
 	struct idr_layer ***paa = &pa[0];
 	int n;
 
@@ -280,8 +280,10 @@ static void *_idr_find(struct idr_context *idp, int id)
 	 * This tests to see if bits outside the current tree are
 	 * present.  If so, tain't one of ours!
 	 */
-	if ((id & ~(~0 << MAX_ID_SHIFT)) >> (n + IDR_BITS))
+	if (n + IDR_BITS < 31 &&
+	    ((id & ~(~0 << MAX_ID_SHIFT)) >> (n + IDR_BITS))) {
 	     return NULL;
+	}
 
 	/* Mask off upper bits we don't use for the search. */
 	id &= MAX_ID_MASK;
diff --git a/packaging/RPM/ctdb.spec.in b/packaging/RPM/ctdb.spec.in
index bdcf6e2..def3cce 100644
--- a/packaging/RPM/ctdb.spec.in
+++ b/packaging/RPM/ctdb.spec.in
@@ -5,7 +5,7 @@ Vendor: Samba Team
 Packager: Samba Team <samba at samba.org>
 Name: ctdb
 Version: 1.0.112
-Release: 23
+Release: 28
 Epoch: 0
 License: GNU GPL version 3
 Group: System Environment/Daemons
@@ -125,6 +125,34 @@ rm -rf $RPM_BUILD_ROOT
 %{_docdir}/ctdb/tests/bin/ctdb_transaction
 
 %changelog
+* Wed Jul 21 2010 : Version 1.0.112-28
+ - Turn back on deterministic IPs by default
+   (needs more analysis)
+
+* Wed Jul 21 2010 : Version 1.0.112-27
+ - Turn off deterministic IPs by default
+   CQ:S1018175
+ - vaccuum abort during freeze
+   CQ:S1018154 & S1018349
+ - logging enhancement for ctdbd children
+
+* Thu Jul 15 2010 : Version 1.0.112-26
+ - iptables parallel invocation fix
+   CQ:S1018353
+
+* Tue Jul 6 2010 : Version 1.0.112-25
+ - natgw firewall fix
+   BZ62613
+
+* Mon Jul 5 2010 : Version 1.0.112-24
+ - Extra logging on tdb_chainunlock failures.
+ - Extra sanity check in ctdb_become_dmaster
+   BZ65158
+ - More robustness against IDR wrap
+   BZ65158
+ - Recovery failure under stress fix
+   BZ:65158
+
 * Tue Jun 8 2010 : Version 1.0.112-23
  - Fix a SEGV that can be triggered by "ctdb delip"
    BZ 62783
diff --git a/server/ctdb_freeze.c b/server/ctdb_freeze.c
index 70333b0..0dc86a7 100644
--- a/server/ctdb_freeze.c
+++ b/server/ctdb_freeze.c
@@ -26,6 +26,10 @@
 #include "lib/util/dlinklist.h"
 #include "db_wrap.h"
 
+static bool later_db(const char *name)
+{
+	return (strstr(name, "notify") || strstr(name, "serverid"));
+}
 
 /*
   lock all databases
@@ -43,7 +47,7 @@ static int ctdb_lock_all_databases(struct ctdb_context *ctdb, uint32_t priority)
 		if (ctdb_db->priority != priority) {
 			continue;
 		}
-		if (strstr(ctdb_db->db_name, "notify") != NULL) {
+		if (later_db(ctdb_db->db_name)) {
 			continue;
 		}
 		DEBUG(DEBUG_INFO,("locking database 0x%08x priority:%u %s\n", ctdb_db->db_id, ctdb_db->priority, ctdb_db->db_name));
@@ -56,7 +60,7 @@ static int ctdb_lock_all_databases(struct ctdb_context *ctdb, uint32_t priority)
 		if (ctdb_db->priority != priority) {
 			continue;
 		}
-		if (strstr(ctdb_db->db_name, "notify") == NULL) {
+		if (!later_db(ctdb_db->db_name)) {
 			continue;
 		}
 		DEBUG(DEBUG_INFO,("locking database 0x%08x priority:%u %s\n", ctdb_db->db_id, ctdb_db->priority, ctdb_db->db_name));
@@ -198,6 +202,7 @@ static struct ctdb_freeze_handle *ctdb_freeze_lock(struct ctdb_context *ctdb, ui
 		/* in the child */
 		close(fd[0]);
 
+		debug_extra = talloc_asprintf(NULL, "freeze_lock-%u:", priority);
 		ret = ctdb_lock_all_databases(ctdb, priority);
 		if (ret != 0) {
 			_exit(0);
@@ -268,6 +273,9 @@ int ctdb_start_freeze(struct ctdb_context *ctdb, uint32_t priority)
 		return 0;
 	}
 
+	/* Stop any vacuuming going on: we don't want to wait. */
+	ctdb_stop_vacuuming(ctdb);
+
 	/* if there isn't a freeze lock child then create one */
 	if (ctdb->freeze_handles[priority] == NULL) {
 		ctdb->freeze_handles[priority] = ctdb_freeze_lock(ctdb, priority);
diff --git a/server/ctdb_lockwait.c b/server/ctdb_lockwait.c
index afbb921..b7ae55c 100644
--- a/server/ctdb_lockwait.c
+++ b/server/ctdb_lockwait.c
@@ -136,6 +136,7 @@ struct lockwait_handle *ctdb_lockwait(struct ctdb_db_context *ctdb_db,
 	if (result->child == 0) {
 		char c = 0;
 		close(result->fd[0]);
+		debug_extra = talloc_asprintf(NULL, "chainlock-%s:", ctdb_db->db_name);
 		tdb_chainlock(ctdb_db->ltdb->tdb, key);
 		write(result->fd[1], &c, 1);
 		/* make sure we die when our parent dies */
diff --git a/server/ctdb_logging.c b/server/ctdb_logging.c
index a7ca1a1..4e3595b 100644
--- a/server/ctdb_logging.c
+++ b/server/ctdb_logging.c
@@ -115,6 +115,7 @@ int start_syslog_daemon(struct ctdb_context *ctdb)
 		return 0;
 	}
 
+	debug_extra = talloc_asprintf(NULL, "syslogd:");
 	talloc_free(ctdb->ev);
 	ctdb->ev = event_context_init(NULL);
 
@@ -213,15 +214,16 @@ static void ctdb_syslog_log(const char *format, va_list ap)
 		break;		
 	}
 
-	len = offsetof(struct syslog_message, message) + strlen(s) + 1;
+	len = offsetof(struct syslog_message, message) + strlen(debug_extra) + strlen(s) + 1;
 	msg = malloc(len);
 	if (msg == NULL) {
 		free(s);
 		return;
 	}
 	msg->level = level;
-	msg->len   = strlen(s);
-	strcpy(msg->message, s);
+	msg->len   = strlen(debug_extra) + strlen(s);
+	strcpy(msg->message, debug_extra);
+	strcat(msg->message, s);
 
 	if (syslogd_is_started == 0) {
 		syslog(msg->level, "%s", msg->message);
@@ -275,8 +277,9 @@ static void ctdb_logfile_log(const char *format, va_list ap)
 
 	strftime(tbuf,sizeof(tbuf)-1,"%Y/%m/%d %H:%M:%S", tm);
 
-	ret = asprintf(&s2, "%s.%06u [%5u]: %s",
-		 tbuf, (unsigned)t.tv_usec, (unsigned)getpid(), s);
+	ret = asprintf(&s2, "%s.%06u [%s%5u]: %s",
+		       tbuf, (unsigned)t.tv_usec,
+		       debug_extra, (unsigned)getpid(), s);
 	free(s);
 	if (ret == -1) {
 		const char *errstr = "asprintf failed\n";
diff --git a/server/ctdb_monitor.c b/server/ctdb_monitor.c
index 729895c..cd1d5b9 100644
--- a/server/ctdb_monitor.c
+++ b/server/ctdb_monitor.c
@@ -91,6 +91,7 @@ static void ctdb_run_notification_script(struct ctdb_context *ctdb, const char *
 	if (child == 0) {
 		int ret;
 
+		debug_extra = talloc_asprintf(NULL, "notification-%s:", event);
 		ret = ctdb_run_notification_script_child(ctdb, event);
 		if (ret != 0) {
 			DEBUG(DEBUG_ERR,(__location__ " Notification script failed\n"));
diff --git a/server/ctdb_persistent.c b/server/ctdb_persistent.c
index d38aa8d..4401bcd 100644
--- a/server/ctdb_persistent.c
+++ b/server/ctdb_persistent.c
@@ -543,6 +543,7 @@ struct childwrite_handle *ctdb_childwrite(struct ctdb_db_context *ctdb_db,
 		char c = 0;
 
 		close(result->fd[0]);
+		debug_extra = talloc_asprintf(NULL, "childwrite-%s:", ctdb_db->db_name);
 		ret = ctdb_persistent_store(state);
 		if (ret != 0) {
 			DEBUG(DEBUG_ERR, (__location__ " Failed to write persistent data\n"));
diff --git a/server/ctdb_recover.c b/server/ctdb_recover.c
index 22e4898..f61b6e7 100644
--- a/server/ctdb_recover.c
+++ b/server/ctdb_recover.c
@@ -725,6 +725,7 @@ int32_t ctdb_control_set_recmode(struct ctdb_context *ctdb,
 		char cc = 0;
 		close(state->fd[0]);
 
+		debug_extra = talloc_asprintf(NULL, "set_recmode:");
 		/* we should not be able to get the lock on the reclock file, 
 		  as it should  be held by the recovery master 
 		*/
diff --git a/server/ctdb_recoverd.c b/server/ctdb_recoverd.c
index dfef716..fbc5eec 100644
--- a/server/ctdb_recoverd.c
+++ b/server/ctdb_recoverd.c
@@ -2711,6 +2711,7 @@ static int check_recovery_lock(struct ctdb_context *ctdb)
 		close(state->fd[0]);
 		state->fd[0] = -1;
 
+		debug_extra = talloc_asprintf(NULL, "recovery-lock:");
 		if (pread(ctdb->recovery_lock_fd, &cc, 1, 0) == -1) {
 			DEBUG(DEBUG_CRIT,("failed read from recovery_lock_fd - %s\n", strerror(errno)));
 			cc = RECLOCK_FAILED;
@@ -3527,7 +3528,7 @@ int ctdb_start_recoverd(struct ctdb_context *ctdb)
 
 	srandom(getpid() ^ time(NULL));
 
-	if (switch_from_server_to_client(ctdb) != 0) {
+	if (switch_from_server_to_client(ctdb, "recoverd") != 0) {
 		DEBUG(DEBUG_CRIT, (__location__ "ERROR: failed to switch recovery daemon into client mode. shutting down.\n"));
 		exit(1);
 	}
diff --git a/server/ctdb_traverse.c b/server/ctdb_traverse.c
index 2c33387..6bfa1d4 100644
--- a/server/ctdb_traverse.c
+++ b/server/ctdb_traverse.c
@@ -171,6 +171,8 @@ static struct ctdb_traverse_local_handle *ctdb_traverse_local(struct ctdb_db_con
 	if (h->child == 0) {
 		/* start the traverse in the child */
 		close(h->fd[0]);
+		debug_extra = talloc_asprintf(NULL, "traverse_local-%s:",
+					      ctdb_db->db_name);
 		tdb_traverse_read(ctdb_db->ltdb->tdb, ctdb_traverse_local_fn, h);
 		_exit(0);
 	}
diff --git a/server/ctdb_vacuum.c b/server/ctdb_vacuum.c
index 580686e..72d8655 100644
--- a/server/ctdb_vacuum.c
+++ b/server/ctdb_vacuum.c
@@ -36,8 +36,12 @@
 enum vacuum_child_status { VACUUM_RUNNING, VACUUM_OK, VACUUM_ERROR, VACUUM_TIMEOUT};
 
 struct ctdb_vacuum_child_context {
+	struct ctdb_vacuum_child_context *next, *prev;
 	struct ctdb_vacuum_handle *vacuum_handle;
+	/* fd child writes status to */
 	int fd[2];
+	/* fd to abort vacuuming. */
+	int abortfd[2];
 	pid_t child_pid;
 	enum vacuum_child_status status;
 	struct timeval start_time;
@@ -65,6 +69,8 @@ struct vacuum_data {
 	uint32_t total;
 	uint32_t vacuumed;
 	uint32_t copied;
+	int abortfd;
+	bool abort;
 };
 
 /* tuning information stored for every db */
@@ -105,6 +111,15 @@ static int vacuum_traverse(struct tdb_context *tdb, TDB_DATA key, TDB_DATA data,
 	struct ctdb_ltdb_header *hdr;
 	struct ctdb_rec_data *rec;
 	size_t old_size;
+	char c;
+
+	/* Should we abort? */
+	if (read(vdata->abortfd, &c, 1) == 1) {
+		DEBUG(DEBUG_INFO, ("Abort during vacuum_traverse for %s\n",
+				   ctdb_db->db_name));
+		vdata->abort = true;
+		return -1;
+	}
 	       
 	lmaster = ctdb_lmaster(ctdb, &key);
 	if (lmaster >= ctdb->vnn_map->size) {
@@ -258,7 +273,10 @@ static int ctdb_vacuum_db(struct ctdb_db_context *ctdb_db, struct vacuum_data *v
 		DEBUG(DEBUG_ERR,(__location__ " Traverse error in vacuuming '%s'\n", name));
 		return -1;		
 	}
-
+	if (vdata->abort) {
+		DEBUG(DEBUG_INFO,("Traverse aborted vacuuming '%s'\n", name));
+		return -1;
+	}
 	for ( i = 0; i < ctdb->vnn_map->size; i++) {
 		if (vdata->list[i]->count == 0) {
 			continue;
@@ -317,12 +335,19 @@ static int ctdb_vacuum_db(struct ctdb_db_context *ctdb_db, struct vacuum_data *v
 		for (i = 0; i < ctdb->vnn_map->size; i++) {
 			struct ctdb_marshall_buffer *records;
 			struct ctdb_rec_data *rec;
+			char c;
 
 			if (ctdb->vnn_map->map[i] == ctdb->pnn) {
 				/* we dont delete the records on the local node just yet */
 				continue;


-- 
CTDB repository