[SCM] CTDB repository - branch master updated - ctdb-2.3-41-g5e9b1a7

Sun Aug 11 23:47:15 MDT 2013

The branch, master has been updated
       via  5e9b1a7e24d058ff88aaa0563db36a804e866fa9 (commit)
       via  867afb247bd8cc86c8d738f051a44cc534cafacf (commit)
       via  44a64d1c388bfe3c3388b191edfaedecfb7bb831 (commit)
       via  9cde47e1a5bf1b9ca3b4da8c2db94caac2b1aa5e (commit)
       via  81d7ce03b28d592a1337639e14d9ea141e20bfff (commit)
       via  d7f6bc3fed2dc61e6e587b4c0ec0ac27d533bbbe (commit)
       via  9e99e0eb072e2b845914ee3896acbc66b96138d7 (commit)
       via  44eb86e6042adb6efe75d2a5528b82a0f21d496d (commit)
       via  ebecc3a18f1cb397a78b56eaf8f752dd5495bcc9 (commit)
       via  68af5405acc123b5a90decd2123e2a02961a8fcf (commit)
      from  824dcec35ec461d78e22b2ea109473b32bfe3972 (commit)

http://gitweb.samba.org/?p=ctdb.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 5e9b1a7e24d058ff88aaa0563db36a804e866fa9
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Mon Aug 5 17:28:47 2013 +1000

    common/io: Keep queue buffer size multiple of 4K
    
    Currently the queue buffer is realloc'd every time it needs to be
    extended.  Small increments can cause memory fragmentation.  Instead,
    always extend the buffer in multiples of 4K.  This should reduce the number
    of talloc_realloc calls when there are lots of packets in the socket buffer.
    
    Also, if the queue buffer has grown larger than 64K, throw away the buffer
    once all the requests in the queue have been processed.  That way the queue
    does not hold on to large buffers.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
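
The sizing rule in this commit can be sketched in isolation as below. This is a minimal illustration and not the code from common/ctdb_io.c: the function names initial_size(), extended_size() and is_oversize() are hypothetical, and only the two constants and the arithmetic are taken from the diff further down. Note that the formula always rounds up to the next multiple, so a request that is already an exact multiple of 4K gets one extra 4K block.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_BUFFER_SIZE	(4*1024)
#define QUEUE_BUFFER_OVERSIZE	(64*1024)

/* Hypothetical helper: size of a fresh buffer that must hold num_ready
 * bytes.  Always a multiple of 4K and strictly larger than num_ready. */
static size_t initial_size(size_t num_ready)
{
	return QUEUE_BUFFER_SIZE * (num_ready / QUEUE_BUFFER_SIZE + 1);
}

/* Hypothetical helper: new size for a buffer of 'size' bytes that already
 * holds 'length' bytes when num_ready more bytes are waiting.  The caller
 * must only use this when the data no longer fits. */
static size_t extended_size(size_t size, size_t length, size_t num_ready)
{
	size_t increment;

	assert(length + num_ready > size);
	increment = (length + num_ready) - size;
	return size + QUEUE_BUFFER_SIZE * (increment / QUEUE_BUFFER_SIZE + 1);
}

/* Hypothetical helper: true if an idle buffer of this size should be
 * freed instead of being kept for reuse. */
static int is_oversize(size_t size)
{
	return size > QUEUE_BUFFER_OVERSIZE;
}
```

With these definitions a 100-byte read allocates 4096 bytes, a 4096-byte read allocates 8192 bytes, and a buffer of exactly 64K is still kept after the queue drains.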

commit 867afb247bd8cc86c8d738f051a44cc534cafacf
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 26 13:57:03 2013 +1000

    packaging: Allow setting custom release number in RPM spec file
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Pair-Programmed-With: Amitay Isaacs <amitay at gmail.com>

commit 44a64d1c388bfe3c3388b191edfaedecfb7bb831
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Wed Jul 31 15:59:11 2013 +1000

    ctdbd: When a record is made sticky, log only once
    
    Instead of logging from ctdb_request_call(), log the message from
    ctdb_make_record_sticky().  That way if the record is already sticky, the
    message is not repeated unnecessarily.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>

commit 9cde47e1a5bf1b9ca3b4da8c2db94caac2b1aa5e
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Mon Jul 15 17:34:31 2013 +1000

    ctdbd: Improve high hopcount log messages when request is redirected
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>

commit 81d7ce03b28d592a1337639e14d9ea141e20bfff
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Aug 6 16:11:40 2013 +1000

    scripts: Do not run ctdb tool commands when debugging hung "init" event
    
    The CTDB daemon is not ready to accept clients in the INIT runstate (init
    event).  It starts accepting connections in the SETUP runstate (setup
    event) and later.
    
    Also, minor log formatting changes.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>

commit d7f6bc3fed2dc61e6e587b4c0ec0ac27d533bbbe
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Mon Aug 5 17:38:42 2013 +1000

    ctdbd: Avoid leaking file descriptor if talloc fails
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>

commit 9e99e0eb072e2b845914ee3896acbc66b96138d7
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Mon Aug 5 14:08:28 2013 +1000

    eventscript: Wait for debug hung script to finish or timeout before continuing
    
    Currently, if the debug hung script takes a long time to finish, the
    subsequent monitor event can collide with the previous event, which has
    not yet finished.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
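
The "finish or timeout" wait can be sketched outside of tevent as below. This is an illustration under stated assumptions, not the code from server/eventscript.c: the commit registers a tevent fd handler and a tevent timer, whereas this sketch swaps in a plain poll() call, and the name wait_done_or_timeout() is hypothetical. It also assumes the debug child holds the write end of a pipe, so the read end becomes ready when the child exits and that end is closed.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait until the read end of a pipe becomes ready
 * (data, or hang-up because every writer has closed) or until timeout_ms
 * has elapsed.  Returns 1 if the fd became ready, 0 on timeout and -1 on
 * error. */
static int wait_done_or_timeout(int rfd, int timeout_ms)
{
	struct pollfd pfd = { .fd = rfd, .events = POLLIN };
	int ret;

	/* Retry if interrupted by a signal.  For simplicity the full
	 * timeout is restarted on each retry. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret == -1 && errno == EINTR);

	if (ret < 0) {
		return -1;
	}
	return ret > 0 ? 1 : 0;
}
```

Either outcome lets the caller release the state for the hung event, which is what both handlers added by the commit do.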

commit 44eb86e6042adb6efe75d2a5528b82a0f21d496d
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Fri Aug 2 15:49:06 2013 +1000

    eventscripts: Use configured RECLOCK file instead of asking CTDB
    
    On a cluster where the recovery lock file is not being used, asking the
    CTDB daemon is unnecessary overhead.  And if CTDB is using a recovery lock
    file, then changing the configuration without restarting is *stupid*.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
    Pair-Programmed-With: Martin Schwenke <martin at meltin.net>

commit ebecc3a18f1cb397a78b56eaf8f752dd5495bcc9
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Fri Aug 2 10:54:38 2013 +1000

    locking: Do not create multiple lock processes for the same key
    
    If there are multiple lock helper processes waiting for the same record, then
    it will cause a thundering herd when that record has been unlocked.  So avoid
    scheduling lock contexts for the same record.  This will also mean that
    multiple requests will get queued up behind the same lock context and can be
    processed quickly once the lock has been obtained.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>
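
The scheduling rule can be sketched with simplified types as below. This is an illustration only: struct lock_ctx, same_key(), find_active() and next_schedulable() are hypothetical stand-ins for the real lock_context, find_lock_context() and ctdb_lock_schedule() in server/ctdb_lock.c, and the sketch handles record locks only, ignoring the database, priority and lock-type checks shown in the diff.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified lock context: a list node carrying a key. */
struct lock_ctx {
	struct lock_ctx *next;
	const char *key;
	size_t keylen;
};

/* Two contexts refer to the same record if their keys match exactly. */
static int same_key(const struct lock_ctx *a, const struct lock_ctx *b)
{
	return a->keylen == b->keylen &&
	       memcmp(a->key, b->key, a->keylen) == 0;
}

/* Return the active context waiting on the same key, or NULL. */
static struct lock_ctx *find_active(struct lock_ctx *active,
				    const struct lock_ctx *want)
{
	for (; active != NULL; active = active->next) {
		if (same_key(active, want)) {
			return active;
		}
	}
	return NULL;
}

/* Return the first pending context whose key has no helper waiting on it
 * already, or NULL if every pending key is already being waited on. */
static struct lock_ctx *next_schedulable(struct lock_ctx *pending,
					 struct lock_ctx *active)
{
	for (; pending != NULL; pending = pending->next) {
		if (find_active(active, pending) == NULL) {
			return pending;
		}
	}
	return NULL;
}
```

A pending context that is skipped stays on the pending list, so it becomes schedulable once the active helper for that key has finished.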

commit 68af5405acc123b5a90decd2123e2a02961a8fcf
Author: Amitay Isaacs <amitay at gmail.com>
Date:   Fri Aug 2 10:51:45 2013 +1000

    locking: Move function find_lock_context() before ctdb_lock_schedule()
    
    So that ctdb_lock_schedule() can call this function without requiring extra
    prototype declaration.
    
    Signed-off-by: Amitay Isaacs <amitay at gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 common/ctdb_io.c            |   28 ++++++++--
 config/debug-hung-script.sh |    8 ++-
 config/events.d/01.reclock  |   12 ++---
 packaging/RPM/ctdb.spec.in  |    2 +-
 packaging/RPM/makerpms.sh   |    9 ++-
 packaging/maketarball.sh    |    9 ++-
 packaging/mkversion.sh      |    4 +-
 server/ctdb_call.c          |   15 +++---
 server/ctdb_control.c       |    5 ++-
 server/ctdb_lock.c          |  127 +++++++++++++++++++++++--------------------
 server/eventscript.c        |   72 ++++++++++++++++++++-----
 11 files changed, 188 insertions(+), 103 deletions(-)


Changeset truncated at 500 lines:

diff --git a/common/ctdb_io.c b/common/ctdb_io.c
index aee8864..1db6f2b 100644
--- a/common/ctdb_io.c
+++ b/common/ctdb_io.c
@@ -29,6 +29,9 @@
 #include "../include/ctdb_client.h"
 #include <stdarg.h>
 
+#define QUEUE_BUFFER_SIZE	(4*1024)
+#define QUEUE_BUFFER_OVERSIZE	(64*1024)
+
 /* structures for packet queueing - see common/ctdb_io.c */
 struct ctdb_buffer {
 	uint8_t *data;
@@ -122,6 +125,12 @@ static void queue_process(struct ctdb_queue *queue)
 		/* There is more data to be processed, schedule an event */
 		tevent_schedule_immediate(queue->im, queue->ctdb->ev,
 					  queue_process_event, queue);
+	} else {
+		/* Throw away large buffer when done processing requests */
+		if (queue->buffer.size > QUEUE_BUFFER_OVERSIZE) {
+			TALLOC_FREE(queue->buffer.data);
+			queue->buffer.size = 0;
+		}
 	}
 
 	/* It is the responsibility of the callback to free 'data' */
@@ -159,22 +168,29 @@ static void queue_io_read(struct ctdb_queue *queue)
 	}
 
 	if (queue->buffer.data == NULL) {
+		int n;
+
 		/* starting fresh, allocate buf to read data */
-		queue->buffer.data = talloc_size(queue, num_ready);
+		n = QUEUE_BUFFER_SIZE * (num_ready/QUEUE_BUFFER_SIZE + 1);
+		queue->buffer.data = talloc_size(queue, n);
 		if (queue->buffer.data == NULL) {
-			DEBUG(DEBUG_ERR, ("read error alloc failed for %u\n", num_ready));
+			DEBUG(DEBUG_ERR, ("read error alloc failed for %u\n", n));
 			goto failed;
 		}
-		queue->buffer.size = num_ready;
+		queue->buffer.size = n;
 	} else if (queue->buffer.length + num_ready > queue->buffer.size) {
+		int increment, n;
+
 		/* extending buffer */
-		data = talloc_realloc_size(queue, queue->buffer.data, queue->buffer.length + num_ready);
+		increment = (queue->buffer.length + num_ready) - queue->buffer.size;
+		n = queue->buffer.size + QUEUE_BUFFER_SIZE * (increment/QUEUE_BUFFER_SIZE + 1);
+		data = talloc_realloc_size(queue, queue->buffer.data, n);
 		if (data == NULL) {
-			DEBUG(DEBUG_ERR, ("read error realloc failed for %u\n", queue->buffer.length + num_ready));
+			DEBUG(DEBUG_ERR, ("read error realloc failed for %u\n", n));
 			goto failed;
 		}
 		queue->buffer.data = data;
-		queue->buffer.size = queue->buffer.length + num_ready;
+		queue->buffer.size = n;
 	}
 
 	nread = read(queue->fd, queue->buffer.data + queue->buffer.length, num_ready);
diff --git a/config/debug-hung-script.sh b/config/debug-hung-script.sh
index 32dbd5f..1984242 100755
--- a/config/debug-hung-script.sh
+++ b/config/debug-hung-script.sh
@@ -3,17 +3,21 @@
 (
     flock --wait 2 9 || exit 1
 
-    echo "===== Start of hung script debug for PID=\"$1\", event\"$2\" ====="
+    echo "===== Start of hung script debug for PID=\"$1\", event=\"$2\" ====="
 
     echo "pstree -p -a ${1}:"
     pstree -p -a $1
 
+    if [ "$2" = "init" ] ; then
+	exit 0
+    fi
+
     echo "ctdb scriptstatus ${2}:"
     # No use running several of these in parallel if, say, "releaseip"
     # event hangs for multiple IPs.  In that case the output would be
     # interleaved in the log and would just be confusing.
     ctdb scriptstatus "$2"
 
-    echo "===== End of hung script debug for PID=\"$1\", event\"$2\" ====="
+    echo "===== End of hung script debug for PID=\"$1\", event=\"$2\" ====="
 
 ) 9>"${CTDB_VARDIR}/debug-hung-script.lock"
diff --git a/config/events.d/01.reclock b/config/events.d/01.reclock
index ec50989..ed7afdd 100755
--- a/config/events.d/01.reclock
+++ b/config/events.d/01.reclock
@@ -7,21 +7,19 @@
 . $CTDB_BASE/functions
 loadconfig
 
-case "$1" in 
+case "$1" in
     init)
 	ctdb_counter_init
 	;;
-    
-    monitor)
-	RECLOCKFILE=$(ctdb -Y getreclock)
 
+    monitor)
 	# Early exit if not using a reclock file
-	[ -n "$RECLOCKFILE" ] || exit 0
+	[ -n "$CTDB_RECOVERY_LOCK" ] || exit 0
 
 	# Try to stat the reclock file as a background process so that
 	# we don't block in case the cluster filesystem is unavailable
 	(
-	    if stat $RECLOCKFILE ; then
+	    if stat $CTDB_RECOVERY_LOCK ; then
 		# We could stat the file, reset the counter
 		ctdb_counter_init
 	    fi
@@ -29,7 +27,7 @@ case "$1" in
 
 	ctdb_counter_incr
 	if ! ctdb_check_counter "quiet" -ge 200 ; then
-	    echo "Reclock file \"$RECLOCKFILE\" can not be accessed. Shutting down."
+	    echo "Reclock file \"$CTDB_RECOVERY_LOCK\" can not be accessed. Shutting down."
 	    df
 	    sleep 1
 	    ctdb shutdown
diff --git a/packaging/RPM/ctdb.spec.in b/packaging/RPM/ctdb.spec.in
index 62fc65f..71cf0a8 100644
--- a/packaging/RPM/ctdb.spec.in
+++ b/packaging/RPM/ctdb.spec.in
@@ -5,7 +5,7 @@ Summary: Clustered TDB
 Vendor: Samba Team
 Packager: Samba Team <samba at samba.org>
 Version: @VERSION@
-Release: 1
+Release: @RELEASE@
 Epoch: 0
 License: GNU GPL version 3
 Group: System Environment/Daemons
diff --git a/packaging/RPM/makerpms.sh b/packaging/RPM/makerpms.sh
index 8dbec55..9b4f139 100755
--- a/packaging/RPM/makerpms.sh
+++ b/packaging/RPM/makerpms.sh
@@ -52,12 +52,15 @@ mkdir -p `rpm --eval %_rpmdir`/noarch
 mkdir -p `rpm --eval %_rpmdir`/i386
 mkdir -p `rpm --eval %_rpmdir`/x86_64
 
-VERSION=$(${TOPDIR}/packaging/mkversion.sh ${TOPDIR}/include/ctdb_version.h)
-if [ -z "$VERSION" ]; then
+set -- $(${TOPDIR}/packaging/mkversion.sh ${TOPDIR}/include/ctdb_version.h)
+VERSION=$1
+RELEASE=$2
+if [ -z "$VERSION" -o -z "$RELEASE" ]; then
     exit 1
 fi
 
-sed -e s/@VERSION@/$VERSION/g \
+sed -e "s/@VERSION@/$VERSION/g" \
+    -e "s/@RELEASE@/$RELEASE/g" \
 	< ${DIRNAME}/${SPECFILE_IN} \
 	> ${DIRNAME}/${SPECFILE}
 
diff --git a/packaging/maketarball.sh b/packaging/maketarball.sh
index be19869..c99bb70 100755
--- a/packaging/maketarball.sh
+++ b/packaging/maketarball.sh
@@ -53,12 +53,15 @@ if [ $RC -ne 0 ]; then
 	exit 1
 fi
 
-VERSION=$(${TOPDIR}/packaging/mkversion.sh ${VERSION_H})
-if [ -z "$VERSION" ]; then
+set -- $(${TOPDIR}/packaging/mkversion.sh ${VERSION_H})
+VERSION=$1
+RELEASE=$2
+if [ -z "$VERSION" -o -z "$RELEASE" ]; then
     exit 1
 fi
 
-sed -e s/@VERSION@/${VERSION}/g \
+sed -e "s/@VERSION@/${VERSION}/g" \
+    -e "s/@RELEASE@/$RELEASE/g" \
 	< ${SPECFILE_IN} \
 	> ${SPECFILE}
 
diff --git a/packaging/mkversion.sh b/packaging/mkversion.sh
index 4a80b25..7a550a5 100755
--- a/packaging/mkversion.sh
+++ b/packaging/mkversion.sh
@@ -42,10 +42,12 @@ case "$TAG" in
 	    *-*-g*) # 0.9-168-ge6cf0e8
 		# Not exactly on tag: devel version.
 		VERSION=`echo "$TAG" | sed 's/\([^-]\+\)-\([0-9]\+\)-\(g[0-9a-f]\+\)/\1.0.\2.\3.devel/'`
+		RELEASE=1
 		;;
 	    *)
 		# An actual release version
 		VERSION=$TAG
+		RELEASE=1
 		;;
 	esac
 	;;
@@ -61,4 +63,4 @@ cat > "$OUTPUT" <<EOF
 
 EOF
 
-echo $VERSION
+echo "$VERSION $RELEASE"
diff --git a/server/ctdb_call.c b/server/ctdb_call.c
index 87209fd..6288ff2 100644
--- a/server/ctdb_call.c
+++ b/server/ctdb_call.c
@@ -131,12 +131,12 @@ static void ctdb_call_send_redirect(struct ctdb_context *ctdb,
 	}
 	c->hopcount++;
 
-	if (c->hopcount%100 == 99) {
-		DEBUG(DEBUG_WARNING,("High hopcount %d dbid:0x%08x "
-			"key:0x%08x pnn:%d src:%d lmaster:%d "
+	if (c->hopcount%100 > 95) {
+		DEBUG(DEBUG_WARNING,("High hopcount %d dbid:%s "
+			"key:0x%08x reqid=%08x pnn:%d src:%d lmaster:%d "
 			"header->dmaster:%d dst:%d\n",
-			c->hopcount, ctdb_db->db_id, ctdb_hash(&key),
-			ctdb->pnn, c->hdr.srcnode, lmaster,
+			c->hopcount, ctdb_db->db_name, ctdb_hash(&key),
+			c->hdr.reqid, ctdb->pnn, c->hdr.srcnode, lmaster,
 			header->dmaster, c->hdr.destnode));
 	}
 
@@ -564,7 +564,9 @@ ctdb_make_record_sticky(struct ctdb_context *ctdb, struct ctdb_db_context *ctdb_
 	sr->ctdb_db = ctdb_db;
 	sr->pindown = NULL;
 
-	DEBUG(DEBUG_ERR,("Make record sticky in db %s\n", ctdb_db->db_name));
+	DEBUG(DEBUG_ERR,("Make record sticky for %d seconds in db %s key:0x%08x.\n",
+			 ctdb->tunable.sticky_duration,
+			 ctdb_db->db_name, ctdb_hash(&key)));
 
 	trbt_insertarray32_callback(ctdb_db->sticky_records, k[0], &k[0], ctdb_make_sticky_record_callback, sr);
 
@@ -922,7 +924,6 @@ void ctdb_request_call(struct ctdb_context *ctdb, struct ctdb_req_header *hdr)
 	   should make it sticky.
 	*/
 	if (ctdb_db->sticky && c->hopcount >= ctdb->tunable.hopcount_make_sticky) {
-		DEBUG(DEBUG_ERR, ("Hot record in database %s. Hopcount is %d. Make record sticky for %d seconds\n", ctdb_db->db_name, c->hopcount, ctdb->tunable.sticky_duration));
 		ctdb_make_record_sticky(ctdb, ctdb_db, call->key);
 	}
 
diff --git a/server/ctdb_control.c b/server/ctdb_control.c
index a8771f3..cd96e82 100644
--- a/server/ctdb_control.c
+++ b/server/ctdb_control.c
@@ -52,7 +52,10 @@ int32_t ctdb_dump_memory(struct ctdb_context *ctdb, TDB_DATA *outdata)
 	fsize = ftell(f);
 	rewind(f);
 	outdata->dptr = talloc_size(outdata, fsize);
-	CTDB_NO_MEMORY(ctdb, outdata->dptr);
+	if (outdata->dptr == NULL) {
+		fclose(f);
+		CTDB_NO_MEMORY(ctdb, outdata->dptr);
+	}
 	outdata->dsize = fread(outdata->dptr, 1, fsize, f);
 	fclose(f);
 	if (outdata->dsize != fsize) {
diff --git a/server/ctdb_lock.c b/server/ctdb_lock.c
index 8886ed0..1d27a44 100644
--- a/server/ctdb_lock.c
+++ b/server/ctdb_lock.c
@@ -649,12 +649,65 @@ static char **lock_helper_args(TALLOC_CTX *mem_ctx, struct lock_context *lock_ct
 
 
 /*
+ * Find the lock context of a given type
+ */
+static struct lock_context *find_lock_context(struct lock_context *lock_list,
+					      struct ctdb_db_context *ctdb_db,
+					      TDB_DATA key,
+					      uint32_t priority,
+					      enum lock_type type)
+{
+	struct lock_context *lock_ctx;
+
+	/* Search active locks */
+	for (lock_ctx=lock_list; lock_ctx; lock_ctx=lock_ctx->next) {
+		if (lock_ctx->type != type) {
+			continue;
+		}
+
+		switch (lock_ctx->type) {
+		case LOCK_RECORD:
+			if (ctdb_db == lock_ctx->ctdb_db &&
+			    key.dsize == lock_ctx->key.dsize &&
+			    memcmp(key.dptr, lock_ctx->key.dptr, key.dsize) == 0) {
+				goto done;
+			}
+			break;
+
+		case LOCK_DB:
+			if (ctdb_db == lock_ctx->ctdb_db) {
+				goto done;
+			}
+			break;
+
+		case LOCK_ALLDB_PRIO:
+			if (priority == lock_ctx->priority) {
+				goto done;
+			}
+			break;
+
+		case LOCK_ALLDB:
+			goto done;
+			break;
+		}
+	}
+
+	/* Did not find the lock context we are searching for */
+	lock_ctx = NULL;
+
+done:
+	return lock_ctx;
+
+}
+
+
+/*
  * Schedule a new lock child process
  * Set up callback handler and timeout handler
  */
 static void ctdb_lock_schedule(struct ctdb_context *ctdb)
 {
-	struct lock_context *lock_ctx, *next_ctx;
+	struct lock_context *lock_ctx, *next_ctx, *active_ctx;
 	int ret;
 	TALLOC_CTX *tmp_ctx;
 	const char *helper = BINDIR "/ctdb_lock_helper";
@@ -684,8 +737,8 @@ static void ctdb_lock_schedule(struct ctdb_context *ctdb)
 	/* Find a lock context with requests */
 	lock_ctx = ctdb->lock_pending;
 	while (lock_ctx != NULL) {
+		next_ctx = lock_ctx->next;
 		if (! lock_ctx->req_queue) {
-			next_ctx = lock_ctx->next;
 			DEBUG(DEBUG_INFO, ("Removing lock context without lock requests\n"));
 			DLIST_REMOVE(ctdb->lock_pending, lock_ctx);
 			ctdb->lock_num_pending--;
@@ -694,12 +747,21 @@ static void ctdb_lock_schedule(struct ctdb_context *ctdb)
 				CTDB_DECREMENT_DB_STAT(lock_ctx->ctdb_db, locks.num_pending);
 			}
 			talloc_free(lock_ctx);
-			lock_ctx = next_ctx;
-			continue;
 		} else {
-			/* Found a lock context with lock requests */
-			break;
+			active_ctx = find_lock_context(ctdb->lock_current, lock_ctx->ctdb_db,
+						       lock_ctx->key, lock_ctx->priority,
+						       lock_ctx->type);
+			if (active_ctx == NULL) {
+				/* Found a lock context with lock requests */
+				break;
+			}
+
+			/* There is already a child waiting for the
+			 * same key.  So don't schedule another child
+			 * just yet.
+			 */
 		}
+		lock_ctx = next_ctx;
 	}
 
 	if (lock_ctx == NULL) {
@@ -802,59 +864,6 @@ static void ctdb_lock_schedule(struct ctdb_context *ctdb)
 
 
 /*
- * Find the lock context of a given type
- */
-static struct lock_context *find_lock_context(struct lock_context *lock_list,
-					      struct ctdb_db_context *ctdb_db,
-					      TDB_DATA key,
-					      uint32_t priority,
-					      enum lock_type type)
-{
-	struct lock_context *lock_ctx;
-
-	/* Search active locks */
-	for (lock_ctx=lock_list; lock_ctx; lock_ctx=lock_ctx->next) {
-		if (lock_ctx->type != type) {
-			continue;
-		}
-
-		switch (lock_ctx->type) {
-		case LOCK_RECORD:
-			if (ctdb_db == lock_ctx->ctdb_db &&
-			    key.dsize == lock_ctx->key.dsize &&
-			    memcmp(key.dptr, lock_ctx->key.dptr, key.dsize) == 0) {
-				goto done;
-			}
-			break;
-
-		case LOCK_DB:
-			if (ctdb_db == lock_ctx->ctdb_db) {
-				goto done;
-			}
-			break;
-
-		case LOCK_ALLDB_PRIO:
-			if (priority == lock_ctx->priority) {
-				goto done;
-			}
-			break;
-
-		case LOCK_ALLDB:
-			goto done;
-			break;
-		}
-	}
-
-	/* Did not find the lock context we are searching for */
-	lock_ctx = NULL;
-
-done:
-	return lock_ctx;
-
-}
-
-
-/*
  * Lock record / db depending on type
  */
 static struct lock_request *ctdb_lock_internal(struct ctdb_context *ctdb,
diff --git a/server/eventscript.c b/server/eventscript.c
index c255e17..640b68a 100644
--- a/server/eventscript.c
+++ b/server/eventscript.c
@@ -513,32 +513,60 @@ static void ctdb_event_script_handler(struct event_context *ev, struct fd_event
 	}
 }
 
-static void ctdb_run_debug_hung_script(struct ctdb_context *ctdb, struct ctdb_event_script_state *state)
+static void debug_hung_script_timeout(struct tevent_context *ev, struct tevent_timer *te,
+				      struct timeval t, void *p)
+{
+	struct ctdb_event_script_state *state =
+		talloc_get_type(p, struct ctdb_event_script_state);
+
+	talloc_free(state);
+}
+
+static void debug_hung_script_done(struct tevent_context *ev, struct tevent_fd *fde,
+				   uint16_t flags, void *p)
+{
+	struct ctdb_event_script_state *state =
+		talloc_get_type(p, struct ctdb_event_script_state);
+
+	talloc_free(state);
+}
+
+static int ctdb_run_debug_hung_script(struct ctdb_context *ctdb, struct ctdb_event_script_state *state)
 {
 	struct ctdb_script_wire *current = get_current_script(state);
 	char *cmd;
 	pid_t pid;
 	const char * debug_hung_script = ETCDIR "/ctdb/debug-hung-script.sh";
+	int fd[2];
+	struct tevent_timer *ttimer;
+	struct tevent_fd *tfd;
 
 	cmd = child_command_string(ctdb, state,
 				   state->from_user, current->name,
 				   state->call, state->options);
-	CTDB_NO_MEMORY_VOID(state->ctdb, cmd);
+	CTDB_NO_MEMORY(state->ctdb, cmd);
 
 	DEBUG(DEBUG_ERR,("Timed out running script '%s' after %.1f seconds pid :%d\n",
 			 cmd, timeval_elapsed(&current->start), state->child));
 	talloc_free(cmd);
 
+	if (pipe(fd) < 0) {
+		DEBUG(DEBUG_ERR,("Failed to create pipe fd for debug hung script\n"));
+		return -1;
+	}
+
 	if (!ctdb_fork_with_logging(ctdb, ctdb, "Hung script", NULL, NULL, &pid)) {
 		DEBUG(DEBUG_ERR,("Failed to fork a child process with logging to track hung event script\n"));
-		ctdb_kill(state->ctdb, state->child, SIGTERM);
-		return;
+		close(fd[0]);
+		close(fd[1]);
+		return -1;
 	}
 	if (pid == -1) {
 		DEBUG(DEBUG_ERR,("Fork for debug script failed : %s\n",
 				 strerror(errno)));
-		ctdb_kill(state->ctdb, state->child, SIGTERM);


-- 
CTDB repository