Race condition in tdb_runtime_check_for_robust_mutexes()

Ralph Boehme slow at samba.org
Tue Mar 29 09:27:49 UTC 2016


On Mon, Mar 28, 2016 at 07:47:13PM +0200, Ralph Boehme wrote:
> On Mon, Mar 28, 2016 at 10:38:09AM -0700, Jeremy Allison wrote:
> > On Mon, Mar 28, 2016 at 09:28:34AM -0700, Jeremy Allison wrote:
> > > On Mon, Mar 28, 2016 at 09:23:48AM -0700, Jeremy Allison wrote:
> > > > On Sun, Mar 27, 2016 at 09:34:38AM +0200, Ralph Boehme wrote:
> > > > > 
> > > > > > but to be honest I'd much prefer it if you moved those calls outside
> > > > > > and back to just before/after the calls to
> > > > > > tdb_robust_mutex_setup_sigchild(), and leave that call as-is. That
> > > > > > way I can *see* they're there at the right point in the code.
> > > > > 
> > > > > attached. :)
> > > > 
> > > > Thanks Ralph, I can actually understand your code
> > > > changes now :-).
> > > > 
> > > > Pushed !
> > > 
> > > With one minor change to remove the now-unused
> > > 'status' variable :-).
> > 
> > Oh. I'm getting a reproducible fail on :
> > 
> > UNEXPECTED(error): samba3.smbtorture_s3.plain(nt4_dc).CLEANUP1.smbtorture(nt4_dc)
> > REASON: Exception: Exception: reason (failure) interrupted
> > 
> > command: /home/jeremy/src/samba/git/master/source3/script/tests/test_smbtorture_s3.sh CLEANUP1 //$SERVER_IP/tmp $USERNAME $PASSWORD /home/jeremy/src/samba/git/master/bin/smbtorture3  -l $LOCAL_PATH 2>&1  | /home/jeremy/src/samba/git/master/selftest/filter-subunit --fail-on-empty --prefix="samba3.smbtorture_s3.plain(nt4_dc).CLEANUP1." --suffix="(nt4_dc)"
> > expanded command: /home/jeremy/src/samba/git/master/source3/script/tests/test_smbtorture_s3.sh CLEANUP1 //127.0.0.3/tmp jeremy localntdc2pass /home/jeremy/src/samba/git/master/bin/smbtorture3  -l /home/jeremy/src/samba/git/master/st/nt4_dc/share 2>&1  | /home/jeremy/src/samba/git/master/selftest/filter-subunit --fail-on-empty --prefix="samba3.smbtorture_s3.plain(nt4_dc).CLEANUP1." --suffix="(nt4_dc)"
> > ERROR: Testsuite[samba3.smbtorture_s3.plain(nt4_dc).CLEANUP1]
> > REASON: Exit code was 1
> > 
> > with this patchset. Ralph, can you
> > take a look please ? I'm off to a
> > conference (lasts most of the week)
> > today so I won't be able to check
> > immediately what this breaks.
> 
> hm, I'll take a look.

ok, I forgot to update the goto hell, so when the test succeeded the
signal mask was not reset. Updated patchset attached.

Cheerio!
-slow
-------------- next part --------------
From 847daeacb35f4b125288192c00dbb1354840a44b Mon Sep 17 00:00:00 2001
From: Ralph Boehme <slow at samba.org>
Date: Sat, 26 Mar 2016 12:43:55 +0100
Subject: [PATCH] tdb: avoid a race condition when checking for robust mutexes

This fixes a race between calling waitpid() in two places (SIGCHLD the
signal handler and the rendezvous code when waiting for the child to
terminate), by

- blocking SIGCHLD before installing our signal handler

- in the rendezvous code call sigssuspend() which unblocks SIGCHLD and
  suspends the thread and waits for signal delivery

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11808

Signed-off-by: Ralph Boehme <slow at samba.org>
---
 lib/tdb/common/mutex.c | 59 +++++++++++++++++++++++++++-----------------------
 1 file changed, 32 insertions(+), 27 deletions(-)

diff --git a/lib/tdb/common/mutex.c b/lib/tdb/common/mutex.c
index fae43d4..e96b28d 100644
--- a/lib/tdb/common/mutex.c
+++ b/lib/tdb/common/mutex.c
@@ -777,6 +777,7 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
 	bool ok;
 	int status;
 	static bool initialized;
+	sigset_t mask, old_mask, suspend_mask;
 
 	if (initialized) {
 		return tdb_mutex_locking_cached;
@@ -828,9 +829,22 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
 		goto cleanup_ma;
 	}
 
+	/*
+	 * Block SIGCHLD so we can atomically wait for it later with
+	 * sigsuspend()
+	 */
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGCHLD);
+	ret = pthread_sigmask(SIG_BLOCK, &mask, &old_mask);
+	if (ret != 0) {
+		goto cleanup_ma;
+	}
+	suspend_mask = old_mask;
+	sigdelset(&suspend_mask, SIGCHLD);
+
 	if (tdb_robust_mutex_setup_sigchild(tdb_robust_mutex_handler,
 			&tdb_robust_mutext_old_handler) == false) {
-		goto cleanup_ma;
+		goto cleanup_sigmask;
 	}
 
 	tdb_robust_mutex_pid = fork();
@@ -884,16 +898,9 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
 	}
 
 	while (tdb_robust_mutex_pid > 0) {
-		pid_t pid;
-
-		errno = 0;
-		pid = waitpid(tdb_robust_mutex_pid, &status, 0);
-		if (pid == tdb_robust_mutex_pid) {
-			tdb_robust_mutex_pid = -1;
-			break;
-		}
-		if (pid == -1 && errno != EINTR) {
-			goto cleanup_child;
+		ret = sigsuspend(&suspend_mask);
+		if (ret != -1 || errno != EINTR) {
+			abort();
 		}
 	}
 	tdb_robust_mutex_setup_sigchild(tdb_robust_mutext_old_handler, NULL);
@@ -903,46 +910,44 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
 		if (ret == 0) {
 			pthread_mutex_unlock(m);
 		}
-		goto cleanup_m;
+		goto cleanup_sigmask;
 	}
 
 	ret = pthread_mutex_consistent(m);
 	if (ret != 0) {
-		goto cleanup_m;
+		goto cleanup_sigmask;
 	}
 
 	ret = pthread_mutex_trylock(m);
 	if (ret != EDEADLK) {
 		pthread_mutex_unlock(m);
-		goto cleanup_m;
+		goto cleanup_sigmask;
 	}
 
 	ret = pthread_mutex_unlock(m);
 	if (ret != 0) {
-		goto cleanup_m;
+		goto cleanup_sigmask;
 	}
 
 	tdb_mutex_locking_cached = true;
-	goto cleanup_m;
+	goto cleanup_sigmask;
 
 cleanup_child:
 	while (tdb_robust_mutex_pid > 0) {
-		pid_t pid;
-
 		kill(tdb_robust_mutex_pid, SIGKILL);
-
-		errno = 0;
-		pid = waitpid(tdb_robust_mutex_pid, &status, 0);
-		if (pid == tdb_robust_mutex_pid) {
-			tdb_robust_mutex_pid = -1;
-			break;
-		}
-		if (pid == -1 && errno != EINTR) {
-			break;
+		ret = sigsuspend(&suspend_mask);
+		if (ret != -1 || errno != EINTR) {
+			abort();
 		}
 	}
+
 cleanup_sig_child:
 	tdb_robust_mutex_setup_sigchild(tdb_robust_mutext_old_handler, NULL);
+cleanup_sigmask:
+	ret = pthread_sigmask(SIG_SETMASK, &old_mask, NULL);
+	if (ret != 0) {
+		abort();
+	}
 cleanup_m:
 	pthread_mutex_destroy(m);
 cleanup_ma:
-- 
2.5.0



More information about the samba-technical mailing list