Race condition in tdb_runtime_check_for_robust_mutexes()

Uri Simchoni uri at samba.org
Wed Mar 23 11:02:48 UTC 2016


Here's a patch to fix the issue. I don't have confirmation that it fixes 
the issue but would like to continue the discussion on:
a. Whether this is the right direction or should we eliminate the signal 
handler as Ralph suggested.
b. The use of sigprocmask without #ifdefs, as opposed to installing and 
uninstalling the signal handler - it's hard as it is to get those things 
right, and I wonder whether we target envs which don't support 
sigprocmask. fork() hasn't #ifdefs around it ... :)

Thanks,
Uri.

On 03/23/2016 10:56 AM, Uri Simchoni wrote:
> On 03/23/2016 10:30 AM, Ralph Boehme wrote:
>> On Wed, Mar 23, 2016 at 07:36:53AM +0200, Uri Simchoni wrote:
>>> OK I've figured out why we want the waitpid() in the signal handler - we
>>> want to catch the child terminating yet still support SIGCHLD
>>> handling of
>>> the enclosing process.
>>>
>>> Hopefully I'll submit a patch shortly.
>>
>> I just briefly looked over this, so I might be missing something, but
>> afaict we could block SIGCHLD, getting rid of our own signal handler
>> and thus getting rid of the race between two calls to waitpid().
>>
>> -Ralph
>>
> We have to make sure SIGCHLD is not set to SIG_IGN or we won't be able
> to waitpid().
>
> An alternative approach to today's signal handler might be:
> 1. block SIGCHLD
> 2. set SIGCHLD handler to SIG_DFL - this makes sure it's not SIG_IGN
> 3. fork, do handshake, waitpid()
> 4. restore signal handler
> 5. restore signal mask
>
> It's certainly less code, given that a fix that maintains the current
> signal handler also has to temporarily block SIGCHLD in order to avoid
> the race.
>
> Perhaps the intent of the existing signal handler is to leave absolutely
> no trace of this child process - if there is a pre-installed signal
> handler, after we unblock SIGCHLD it will see something that looks like
> a spurious SIGCHLD.
>
> Currently I'm testing a fix that leaves the signal handler as-is and
> hopefully fixes the issue.
>
> Thanks,
> Uri.
>

-------------- next part --------------
From 2a13235ca562c737633fdb69a8d93209d31a12bf Mon Sep 17 00:00:00 2001
From: Uri Simchoni <uri at samba.org>
Date: Wed, 23 Mar 2016 12:44:28 +0200
Subject: [PATCH] tdb: avoid a race condition when checking for robust mutexes

BUG: https://bugzilla.samba.org/show_bug.cgi?id=11808

Signed-off-by: Uri Simchoni <uri at samba.org>
---
 lib/tdb/common/mutex.c | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/lib/tdb/common/mutex.c b/lib/tdb/common/mutex.c
index fae43d4..7fff452 100644
--- a/lib/tdb/common/mutex.c
+++ b/lib/tdb/common/mutex.c
@@ -776,6 +776,8 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
 	char c = 0;
 	bool ok;
 	int status;
+	sigset_t block_sigmask, save_sigmask;
+	pid_t child_pid;
 	static bool initialized;
 
 	if (initialized) {
@@ -833,6 +835,17 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
 		goto cleanup_ma;
 	}
 
+	/* block SIGCHLD so that forking and assigning to
+	 * tdb_robust_mutex_pid is atomic
+	 */
+	sigemptyset(&block_sigmask);
+	if (sigaddset(&block_sigmask, SIGCHLD) < 0) {
+		goto cleanup_ma;
+	}
+	if (sigprocmask(SIG_BLOCK, &block_sigmask, &save_sigmask) < 0) {
+		goto cleanup_ma;
+	}
+
 	tdb_robust_mutex_pid = fork();
 	if (tdb_robust_mutex_pid == 0) {
 		size_t nwritten;
@@ -853,7 +866,14 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
 		/* leave locked */
 		_exit(0);
 	}
-	if (tdb_robust_mutex_pid == -1) {
+	child_pid = tdb_robust_mutex_pid;
+	ret = sigprocmask(SIG_SETMASK, &save_sigmask, NULL);
+	if (ret != 0) {
+		/* we can't keep running with SIGCHLD blocked */
+		abort();
+	}
+
+	if (child_pid == -1) {
 		goto cleanup_sig_child;
 	}
 	close(pipe_down[0]);
@@ -887,11 +907,18 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
 		pid_t pid;
 
 		errno = 0;
-		pid = waitpid(tdb_robust_mutex_pid, &status, 0);
-		if (pid == tdb_robust_mutex_pid) {
+		pid = waitpid(child_pid, &status, 0);
+		if (pid == child_pid) {
 			tdb_robust_mutex_pid = -1;
 			break;
 		}
+		/* if the signal handler collected the child
+		 * for us, waitpid() returns with ECHILD
+		 * and tdb_robust_mutex_pid is -1
+		 */
+		if (tdb_robust_mutex_pid == -1) {
+			break;
+		}
 		if (pid == -1 && errno != EINTR) {
 			goto cleanup_child;
 		}
-- 
2.5.0



More information about the samba-technical mailing list