Race condition in tdb_runtime_check_for_robust_mutexes()
Uri Simchoni
uri at samba.org
Wed Mar 23 11:02:48 UTC 2016
Here's a patch to fix the issue. I don't have confirmation that it fixes
the issue but would like to continue the discussion on:
a. Whether this is the right direction or should we eliminate the signal
handler as Ralph suggested.
b. The use of sigprocmask without #ifdefs, as opposed to installing and
uninstalling the signal handler - it's hard as it is to get those things
right, and I wonder whether we target envs which don't support
sigprocmask. fork() hasn't #ifdefs around it ... :)
Thanks,
Uri.
On 03/23/2016 10:56 AM, Uri Simchoni wrote:
> On 03/23/2016 10:30 AM, Ralph Boehme wrote:
>> On Wed, Mar 23, 2016 at 07:36:53AM +0200, Uri Simchoni wrote:
>>> OK I've figured out why we want the waitpid() in the signal handler - we
>>> want to catch the child terminating yet still support SIGCHLD
>>> handling of
>>> the enclosing process.
>>>
>>> Hopefully I'll submit a patch shortly.
>>
>> I just briefly looked over this, so I might be missing something, but
>> afaict we could block SIGCHLD, getting rid of our own signal handler
>> and thus getting rid of the race between two calls to waitpid().
>>
>> -Ralph
>>
> We have to make sure SIGCHLD is not set to SIG_IGN or we won't be able
> to waitpid().
>
> An alternative approach to today's signal handler might be:
> 1. block SIGCHLD
> 2. set SIGCHLD handler to SIG_DFL - this makes sure it's not SIG_IGN
> 3. fork, do handshake, waitpid()
> 4. restore signal handler
> 5. restore signal mask
>
> It's certainly less code, given that a fix that maintains the current
> signal handler also has to temporarily block SIGCHLD in order to avoid
> the race.
>
> Perhaps the intent of the existing signal handler is to leave absolutely
> no trace of this child process - if there is a pre-installed signal
> handler, after we unblock SIGCHLD it will see something that looks like
> a spurious SIGCHLD.
>
> Currently I'm testing a fix that leaves the signal handler as-is and
> hopefully fixes the issue.
>
> Thanks,
> Uri.
>
-------------- next part --------------
From 2a13235ca562c737633fdb69a8d93209d31a12bf Mon Sep 17 00:00:00 2001
From: Uri Simchoni <uri at samba.org>
Date: Wed, 23 Mar 2016 12:44:28 +0200
Subject: [PATCH] tdb: avoid a race condition when checking for robust mutexes
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11808
Signed-off-by: Uri Simchoni <uri at samba.org>
---
lib/tdb/common/mutex.c | 33 ++++++++++++++++++++++++++++++---
1 file changed, 30 insertions(+), 3 deletions(-)
diff --git a/lib/tdb/common/mutex.c b/lib/tdb/common/mutex.c
index fae43d4..7fff452 100644
--- a/lib/tdb/common/mutex.c
+++ b/lib/tdb/common/mutex.c
@@ -776,6 +776,8 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
char c = 0;
bool ok;
int status;
+ sigset_t block_sigmask, save_sigmask;
+ pid_t child_pid;
static bool initialized;
if (initialized) {
@@ -833,6 +835,17 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
goto cleanup_ma;
}
+ /* block SIGCHLD so that forking and assigning to
+ * tdb_robust_mutex_pid is atomic
+ */
+ sigemptyset(&block_sigmask);
+ if (sigaddset(&block_sigmask, SIGCHLD) < 0) {
+ goto cleanup_ma;
+ }
+ if (sigprocmask(SIG_BLOCK, &block_sigmask, &save_sigmask) < 0) {
+ goto cleanup_ma;
+ }
+
tdb_robust_mutex_pid = fork();
if (tdb_robust_mutex_pid == 0) {
size_t nwritten;
@@ -853,7 +866,14 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
/* leave locked */
_exit(0);
}
- if (tdb_robust_mutex_pid == -1) {
+ child_pid = tdb_robust_mutex_pid;
+ ret = sigprocmask(SIG_SETMASK, &save_sigmask, NULL);
+ if (ret != 0) {
+ /* we can't keep running with SIGCHLD blocked */
+ abort();
+ }
+
+ if (child_pid == -1) {
goto cleanup_sig_child;
}
close(pipe_down[0]);
@@ -887,11 +907,18 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
pid_t pid;
errno = 0;
- pid = waitpid(tdb_robust_mutex_pid, &status, 0);
- if (pid == tdb_robust_mutex_pid) {
+ pid = waitpid(child_pid, &status, 0);
+ if (pid == child_pid) {
tdb_robust_mutex_pid = -1;
break;
}
+ /* if the signal handler collected the child
+ * for us, waitpid() returns with ECHILD
+ * and tdb_robust_mutex_pid is -1
+ */
+ if (tdb_robust_mutex_pid == -1) {
+ break;
+ }
if (pid == -1 && errno != EINTR) {
goto cleanup_child;
}
--
2.5.0
More information about the samba-technical
mailing list