Race condition in tdb_runtime_check_for_robust_mutexes()

Jeremy Allison jra at samba.org
Sun Mar 27 04:59:24 UTC 2016


On Sat, Mar 26, 2016 at 01:06:39PM +0100, Ralph Boehme wrote:
> On Fri, Mar 25, 2016 at 11:50:15AM -0700, Jeremy Allison wrote:
> > On Wed, Mar 23, 2016 at 01:02:48PM +0200, Uri Simchoni wrote:
> > > Here's a patch to fix the issue. I don't have confirmation that it
> > > fixes the issue but would like to continue the discussion on:
> > > a. Whether this is the right direction or should we eliminate the
> > > signal handler as Ralph suggested.
> > > b. The use of sigprocmask without #ifdefs, as opposed to installing
> > > and uninstalling the signal handler - it's hard as it is to get
> > > those things right, and I wonder whether we target envs which don't
> > > support sigprocmask. fork() hasn't #ifdefs around it ... :)
> > 
> > OK, I like this patch as clearly we need to make
> > the assignment to tdb_robust_mutex_pid atomic.
> > 
> > +1 from me.
> 
> Looks generally ok. One issue: calling sigprocmask() in a
> mutlithreaded program results in undefined behaviour per the spec, so
> we should use pthread_sigmask(). We're using pthread* stuff anyway, so
> we know it will be available.
> 
> Attached is something that should work as well that uses
> sigsuspend(). Feel free to push whatever you prefer. :)

Is sigsuspend real in glibc ? I remember the implementation
of pselect used to be something like:

    sigprocmask(SIG_SETMASK, &sigmask, &sigsaved);
    ready = select(nfds, &readfds, &writefds, &exceptfds, timeout);
    sigprocmask(SIG_SETMASK, &sigsaved, NULL);

which still has the race condition :-).

Also, the use of sigsuspend here isn't idiomatic,
which makes me nervous.

In the glibc manual we have:

http://www.gnu.org/software/libc/manual/html_node/Sigsuspend.html#Sigsuspend

The idiomatic use of sigsuspend is:

	/* Set up the mask of signals to temporarily block. */
	sigemptyset (&mask);
	sigaddset (&mask, SIGUSR1);

	…

	/* Wait for a signal to arrive. */
	sigprocmask (SIG_BLOCK, &mask, &oldmask);
	while (!usr_interrupt)
		sigsuspend (&oldmask);
	sigprocmask (SIG_UNBLOCK, &mask, NULL);

"This last piece of code is a little tricky. The key point to
remember here is that when sigsuspend returns, it resets
the process’s signal mask to the original value, the value
from before the call to sigsuspend—in this case, the SIGUSR1
signal is once again blocked. The second call to sigprocmask
is necessary to explicitly unblock this signal."

I *think* the changes you made to tdb_robust_mutex_setup_sigchild()
are doing the same thing by doing the block/unblock inside
it depending on the 'bool' paramter, but to be honest I'd
much prefer it if you moved those calls outside and back
to just before/after the calls to tdb_robust_mutex_setup_sigchild(),
and leave that call as-is. That way I can *see* they're
there at the right point in the code.

> -- 
> SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
> phone: +49-551-370000-0, fax: +49-551-370000-9
> AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
> http://www.sernet.de,mailto:kontakt@sernet.de

> From 9f9ad6fff9a3000a984c9093b4f46d166c662d35 Mon Sep 17 00:00:00 2001
> From: Ralph Boehme <slow at samba.org>
> Date: Sat, 26 Mar 2016 12:43:55 +0100
> Subject: [PATCH] tdb: avoid a race condition when checking for robust mutexes
> 
> This fixes a race between calling waitpid() in two places (SIGCHLD the
> signal handler and the rendezvous code when waiting for the child to
> terminate), by
> 
> - blocking SIGCHLD before installing our signal handler
> 
> - in the rendezvous code call sigssuspend() which unblocks SIGCHLD and
>   suspends the thread and waits for signal delivery
> 
> BUG: https://bugzilla.samba.org/show_bug.cgi?id=11808
> 
> Signed-off-by: Ralph Boehme <slow at samba.org>
> ---
>  lib/tdb/common/mutex.c | 53 ++++++++++++++++++++++++++------------------------
>  1 file changed, 28 insertions(+), 25 deletions(-)
> 
> diff --git a/lib/tdb/common/mutex.c b/lib/tdb/common/mutex.c
> index fae43d4..22107d4 100644
> --- a/lib/tdb/common/mutex.c
> +++ b/lib/tdb/common/mutex.c
> @@ -714,11 +714,23 @@ static void (*tdb_robust_mutext_old_handler)(int) = SIG_ERR;
>  static pid_t tdb_robust_mutex_pid = -1;
>  
>  static bool tdb_robust_mutex_setup_sigchild(void (*handler)(int),
> -			void (**p_old_handler)(int))
> +					    void (**p_old_handler)(int),
> +					    bool block_sigchld,
> +					    sigset_t *old_mask)
>  {
>  #ifdef HAVE_SIGACTION
> +	int ret;
>  	struct sigaction act;
>  	struct sigaction oldact;
> +	sigset_t mask;
> +	int action = block_sigchld ? SIG_BLOCK : SIG_UNBLOCK;
> +
> +	sigemptyset(&mask);
> +	sigaddset(&mask, SIGCHLD);
> +	ret = pthread_sigmask(action, &mask, old_mask);
> +	if (ret != 0) {
> +		return false;
> +	}
>  
>  	memset(&act, '\0', sizeof(act));
>  
> @@ -777,6 +789,7 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
>  	bool ok;
>  	int status;
>  	static bool initialized;
> +	sigset_t old_mask;
>  
>  	if (initialized) {
>  		return tdb_mutex_locking_cached;
> @@ -828,8 +841,10 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
>  		goto cleanup_ma;
>  	}
>  
> -	if (tdb_robust_mutex_setup_sigchild(tdb_robust_mutex_handler,
> -			&tdb_robust_mutext_old_handler) == false) {
> +	ok = tdb_robust_mutex_setup_sigchild(tdb_robust_mutex_handler,
> +					     &tdb_robust_mutext_old_handler,
> +					     true, &old_mask);
> +	if (!ok) {
>  		goto cleanup_ma;
>  	}
>  
> @@ -883,20 +898,15 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
>  		goto cleanup_child;
>  	}
>  
> +	sigdelset(&old_mask, SIGCHLD);
>  	while (tdb_robust_mutex_pid > 0) {
> -		pid_t pid;
> -
> -		errno = 0;
> -		pid = waitpid(tdb_robust_mutex_pid, &status, 0);
> -		if (pid == tdb_robust_mutex_pid) {
> -			tdb_robust_mutex_pid = -1;
> -			break;
> -		}
> -		if (pid == -1 && errno != EINTR) {
> +		ret = sigsuspend(&old_mask);
> +		if (ret != -1 || errno != EINTR) {
>  			goto cleanup_child;
>  		}
>  	}
> -	tdb_robust_mutex_setup_sigchild(tdb_robust_mutext_old_handler, NULL);
> +	tdb_robust_mutex_setup_sigchild(tdb_robust_mutext_old_handler,
> +					NULL, false, NULL);
>  
>  	ret = pthread_mutex_trylock(m);
>  	if (ret != EOWNERDEAD) {
> @@ -927,22 +937,15 @@ _PUBLIC_ bool tdb_runtime_check_for_robust_mutexes(void)
>  
>  cleanup_child:
>  	while (tdb_robust_mutex_pid > 0) {
> -		pid_t pid;
> -
>  		kill(tdb_robust_mutex_pid, SIGKILL);
> -
> -		errno = 0;
> -		pid = waitpid(tdb_robust_mutex_pid, &status, 0);
> -		if (pid == tdb_robust_mutex_pid) {
> -			tdb_robust_mutex_pid = -1;
> -			break;
> -		}
> -		if (pid == -1 && errno != EINTR) {
> -			break;
> +		ret = sigsuspend(&old_mask);
> +		if (ret != -1 || errno != EINTR) {
> +			goto cleanup_child;
>  		}
>  	}
>  cleanup_sig_child:
> -	tdb_robust_mutex_setup_sigchild(tdb_robust_mutext_old_handler, NULL);
> +	tdb_robust_mutex_setup_sigchild(tdb_robust_mutext_old_handler,
> +					NULL, false, NULL);
>  cleanup_m:
>  	pthread_mutex_destroy(m);
>  cleanup_ma:
> -- 
> 2.5.0
> 




More information about the samba-technical mailing list