Race condition in tdb_runtime_check_for_robust_mutexes()

Uri Simchoni uri at samba.org
Wed Mar 23 05:36:53 UTC 2016


OK, I've figured out why we want the waitpid() in the signal handler: we
want to catch the child terminating while still supporting the enclosing
process's own SIGCHLD handling.

Hopefully I'll submit a patch shortly.

Thanks,
Uri.

On 03/22/2016 10:06 PM, Uri Simchoni wrote:
> I saw smbd hang on startup, and was lucky enough to get a decent stack
> trace showing that it hangs in tdb_runtime_check_for_robust_mutexes(),
> line 890, and that the child process it is waiting for is gone.
> After some head scratching I think I may have figured it out:
>
>   886         while (tdb_robust_mutex_pid > 0) {
>   887                 pid_t pid;
>   888
>   889                 errno = 0;
>                       /* BAM! SIGCHLD!!! The handler collects the exit
>                        * status and tdb_robust_mutex_pid becomes -1 */
>   890                 pid = waitpid(tdb_robust_mutex_pid, &status, 0);
>                       /* now waitpid(-1, ...): waits for ANY child
>                        * process to finish - hang */
>   891                 if (pid == tdb_robust_mutex_pid) {
>   892                         tdb_robust_mutex_pid = -1;
>   893                         break;
>   894                 }
>   895                 if (pid == -1 && errno != EINTR) {
>   896                         goto cleanup_child;
>   897                 }
>   898         }
>
> And the question, assuming this is correct, is why we have to call
> waitpid() in the signal handler at all (I understand we need the signal
> handler to cope with SIG_IGN, since this is a generic library and we
> don't know what the caller's signal arrangement is).
>
> Also, it seems that tdb_robust_mutex_setup_sigchild() doesn't necessarily
> restore the SIGCHLD disposition exactly to the way it was.
>
> Comments?
>
> Thanks,
> Uri.
>



