Race condition in tdb_runtime_check_for_robust_mutexes()
Uri Simchoni
uri at samba.org
Wed Mar 23 05:36:53 UTC 2016
OK, I've figured out why we want the waitpid() in the signal handler: we
want to catch the child terminating while still supporting SIGCHLD
handling by the enclosing process.
Hopefully I'll submit a patch shortly.
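
For reference, here is a minimal sketch of one way to close the window
described in the quoted message below while keeping the waitpid() in the
handler. It is not the patch itself. The names helper_pid, on_sigchld(),
wait_for_helper() and run_demo() are hypothetical, and blocking SIGCHLD
around the wait is an assumed approach, not necessarily what tdb ends up
doing:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for tdb_robust_mutex_pid. */
static volatile pid_t helper_pid = -1;

/* Reap only our own helper, never "any child". */
void on_sigchld(int signum)
{
    int saved_errno = errno;
    pid_t pid = helper_pid;    /* read the shared pid once */

    (void)signum;
    if (pid > 0 && waitpid(pid, NULL, WNOHANG) == pid) {
        helper_pid = -1;
    }
    errno = saved_errno;
}

/*
 * Wait for the helper with SIGCHLD blocked, so the handler cannot
 * change helper_pid between the loop test and the waitpid() call.
 * Returns 0 once the helper has been reaped, -1 on an unexpected error.
 */
int wait_for_helper(void)
{
    sigset_t block, prev;
    int ret = 0;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    /* A threaded program would use pthread_sigmask() instead. */
    if (sigprocmask(SIG_BLOCK, &block, &prev) != 0) {
        return -1;
    }

    while (helper_pid > 0) {
        pid_t pid = waitpid(helper_pid, NULL, 0);

        if (pid == helper_pid) {
            helper_pid = -1;
            break;
        }
        if (pid == -1 && errno != EINTR) {
            ret = -1;
            break;
        }
    }

    /* The pending SIGCHLD is delivered here; the handler sees -1. */
    sigprocmask(SIG_SETMASK, &prev, NULL);
    return ret;
}

/* Demo: install the handler, fork a helper that exits at once, wait. */
int run_demo(void)
{
    struct sigaction sa;
    pid_t pid;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) != 0) {
        return -1;
    }

    pid = fork();
    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        _exit(0);
    }
    helper_pid = pid;
    return wait_for_helper();
}
```

If the handler wins the race and reaps the helper first, the loop is
simply skipped. If it loses, the blocking waitpid() always sees a real
pid and never -1.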
Thanks,
Uri.
On 03/22/2016 10:06 PM, Uri Simchoni wrote:
> I saw smbd hang on startup, and was lucky enough to get a decent stack
> trace showing that it hangs in tdb_runtime_check_for_robust_mutexes(),
> line 890, and that the child process it's waiting for is gone.
> After some head scratching I think I may have figured it out:
>
> 886         while (tdb_robust_mutex_pid > 0) {
> 887                 pid_t pid;
> 888
> 889                 errno = 0;
>                     /* BAM! SIGCHLD!!! exit status collected and
>                        tdb_robust_mutex_pid becomes -1 */
> 890                 pid = waitpid(tdb_robust_mutex_pid, &status, 0);
>                     /* pid argument is now -1: wait for ANY child
>                        process to finish - hang */
> 891                 if (pid == tdb_robust_mutex_pid) {
> 892                         tdb_robust_mutex_pid = -1;
> 893                         break;
> 894                 }
> 895                 if (pid == -1 && errno != EINTR) {
> 896                         goto cleanup_child;
> 897                 }
> 898         }
>
> And the question, assuming this is correct, is: why do we have to call
> waitpid() in the signal handler? (I understand we need the signal
> handler to cope with SIG_IGN, since this is a generic library and we
> don't know what the signal arrangement is.)
>
> Also, it seems that tdb_robust_mutex_setup_sigchild() doesn't
> necessarily restore the SIGCHLD disposition exactly to the way it was.
>
> Comments?
>
> Thanks,
> Uri.
>
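
On the last point in the quoted message, about restoring SIGCHLD: one
way a setup routine can fail to restore the disposition exactly is by
remembering only the old handler pointer, which loses the old sa_flags
and sa_mask. Whether that is what tdb_robust_mutex_setup_sigchild() does
is not established here. A minimal sketch that keeps the whole struct
sigaction, with hypothetical helper names sigchld_override() and
sigchld_restore():

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/*
 * Hypothetical helpers, not part of tdb.
 * Install a temporary SIGCHLD handler and save the complete previous
 * disposition (handler, flags and mask) in *saved.
 */
int sigchld_override(void (*handler)(int), struct sigaction *saved)
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_handler = handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &act, saved);
}

/* Put back exactly what sigchld_override() saved. */
int sigchld_restore(const struct sigaction *saved)
{
    return sigaction(SIGCHLD, saved, NULL);
}
```

The caller keeps the struct sigaction for as long as the override is in
place and hands it back unchanged, so flags such as SA_NOCLDSTOP set by
the enclosing process survive the round trip.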