Race condition in tdb_runtime_check_for_robust_mutexes()

Uri Simchoni uri at samba.org
Tue Mar 22 20:06:15 UTC 2016


I saw smbd hang on startup, and was lucky enough to get a decent stack trace
showing that it hangs in tdb_runtime_check_for_robust_mutexes(), line 890,
while the child process it is waiting for is already gone.
After some head scratching I think I may have figured it out:

  886         while (tdb_robust_mutex_pid > 0) {
  887                 pid_t pid;
  888
  889                 errno = 0; /* BAM! SIGCHLD!!! exit status collected and tdb_robust_mutex_pid becomes -1 */
  890                 pid = waitpid(tdb_robust_mutex_pid, &status, 0); /* pid argument is now -1: wait for ANY child process to finish - hang */
  891                 if (pid == tdb_robust_mutex_pid) {
  892                         tdb_robust_mutex_pid = -1;
  893                         break;
  894                 }
  895                 if (pid == -1 && errno != EINTR) {
  896                         goto cleanup_child;
  897                 }
  898         }

And the question, assuming this is correct, is: why do we have to call
waitpid() in the signal handler? (I understand we need the signal handler
to cope with SIG_IGN, since this is a generic library and we don't know
how the application has set up its signal handling.)

Also, it seems that tdb_robust_mutex_setup_sigchild() doesn't necessarily
restore the SIGCHLD disposition exactly to the way it was.

Comments?

Thanks,
Uri.


