[PATCH] Fix pthreadpool fork behaviour

Thu Aug 31 06:26:55 UTC 2017

On Wed, Aug 30, 2017 at 12:59:04PM -0700, Jeremy Allison wrote:
> OK, just to make sure I understand it before giving RB+.
> 
> Sorry if this is just a text description of what your
> new code does (as it is just that :-), but I find that
> writing these things down in text helps me understand the logic
> on these fixes.
> 
> If there are idle threads in any threadpool when a process
> wants to fork, they're blocked on pool->condvar waiting to
> be assigned more work. After the fork if the child exits
> it calls pthread_cond_destroy(pool->condvar) whilst an
> idle thread is waiting on it, which causes the process to hang.

It's not on exit, but on reinit_after_fork, but you're correct in that
it's hanging on the internal lock in the condvar. The other point is
that the convar's associated mutex is internally marked as busy as
long as a thread waits for the condvar. Thus pthread_mutex_destroy
returns EBUSY. We'll never be able to release the mutex, because the
thread waiting in the condvar has not made it into the child.

David Butenhof sums up the situation quite nicely in

https://groups.google.com/forum/m/#!topic/comp.programming.threads/ThHE32-vRsg

> The real answer is that pthread_atfork() is a completely useless and
> stupid mechanism that was a well intentioned but ultimately
> pointless attempt to carve a "back door" solution out of an
> inherently insoluable design conflict. 

Thanks Ralph for the reference. David made me feel a bit better
putting in this bug in the first place :-)

> So in the prepare_fork path pthreadpool_prepare_pool()
> you lock the pool mutex:
> 
> pthread_mutex_lock(&pool->mutex);
> 
> then set up a local pthread_cond_t prefork_cond and
> point to it from pool->prefork_cond, which allows
> the exiting idle thread to notify the waiting main
> thread that it's exited (this was the part that took
> the longest to understand). Then wake up the idle
> thread via:
> 
> pthread_cond_signal(&pool->condvar);
> 
> and wait for it to tell you it is done:
> 
> pthread_cond_wait(&prefork_cond, &pool->mutex);
> 
> (we return from this with pool->mutex *LOCKED*)
> 
> and then NULL out pool->prefork_cond in
> preparation for exiting the while (pool->num_idle != 0)
> loop. You keep doing this whilst there are idle threads
> to notify.
> 
> pthreadpool_prepare_pool() then exits with
> pool->mutex LOCKED, doing the job pthreadpool_prepare()
> used to do.
> 
> Inside the idle thread, when awoken to look for a
> new job inside pthreadpool_server() (with pool->mutex LOCKED)
> you check if pool->prefork_cond exists, and if so
> signal the awaiting main thread we're exiting via:
> 
> pthread_cond_signal(pool->prefork_cond);
> 
> and then call pthreadpool_server_exit() to unlock the
> pool->mutex.
> 
> The race with pthreadpool_server_exit() freeing
> all the mutexes/condvars in the pool by calling
> pthreadpool_free() is avoided as pool->shutdown
> needs to be true (someone must have called
> pthreadpool_destroy()) for that to happen,
> plus the pool->prefork_cond path can't be
> taken in pthreadpool_server() as that also
> depends on !pool->shutdown. Phew...
> 
> You're also destroying and re-initializing
> the pool->condvar before/after the fork,
> but that part is pretty clear.
> 
> Phew. I think I get it - in which case RB+ me,
> with the only change being the comment:

Yep. That's what it is.

> 
> +       /*
> +        * Condition variable indicating that we should quickly go
> +        * away making away for fork() without anybody waiting on
> +        * pool->condvar.
> +        */
> 
> should be:
> 
> +       /*
> +        * Condition variable indicating that we should quickly go
> +        * away making way for fork() without anybody waiting on
> +        * pool->condvar.
> +        */
> 
> (note the change from away -> way in the second line of text).

Sure, thanks! New patch attached.

> Amazing work Volker, thanks !!!!!!

The truly amazing thing is how many bugs can still lurk in so little
code (613 lines including all comments). Same for the recent tdb
ENOSPC handling. At some point you would think a piece of code should
be done, but that never seems to be the case.

Volker

-- 
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt at sernet.de