[PATCH] Don't set child->requests to NULL in parent process after fork

Jeremy Allison jra at samba.org
Sun Jan 11 02:19:19 GMT 2009


On Sat, Jan 10, 2009 at 02:02:25PM +0800, boyang wrote:
> Hi, everyone:
>      I think it is a logic error to set child->requests to NULL in
> parent process. Because requests are added to requests list before
> fork() and schedule_async_request(). If it is set to NULL, only the
> first request gets scheduled, others are lost. Why does the parent shoot
> itself like this? :-)
>      Please review the patch, Thanks!

Ok, whilst shopping @ Costco I've been doing a lot of
thinking about why we haven't been bitten harder by
this bug over the time we've had it :-). It's a pretty
bad bug.

I think it's because this would only happen after
we had a long-running winbind child, with requests
queued for it, which then died. The first request
would cause the winbindd child to be created, and
then get NULL'ed out, but this wouldn't cause a
crash as DLIST_REMOVE doesn't check that the value
being removed is actually on the list, so all that
would happen is the request queue would be one shorter
than it should have been, and the first request would
have been handled by the child and then disappear
(not noticing it had already been NULL'ed out from
the request queue). All subsequent requests would
have been handled correctly until the winbindd
child died, and was re-created.

The bug would have bitten us when there was a
slow winbindd child (so lots of requests queued
in the parent) and the winbindd child died, and
then the queued requests other than the one in
process would get dropped. This would explain
client processes not getting a response. But
the question would be why did the winbindd child
die and what state is the connection to the
domain in at that point ? So it comes down to
the robustness of the recovery logic, which
depends on the reason the child died.

My guess is we didn't get bitten by this
often, but when we did it bit us *hard*.

Anyway boyang, thanks a *LOT* for finding this
one - please keep looking over the winbindd
logic and challenging the assumptions there,
you're getting really good at this :-) :-).

Cheers,

	Jeremy.


More information about the samba-technical mailing list