[PATCH] tfork fix issues with waiter process termination
slow at samba.org
Sat Sep 16 19:02:24 UTC 2017
On Sat, Sep 16, 2017 at 06:58:45PM +0000, Andrew Bartlett wrote:
> On Sun, 2017-09-17 at 06:25 +1200, Andrew Bartlett via samba-technical
> > On Sat, 2017-09-16 at 07:52 -0700, Ralph Böhme via samba-technical
> > wrote:
> > > On Sat, Sep 16, 2017 at 10:13:37AM +0000, Gary Lockyer wrote:
> > > > > Your fix for the status fd is spot on, I'd like to change the event fd fix
> > > > > however. Depending on whether tfork_event_fd() has been called, ownership of the
> > > > > event fd should be transferred to the caller.
> > > >
> > > > That makes more sense as it will reduce the chances of file descriptor
> > > > leaks.
> > > > >
> > > > > Please take a look at attached patchset. The FIXUP commits are some minor
> > > > > fixes. The event fd changes are in the "lib/util: only close the event_fd in
> > > > > tfork if the caller didn't call tfork_event_fd()" commit.
> > > >
> > > > I'm happy with the revised patch set.
> > >
> > > great!
> > >
> > > We also need a backport for 4.7, don't we? metze already created a bug for this
> > > one, so I'm going to add the bug URL and push later on if I don't hear any
> > > objections.
> > >
> > > https://bugzilla.samba.org/show_bug.cgi?id=13037
> > A BUG URL is always a good thing, but I'm not sure we need the backport
> > critically.
> Regardless I've filled in a short summary of the issue into the bug,
> for clarity.
> > The reason this was never noticed is that in the current use, a sibling
> > (actually cousin) worker process still holding the FD from the parent
> > is quite rare. You would get it if you ran 'samba -M single' for smbd
> > and winbindd, but thankfully everything is marked close-on-exec, so no
> > duplicate is found then, and the monitor process closes all FDs using a
> > loop.
> The biggest risk would be resolve_name_dns_ex_send(), which does a
> fork(). I could imagine a race between the exit from a samba_kcc run
> and a slow blocking DNS lookup in -M single.
> By luck I don't think they are likely to happen in the same process by
> default (-M standard), and dns with eventually time out, but luck and
> lockups are not a good combination.
> > The new use case, to handle a forked-but-not-execed samba child in a
> > process model re-factor is what caught this case.
> > I'll let Gary tell more of the tale of how we found this, but the
> > whiteboard sessions were quite epic, tracing who had what read and
> > write FDs open!
> The biggest risk isn't the code as-is, but something else that is
> backported that uses fork(), or tfork().
> So, on further consideration, BUG and backport it. We never want to
> debug this on a customer site :-)
More information about the samba-technical