On winbind shutdown prior to the removal of gencache_stabilize we could crash due to races

Richard Sharpe realrichardsharpe at gmail.com
Mon Mar 11 19:30:47 UTC 2019


On Mon, Mar 11, 2019 at 11:31 AM Jeremy Allison <jra at samba.org> wrote:
>
> On Mon, Mar 11, 2019 at 11:20:49AM -0700, Richard Sharpe wrote:
> > On Mon, Mar 11, 2019 at 10:11 AM Jeremy Allison <jra at samba.org> wrote:
> > >
> > > On Mon, Mar 11, 2019 at 09:47:16AM -0700, Richard Sharpe via samba-technical wrote:
> > > > Hi folks,
> > > >
> > > > We are seeing this on winbind shutdown:
> > > >
> > > > --------------------------------------------------
> > > > 2019-01-11 01:16:19 winbindd[17540]: [2019/01/11 01:16:19.153272,  0]
> > > > ../source3/winbindd/winbindd.c:281(winbindd_sig_term_handler)
> > > > 2019-01-11 01:16:19 systemd[1]: Starting Hammerspace Maintenance Target.
> > > > 2019-01-11 01:16:19 winbindd[17540]:   Got sig[15] terminate (is_parent=0)
> > > > 2019-01-11 01:16:19 winbindd[17497]: [2019/01/11 01:16:19.153546,  0]
> > > > ../source3/winbindd/winbindd.c:281(winbindd_sig_term_handler)
> > > > 2019-01-11 01:16:19 winbindd[17497]:   Got sig[15] terminate (is_parent=1)
> > > > 2019-01-11 01:16:19 winbindd[17507]: [2019/01/11 01:16:19.153413,  0]
> > > > ../source3/winbindd/winbindd.c:281(winbindd_sig_term_handler)
> > > > 2019-01-11 01:16:19 winbindd[17507]:   Got sig[15] terminate (is_parent=0)
> > > > 2019-01-11 01:16:19 systemd[1]: Stopped System Security Services Daemon.
> > > > 2019-01-11 01:16:19 winbindd[17540]: [2019/01/11 01:16:19.162163,  0]
> > > > ../lib/util/fault.c:78(fault_report)
> > > > 2019-01-11 01:16:19 winbindd[17540]:
> > > > ===============================================================
> > > > 2019-01-11 01:16:19 winbindd[17540]: [2019/01/11 01:16:19.162202,  0]
> > > > ../lib/util/fault.c:79(fault_report)
> > > > 2019-01-11 01:16:19 winbindd[17540]:   INTERNAL ERROR: Signal 7 in pid
> > > > 17540 (4.7.1-GIT-c0bd705-Hammerspace)
> > > > 2019-01-11 01:16:19 winbindd[17540]:   Please read the
> > > > Trouble-Shooting section of the Samba HOWTO
> > > > 2019-01-11 01:16:19 winbindd[17540]: [2019/01/11 01:16:19.162220,  0]
> > > > ../lib/util/fault.c:81(fault_report)
> > > > 2019-01-11 01:16:19 winbindd[17540]:
> > > > ===============================================================
> > > > 2019-01-11 01:16:19 winbindd[17540]: [2019/01/11 01:16:19.162232,  0]
> > > > ../source3/lib/util.c:804(smb_panic_s3)
> > > > 2019-01-11 01:16:19 winbindd[17540]:   PANIC (pid 17540): internal error
> > > > 2019-01-11 01:16:19 winbindd[17540]: [2019/01/11 01:16:19.162550,  0]
> > > > ../source3/lib/util.c:915(log_stack_trace)
> > > > 2019-01-11 01:16:19 winbindd[17540]:   BACKTRACE: 25 stack frames:
> > > > --------------------------------------------------------------
> > > >
> > > > This is with a 4.7.1ish version of Samba.
> > > >
> > > > It seems to be due to a race between the parent and child with both of
> > > > them calling gencache_stabilize and with the right phase of the moon,
> > > > one seems to have closed the tdb (and thus unmapped the mutexes
> > > > memory) while the other is iterating the mutexes.
> > > >
> > > > I see that the whole gencache_stabilize stuff was removed around December 2018.
> > > >
> > > > 1. Is it worth filing a bug in case the change needs back porting?
> > >
> > > Nope. 4.7 is out of maintenance (except for security), so even if you log a bug
> > > the patch you'd attach would be a courtesy, but not go into a release.
> >
> > The bug likely still exists in 4.8 and maybe 4.9 :-)
>
> OK, I was confused, sorry. So you mean the gencache_stabilize()
> stuff is inherently racy and still exists in supported releases ?
>
> If so, yeah logging a bug is the right thing to do.

OK, now I understand the bug fully. I was confused for a while because
I have been doing a lot of work with pthreads, but this is not a
pthreads situation; it is a race between separate processes.

This code is still in v4-9-stable:

static void terminate(bool is_parent)
{
        if (is_parent) {
                /* When parent goes away we should
                 * remove the socket file. Not so
                 * when children terminate.
                 */
                char *path = NULL;

                if (asprintf(&path, "%s/%s",
                        lp_winbindd_socket_directory(),
                        WINBINDD_SOCKET_NAME) > 0) {
                        unlink(path);
                        SAFE_FREE(path);
                }
        }

        idmap_close();

        gencache_stabilize();

        netlogon_creds_cli_close_global_db();

If the parent exits before the children have finished their
gencache_stabilize scans, the children will crash, because the mmap'd
region goes away.
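
For illustration only, here is a minimal sketch of one way to avoid this
kind of ordering problem in general: the parent blocks until its children
have exited before it carries on with its own teardown. This is not the
fix that went into Samba (upstream removed gencache_stabilize instead),
and it assumes the diagnosis above is correct. The names
fake_child_cleanup, spawn_child and reap_children are hypothetical and do
not exist in winbindd.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for a child's shutdown work (e.g. a cache scan). */
void fake_child_cleanup(void)
{
        struct timespec ts = { 0, 10 * 1000 * 1000 }; /* 10 ms */

        nanosleep(&ts, NULL);
}

/* Fork one child that runs work() and exits. Returns its pid, or -1. */
pid_t spawn_child(void (*work)(void))
{
        pid_t pid = fork();

        if (pid == 0) {
                work();
                _exit(0);
        }
        return pid;
}

/*
 * Block until every child in pids[] has exited, retrying on EINTR.
 * Returns the number of children that exited with status 0.
 */
int reap_children(const pid_t *pids, int n)
{
        int clean = 0;

        for (int i = 0; i < n; i++) {
                int status = 0;
                pid_t r;

                if (pids[i] <= 0) {
                        continue; /* fork failed for this slot */
                }
                do {
                        r = waitpid(pids[i], &status, 0);
                } while (r == -1 && errno == EINTR);

                if (r == pids[i] && WIFEXITED(status) &&
                    WEXITSTATUS(status) == 0) {
                        clean++;
                }
        }
        return clean;
}
```

In a parent's shutdown path the idea would be to call something like
reap_children() before releasing any state the children might still be
using, so that the parent's teardown cannot overlap with theirs.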

I will file a ticket.

-- 
Regards,
Richard Sharpe
(How can sorrow be dispelled? Only with Du Kang. -- Cao Cao) (Legend has it that Du Kang was the inventor of wine)