TDB lock contention during "startup" event caused winbind crash

Kenny Dinh kdinh at peaxy.net
Fri May 27 22:41:49 UTC 2016


Hi Martin,

This was a one-time occurrence.  I was not able to reproduce it, even with
ctdb 2.5.x and samba 4.1.x.  Please forgive my ignorance, but I haven't
been able to find the change that fixed the Samba/CTDB deadlock in Samba
4.4.

From the log, what I saw was contention for g_lock.tdb between the winbindd
and smbd processes, rather than contention involving CTDB.  I reviewed the
code path in v4-4-stable, and the code path to start a transaction on a tdb
is still the same:

db_ctdb_transaction_start -> ctdbd_control ==> CTDB: ctdb_control_db_attach
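
As a rough illustration of this kind of contention (not Samba code), the
sketch below uses plain fcntl byte-range locks, which is the locking
primitive tdb uses by default.  It runs against a scratch file rather than
g_lock.tdb, and the helper names try_lock and lock_attempt_while_held are
made up for the example.  One process takes an exclusive lock and sits on
it, and a second process's non-blocking attempt fails until the holder goes
away:

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try a non-blocking exclusive fcntl lock; return True if acquired."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        # EAGAIN/EACCES: another process already holds a conflicting lock.
        return False
    finally:
        os.close(fd)  # closing the descriptor also drops any lock we took


def lock_attempt_while_held(path):
    """Fork a child that holds the lock (standing in for the hung smbd),
    then report whether the parent (standing in for winbindd) can lock."""
    ready_r, ready_w = os.pipe()
    done_r, done_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: take the lock, signal the parent, then wait while holding it.
        os.close(ready_r)
        os.close(done_w)
        fd = os.open(path, os.O_RDWR)
        fcntl.lockf(fd, fcntl.LOCK_EX)
        os.write(ready_w, b"1")
        os.read(done_r, 1)
        os._exit(0)
    # Parent: wait until the child really holds the lock, then try ourselves.
    os.close(ready_w)
    os.close(done_r)
    os.read(ready_r, 1)
    try:
        return try_lock(path)
    finally:
        os.write(done_w, b"1")  # let the child exit, which releases its lock
        os.waitpid(pid, 0)
        os.close(ready_r)
        os.close(done_w)


if __name__ == "__main__":
    fd, scratch = tempfile.mkstemp()
    os.close(fd)
    print("acquired while held:", lock_attempt_while_held(scratch))
    print("acquired after holder exited:", try_lock(scratch))
    os.unlink(scratch)
```

With a blocking request instead of LOCK_NB, the second process would simply
wait for as long as the first one stays hung.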

From the git history, the change that looks like it could affect this code
path is commit 81ac247c10c362b643c2b3ce57df4fe6ecae78dd.

Is this the commit where that code path was reworked?  Any pointer to where
to look for the reworked code would be greatly appreciated.

Thanks,
Kenny

On Fri, May 27, 2016 at 2:24 PM, Martin Schwenke <martin at meltin.net> wrote:

> Hi Kenny,
>
> On Fri, 27 May 2016 12:59:09 -0700, Kenny Dinh <kdinh at peaxy.net> wrote:
>
> > I ran into a situation where contention for a tdb lock caused "winbind" to
> > crash. Below is the scenario:
> >
> > -- ctdb process                smbd process                          winbindd process
> > -- "service start winbind"                                           winbindd started (pid 14461)
> > -- "service start smb"         smbd started (pid 14602)
> > --                             smbd acquires lock g_lock.tdb - hung
> > -- invoke hung script          smbd child pid (14602) is still hung
> > --                             smbd still lock g_lock.tdb
> > -- CTDB restart all services
> > -- kill existing winbindd                                            winbindd (pid 14461 term)
> > -- service start winbind                                             winbindd started (pid 14733)
> > --                                                                   try to lock g_lock.tdb but failed
> > --                                                                   PANIC
> > -- kill existing smbd          Kill pid 14602
> > -- service start smb           smbd started
> >
> > Attached are the log files from ctdb, winbind, and smb, and the winbind
> > core backtrace.
> >
> > My proposed patch is to make sure all winbindd, smbd, and nmbd services
> > are terminated at the beginning of the "startup" event.
>
> Once again, this looks like the Samba/CTDB deadlock that is fixed in
> CTDB in Samba 4.4.
>
> 00.ctdb really shouldn't know anything about Samba related processes.
> I guess there's some possibility that the use of
> update_config_from_tdb() in the 00.ctdb startup event triggers the bug.
> We dropped update_config_from_tdb() in CTDB in Samba 4.3.
>
> I think you will be much happier if you can try to test with 4.4.
> There are a lot of improvements and bug fixes...  :-)
>
> peace & happiness,
> martin
>

