[Samba] Samba-LDAP with 100%CPU with connections in CLOSE_WAIT

Andrew Bartlett abartlet at samba.org
Mon Sep 19 21:31:10 UTC 2022


On Tue, 2022-09-20 at 08:55 +1200, Andrew Bartlett via samba wrote:
> On Mon, 2022-09-19 at 10:24 -0700, Jeremy Allison via samba wrote:
> > On Mon, Sep 19, 2022 at 05:20:04PM +0200, Steffen via samba wrote:
> > > Hi,
> > > 
> > > for some time now we have been facing a small problem:
> > > 
> > > 
> > > We are using Samba (4.15.9-15) as an AD DC. As clients we have some
> > > NetApp FAS systems, which authenticate via LDAP. On NetApp, LDAP
> > > timeouts are set to 3 seconds by default.
> > > 
> > > Some queries seem to need more time to answer, so the client tries
> > > to close the connection, but the Samba server side leaves the
> > > socket open in CLOSE_WAIT.
> > > 
> > > In some such cases the corresponding process (ldap-worker) runs
> > > forever(?) at 100% CPU. An strace shows the ldap-worker pushing
> > > some data (the answer?) to the socket. If left alone, the server
> > > slows down gradually while more and more connections stay in
> > > CLOSE_WAIT.
> > 
> > Can you post an strace, followed by a stack backtrace
> > from gdb, of an ldap-worker process in such a state?
> > 
> > That would help debug - thanks!
> 
> The other helpful thing can be a 'flame graph' per
> 
> https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html#Instructions
> 
> 
> These are handy as they give us a lot of great info but never any
> confidential info (as it is just function stacks and times in them). 
> 
> Clearly running a 6 second query from a client set to retry infinitely
> after 3 seconds will not go well, and I suspect Samba is working hard
> to answer those queries before dealing with the closed sockets (it may
> of course be possible to move those up in priority).
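
As a rough illustration of why this cannot go well, here is a toy model
(not a measurement of Samba). It assumes a single worker that answers
queries strictly one at a time, that every query takes the full 6
seconds, and that the client issues a fresh query every 3 seconds while
the abandoned ones still run to completion. The function name and
parameters are made up for this sketch.

```python
def outstanding_queries(elapsed_s, query_s=6, timeout_s=3):
    """Toy model: queries still queued or running after elapsed_s seconds.

    Assumptions (hypothetical, for illustration only): one serial worker,
    every query costs query_s seconds, and the client sends a new query
    every timeout_s seconds without ever giving up.
    """
    issued = elapsed_s // timeout_s + 1              # one at t=0, then one per timeout
    completed = min(issued, elapsed_s // query_s)    # serial worker finishes one per query_s
    return issued - completed


if __name__ == "__main__":
    for t in (0, 60, 600):
        print(t, outstanding_queries(t))
```

Under those assumptions the backlog grows by roughly one query every 6
seconds and never drains, which would match a server that slows down
gradually.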

It looks like the socket is only closed when we get back to that socket
in the list and try to read from it again (and the read fails), so if
there are a lot of slow queries outstanding, that could take a while.
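
For anyone wondering why the sockets sit in CLOSE_WAIT in the meantime,
the general TCP behaviour can be shown with a small standalone sketch.
This is plain Python on loopback, not Samba code, and the function name
is made up for the example. The point is only that the application does
not learn about the peer's close until it next reads from the socket
and gets end-of-file.

```python
import socket


def peer_close_seen_on_read():
    """Return (data_received, eof_seen) for a client that sends and closes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # any free loopback port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    server_side.settimeout(5)            # avoid hanging if something goes wrong

    client.sendall(b"query")
    client.close()                       # client gives up; its kernel sends FIN

    # From here on the server-side socket is in CLOSE_WAIT. A server that is
    # busy with other work sees nothing until it reads from this socket.
    chunks = []
    eof_seen = False
    while True:
        chunk = server_side.recv(1024)
        if not chunk:                    # empty read means the peer has closed
            eof_seen = True
            break
        chunks.append(chunk)

    server_side.close()                  # only now does the socket leave CLOSE_WAIT
    listener.close()
    return b"".join(chunks), eof_seen


if __name__ == "__main__":
    print(peer_close_seen_on_read())
```

The longer the gap between the client's close and the server's next
read on that socket, the longer the connection shows up in CLOSE_WAIT.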

> Finally, if you set 'log level = 5' you can see how long each request
> takes, and what it is.  Setting the query timeout just as per Windows
> AD will also work (roughly) and provide notice (level 3 at 1/4 the
> timeout) and warnings at log level 1 after the timeout.
> 
> See https://bugzilla.samba.org/show_bug.cgi?id=14694 and
> https://www.oreilly.com/library/view/active-directory-cookbook/0596004648/ch04s24.html
> for a description of the limits.
> 
> Andrew Bartlett
> 
-- 
Andrew Bartlett (he/him)       https://samba.org/~abartlet/
Samba Team Member (since 2001) https://samba.org
Samba Team Lead, Catalyst IT   https://catalyst.net.nz/services/samba

Samba Development and Support, Catalyst IT - Expert Open Source
Solutions



