[PATCH] winbindd: control number of winbindd's client connections

Volker Lendecke Volker.Lendecke at SerNet.DE
Mon Jun 8 03:13:37 MDT 2015

On Wed, Jun 03, 2015 at 09:12:06AM +0300, Uri Simchoni wrote:
> This patch handles a case we've encountered in which winbindd opened
> client connections up to the process limit on open file descriptors.
> It actually happened in the field with a samba 3.3.16-based NAS
> appliance serving 200-300 SMB clients. Other factors that caused this
> were:
> - winbindd is contacting the DC for each session-setup (Bug 11259)
> - serving the requests was slow because winbindd was reopening the
> ldap connection for each request (Bug 11267 - already fixed)
> - DNS misconfiguration on site made serving the requests even slower
> However, the basic behavior is that the winbindd client limit is not a
> hard limit and I've been able to reproduce it with latest master using
> a specially-crafted program which opened multiple requests to
> winbindd.
> This patchset is divided into two parts:
> - parts 1-4 modify winbindd to make the client limit a hard limit -
> stop accepting new connections when the limit is reached and resume
> accepting when possible.
> - part 5 modifies the client side, removing the policy to retry up to
> 10 times if winbindd doesn't answer within 30 seconds (after
> connection has been opened and request sent). This change prevents a
> vicious cycle of piling more and more requests on winbindd if it is
> already too busy. Instead the client timeout is increased to 300
> seconds (30 seconds x 10), relying on winbindd to respond earlier with
> a failure code according to "winbind request timeout".

Sorry to be a bit late in the game, I was too busy to really give the
issue some thought last week.

How about the following approach to solve the problem:

Every time we are about to ship some request to a winbind child, we could
check whether the corresponding winbind client is still around. I know
this means carrying state information down to some deep levels: the
information about which client is being served is not readily available
in wb_child_request_trigger, but I'm sure there are ways to get it there.

We can even go one step further and keep listening for readability on
the client socket while a request is queued, waiting for its turn at
the child process. While a child RPC request is in progress, we need to
stop listening for the client disconnect, as this would badly break
the parent->child protocol sequence.

If that is done, isn't most of the problem already solved? If we are
running into fd limits, just bump up the number of fds. Fds are
relatively cheap these days, in particular with modern winbind's epoll
support. 3.3.16 won't have that yet, but master should. Also, there can
be at most one fd per system thread, assuming normal use of libwbclient.

The 300 second change then makes perfect sense, as retrying every 30
seconds only makes smbd go back to the end of the queue.

Regarding the "did some real work" issue: yes, I can see value in it. But
as those requests do no blocking, do we really kill those clients very
often? My gut feeling is that these connections are very quick, and
if we kill idle clients based on an LRU scheme, will we ever hit those
PRIV_SOCKET request clients?

I know those changes might be larger, but I'm a bit reluctant to make
winbind stop accepting connections. winbind is really central, and
refusing new connections might even have security implications.

With best regards,

Volker Lendecke

SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt at sernet.de
