Winbindd limited by select

Thu Feb 13 08:51:03 GMT 2003

Ken Cross wrote:
> There is pretty much a one-to-one correspondence between the number of
> smbd processes open (i.e. connected users) and winbindd file descriptors
> (per fstat).

Hmm, it may be platform specific. smbd connects to winbindd both directly
and via NSS. On HP-UX it consumes two client pipes per smbd, and this
might be due to linking libnss_winbind.1 with "-B symbolic", which
resolves symbols locally, so that the two paths used by every smbd
don't share a client environment? It's just a guess.

> I think forking would be counter-productive since winbindd caches so
> much stuff.  A lot of it's in tdb files, though, so...

Agreed. And as far as 2.2 is concerned, a complete restructuring
doesn't make sense any more. One can improve robustness and scalability
with fewer modifications.

> 
> Frankly, I'm not sure what the smbd's are doing with winbindd after
> authentication.

I have observed that they are looking up uid->sid and gid->sid mappings
very frequently. Yes, even if the Windows client seems really idle
(user left desk and does not read or write anything) such lookups are
triggered at least once per minute for every client.

The frequency increases when a user is working actively. Plus lookups
of name->sid, "user in group", and so on, but these are far less frequent.

The latter ones can't be cached easily, but the set of id mappings
an smbd comes across can.
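
A minimal sketch of such a per-smbd id mapping cache, in Python for
brevity (Samba itself is C). The class, its method names and the lookup
callback are hypothetical and not Samba code; the callback stands in for
a round trip to winbindd. The sketch has no expiry, which a real cache
would probably need.

```python
from typing import Callable, Dict, Tuple


class IdMapCache:
    """Hypothetical per-process cache of uid/gid -> SID mappings."""

    def __init__(self, lookup: Callable[[str, int], str]) -> None:
        # 'lookup' is an assumed stand-in for asking winbindd.
        self._lookup = lookup
        self._cache: Dict[Tuple[str, int], str] = {}
        self.misses = 0  # number of lookups that actually reached winbindd

    def to_sid(self, kind: str, unix_id: int) -> str:
        """Return the SID for ('uid' | 'gid', id), asking winbindd only once."""
        key = (kind, unix_id)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._lookup(kind, unix_id)
        return self._cache[key]
```

With this in place, the once-per-minute lookups of an idle client would
be answered locally after the first one.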

> There's another discussion looming about closing idle
> connections to winbindd.

Is this a separate discussion? I would consider them related :)

Andrew Esh wrote:
> 
> Better yet: Have winbindd fork the same way smbd does, on a per-client
> basis.
> 
> Someone should probably figure out what quality of the example network
> caused winbindd to consume so many sockets. Are there really that many
> requests being queued up at once? Shifting to a forking model would
> simply consume the same number and more processes. They are limited too.

Winbind's client connections persist as long as the client processes
are alive, and are waiting for further requests. Clients never explicitly
close the pipe, even if they had only one lookup during their lifetime.

This is fine, as it makes clients submitting multiple requests in a row
more efficient. And it gives winbindd the chance to store get??ent
states in the winbindd connection environment.

But if connection consumption gets excessive, why not look around
for idle clients, and shut them down? (Note, "idle" does not mean
the client is idle, just that it doesn't presently send requests to
winbindd.)
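
Roughly what I have in mind, sketched in Python rather than C. The
class name, the max_clients threshold and the bookkeeping by file
descriptor are assumptions for illustration only, not winbindd's actual
data structures.

```python
from typing import Dict, List, Optional


class ClientTable:
    """Hypothetical table of client connections, keyed by file descriptor."""

    def __init__(self, max_clients: int) -> None:
        self.max_clients = max_clients             # assumed threshold
        self._last_request: Dict[int, float] = {}  # fd -> time of last request

    def touch(self, fd: int, now: float) -> None:
        """Record that the client on 'fd' just sent a request."""
        self._last_request[fd] = now

    def accept(self, fd: int, now: float) -> Optional[int]:
        """Register a new client; at threshold, evict the longest-idle one.

        Returns the evicted fd (which the caller should close), or None.
        """
        evicted = None
        if len(self._last_request) >= self.max_clients:
            evicted = min(self._last_request, key=self._last_request.__getitem__)
            del self._last_request[evicted]
        self._last_request[fd] = now
        return evicted

    def fds(self) -> List[int]:
        """Return the currently registered fds in ascending order."""
        return sorted(self._last_request)
```

Nothing is closed until the threshold is reached, so the normal case
keeps its persistent connections.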
> 
> We also need to be sure all the requests are making progress. If one
> gets hung, the client program would probably repeat the request,
> expending another instance of everything. Are there really 2048 users
> actively trying to make winbindd requests at the same time?

This is very unlikely, even more so with smbds caching id mappings.
And client requests are always processed, even if the client
connection has been shut down: the client gets a broken pipe error
on send and retries, opening a new connection. Winbindd then shuts
down another connection that has been idle for a long time, if it is
still at the threshold.

It is the same as when you restart winbindd while clients are alive.
They just reconnect as soon as they have a new request. Very
gracefully :)
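
The client-side behaviour can be sketched like this, again in Python
and with made-up names. The connect callback and the connection's
request() method are assumptions standing in for the real client pipe
code.

```python
from typing import Any, Callable, Optional


class ReconnectingClient:
    """Hypothetical client that reopens its pipe after a broken pipe error."""

    def __init__(self, connect: Callable[[], Any]) -> None:
        self._connect = connect  # assumed factory returning a connection object
        self._conn: Optional[Any] = None
        self.connects = 0        # how many times a connection was opened

    def request(self, payload: bytes) -> bytes:
        """Send one request, reconnecting once if the old pipe was shut down."""
        for _attempt in range(2):
            if self._conn is None:
                self._conn = self._connect()
                self.connects += 1
            try:
                return self._conn.request(payload)
            except (BrokenPipeError, ConnectionResetError):
                # The daemon closed our idle connection; drop it and retry.
                self._conn = None
        raise ConnectionError("request failed even after reconnecting")
```

The caller never sees the shutdown; it only costs one extra connect.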

> 
> Perhaps this is the result of a very network-common failed NIS request,
> which falls through the passwd list in /etc/nsswitch.conf, and winds up
> asking winbindd about the same non-existent user. What is the content of
> the requests, and is there some way to fix the system so the users don't
> cause them to be issued at such a high rate? Should they even be
> forwarded to winbindd at all?

In such situations you will most probably see only one client connection
carrying many requests (if there is one process failing on many users),
or client connections popping up and going away rapidly (if there are
many processes failing on one user each).

Neither of them is a big problem for file descriptor consumption.

> 
> Maybe winbindd is piling up requests as it searches for a domain
> controller at the head of its "password server" list which is no longer
> working, or is no longer in DNS. Reorder that list, and winbindd might
> begin to process requests fast enough to stay ahead of the influx rate.

No, winbindd is working happily and rapidly (well, most of the time,
and if it isn't permanently kept busy with id mapping lookups :).

It's the unused socket file descriptors which pile up. They do
not leak, but are presently unused.

Cheers!
Michael