Winbindd limited by select
michael.steffens at hp.com
Wed Feb 12 14:10:33 GMT 2003
Ken Cross wrote:
> I've run into a problem with winbindd in both 2.2.x and 3.0 where it
> just locks up after a while on large, busy networks.
> We finally tracked down the problem to the fact that the C library
> "select" function is limited by default to 256 file descriptors in
> NetBSD (1024 in FreeBSD, 2048 in Linux). So once 256 (or whatever) smbd
> processes connected to winbindd, it broke pretty badly and was very hard
> to kill.
> This is set at compile-time, not run-time. This line:
> #define FD_SETSIZE 2048 /* Max # of winbindd connections */
> must appear before the first inclusion of <sys/types.h>.
> This could be a build option, but it might be much simpler to hard-code
> it in local.h, which is what I did to fix it.
> Can somebody check the implications of this on Solaris, HPUX, etc.?
This will hardly help on HP-UX, because there is a kernel parameter
"maxfiles" controlling the per-process maximum number of file descriptors.
It's 60 by default after installation, but is tunable (requires a reboot).
I would not recommend setting it too high, since it also acts as a safeguard
against single user processes eating up all available file descriptors (controlled ...).
We have hit the limit *very* quickly on our Winbind production box,
of course, and I have increased maxfiles to 300. That is still quite low
when expecting a couple of hundred smbd processes to become winbind clients,
each of them consuming two FDs.
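The two limits in play here (the compile-time FD_SETSIZE for select() and the per-process descriptor limit) can be made concrete with a short sketch. The helper names are hypothetical, and it is an assumption that HP-UX reports "maxfiles" as the soft RLIMIT_NOFILE limit; the reserve of descriptors for sockets, databases and logs is likewise an illustrative value, not a measured one.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/select.h>

/* select() can only watch descriptors below FD_SETSIZE; calling FD_SET()
   on a larger one is undefined behaviour, so check before adding it. */
static int can_select_on(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Number of descriptors a select()-based server can actually use: the
   smaller of the RLIMIT_NOFILE soft limit (assumed here to reflect the
   HP-UX "maxfiles" parameter) and the compile-time FD_SETSIZE.
   Returns -1 if getrlimit() fails. */
static long usable_fds(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t)FD_SETSIZE)
        return (long)FD_SETSIZE;
    return (long)rl.rlim_cur;
}

/* How many clients fit when each one holds per_client descriptors and
   `reserved` descriptors go to network sockets, databases and log files. */
static long max_clients(long fd_limit, long reserved, long per_client)
{
    assert(per_client > 0);
    if (fd_limit <= reserved)
        return 0;
    return (fd_limit - reserved) / per_client;
}
```

With maxfiles at 300, two FDs per smbd and an assumed reserve of 40 descriptors, max_clients(300, 40, 2) yields 130, well short of a couple of hundred smbd processes.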
The solution (and this should also work on other platforms) was to
have winbindd housekeep its client connections by shutting down
idle connections, and have clients reconnect when required.
The threshold was chosen to be 100 active connections, which keeps
winbindd well below 300 FDs. Below 140, actually, including network
sockets and open database and log files.
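The housekeeping can be sketched roughly as follows. This is not the actual winbindd remove_idle_client(); struct client_sketch, client_is_idle() and evict_idle_client() are hypothetical, and defining "idle" as "no partially read request and no unsent reply", then evicting the client with the longest time since its last activity, is an assumption consistent with the 2.2.x discussion. Only the threshold of 100 comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

#define MAX_ACTIVE_CLIENTS 100   /* threshold from the text */

/* Hypothetical per-client state; not Samba's actual structure. */
struct client_sketch {
    int    fd;            /* -1 once closed */
    time_t last_access;   /* time of the last request or reply */
    size_t read_pending;  /* bytes of a partially read request */
    size_t write_pending; /* bytes of a reply not yet written */
};

/* A client counts as idle when nothing is buffered in either direction. */
static int client_is_idle(const struct client_sketch *c)
{
    return c->fd >= 0 && c->read_pending == 0 && c->write_pending == 0;
}

/* If more than MAX_ACTIVE_CLIENTS connections are open, close the idle
   client with the longest time since its last activity. The client is
   expected to reconnect on its next request. Returns the index that was
   closed, or -1 if nothing was (or could be) closed. */
static int evict_idle_client(struct client_sketch *clients, size_t n, time_t now)
{
    size_t open = 0;
    int victim = -1;
    time_t longest = -1;

    assert(clients != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (clients[i].fd < 0)
            continue;
        open++;
        if (client_is_idle(&clients[i]) && now - clients[i].last_access > longest) {
            longest = now - clients[i].last_access;
            victim = (int)i;
        }
    }
    if (open <= MAX_ACTIVE_CLIENTS || victim < 0)
        return -1;
    close(clients[victim].fd);
    clients[victim].fd = -1;
    return victim;
}
```

Clients with a request or reply still buffered are never chosen, so only connections with nothing in progress are forced to reconnect.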
This only works out well if clients don't connect too frequently;
... helped achieve this.
I have been tracking winbindd shutting down sockets for about a week now,
and have extended the DEBUG line in remove_idle_client() to also print
the idle time of removal candidates.
With about 100 concurrent smbds (i.e. ~200 client pipes) it
almost always finds connections idle for more than an hour.
I would expect forcing these to reconnect to have no measurable
impact, and the solution should scale to a multitude of its ...
It can't be applied directly to 3.0, however. I'm assuming that identifying
idle connections is more complicated there, as both read and write buffers
can be empty while waiting for a request to complete. But it should
nevertheless be possible.
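For 3.0, the extra condition can be expressed by tracking whether a request is still outstanding, in addition to the buffer state. This is a hypothetical sketch: struct async_client_sketch, its request_in_flight flag and async_client_is_idle() are assumptions for illustration, not a description of how 3.0 actually tracks requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical client state for a server that completes requests
   asynchronously. */
struct async_client_sketch {
    size_t read_pending;      /* bytes of a partially read request */
    size_t write_pending;     /* bytes of a reply not yet written */
    bool   request_in_flight; /* request fully read, reply not yet produced */
};

/* Empty buffers are not enough: a client whose request is still being
   processed must not be treated as idle. */
static bool async_client_is_idle(const struct async_client_sketch *c)
{
    assert(c != NULL);
    return c->read_pending == 0 && c->write_pending == 0 && !c->request_in_flight;
}
```

The flag would be set once a complete request has been read and cleared when the reply is queued for writing, so there is no window in which a busy client looks idle.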