getent passwd timeouts on samba 3.5.1

Nagaraj Shyam Nagaraj_Shyam at symantec.com
Wed Aug 18 23:24:02 MDT 2010


>Yes, in 3.6.x this has been made asynchronous.

 

Great!! Will look at 3.6.x to understand the solution.

Thanks for the reply.

 

>However, the underlying problem is the nss interfaces on UNIX.

>They're broken. Imagine a directory server with millions of

>user or computer objects. The "enumerate" concept, using

>getpwent() to iterate through all available users, is

>fundamentally broken in this environment. There's a reason

>that modern Windows uses "search" methods, not "enumerate"

>methods, when looking up

 

Did the "enumerate" approach come from starting out with a flat,
newline-delimited passwd file? :-) Old ideas die hard.

With a flat file it is either a sequential search or an enumeration.

 

>What underlying problem are you trying to solve ? Which

>application actually needs to enumerate all available users

>or groups ? What I'd recommend is look into the fundamental

>problem you're trying to solve by enumerating all users,

>and fix that.

 

There is no need to enumerate all users. However, if a new user that the
server has never seen before comes in, we should be able to look up that
particular user (a getpwnam()-style lookup rather than a getpwent()
enumeration). This does not work reliably either when the idmap backend
is ldap and we cold start with nothing in the ldap idmap db, i.e. when
winbind has to generate the mapping and store it in ldap.

 

-s

 

 

From: Nagaraj Shyam 
Sent: Wednesday, August 18, 2010 3:10 PM
To: 'samba-technical at lists.samba.org'
Subject: getent passwd timeouts on samba 3.5.1

 

Hi All.

 

I am using Samba server 3.5.1 on SUSE Linux Enterprise Server 10
(x86_64). The idmap backend is configured to be an LDAP server. The
number of users in the Windows domain is upwards of 10000. After a clean
start (nothing in the tdb files, nothing in the ldap backend database),
I almost always see "getent passwd" time out after listing the passwd
entries for about a thousand users. Sometimes it lists none at all.
Repeated runs of "getent passwd" progressively list 250 more users.
wbinfo -i is flaky as well: it is hit or miss whether it can list the
user information.

 

1. One of the problem areas of code is in libnss_winbind.so

samba-3.5.1/nsswitch/wb_common.c:

        /* Wait for 5 seconds for a reply. May need to parameterise this ... */
        tv.tv_sec = 5;

        if ((selret = select(winbindd_fd + 1, &r_fds, NULL, NULL, &tv)) == -1) {
                winbind_close_sock();
                return -1;                   /* Select error */
        }

        if (selret == 0) {
                /* Not ready for read yet... */
                if (total_time >= 30) {
                        /* Timeout */
                        winbind_close_sock();

 

 

The above timeout (of 30 seconds) is hit fairly frequently on my test
setup. Increasing the per-select timeout from 5 sec to 30 sec and the
total timeout to 180 seconds always works in my setup, but this is
really a kludge.

 

2. I see the following issue on the winbindd side:

The parent winbindd daemon creates a WINBINDD_GETPWNAM request, and the
idmap winbindd child has not even picked it up for servicing by the time
the 5 sec timeout is hit in the wb_common.c inner loop above.

 

 

 

For a better solution that works in all configurations (instead of the
kludge of arbitrarily increasing the timeout), we need a better protocol
between the client and the winbindd service, as well as between the
parent winbindd and its child daemons, along the following lines:

 

 

- The child winbindd daemon (idmap in this case) needs to report
  progress periodically; for example, any request completion indicates
  progress.

- The parent winbindd daemon needs to indicate to the client (wbinfo,
  getent etc.) that progress is being made on the request and ask for
  more time to service it (pending status).

- The client should be enhanced to handle the pending status by waiting
  longer (there can be a hard limit which is much larger than the
  current hardwired values).
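
To make the client side of this proposal concrete, here is a small
sketch of the decision logic only. Everything in it is hypothetical: the
wb_event states, wait_with_progress() and the two limits do not exist in
the winbindd protocol today. The client polls once per slice, restarts
its idle countdown whenever a pending status arrives, and still gives up
at a hard limit.

```c
/* Hypothetical reply states; none of these exist in winbindd today. */
enum wb_event { WB_SILENT, WB_PENDING, WB_DONE };

/* Decide the outcome for a client that polls once per slice.
 * events[i] is what the client observed during slice i.
 * idle_limit: consecutive silent slices tolerated before giving up.
 * hard_limit: maximum number of slices the client waits overall.
 * Returns the slice index at which the reply arrived, or -1 on timeout. */
int wait_with_progress(const enum wb_event *events, int n_events,
                       int idle_limit, int hard_limit)
{
    int idle = 0;

    for (int i = 0; i < n_events && i < hard_limit; i++) {
        switch (events[i]) {
        case WB_DONE:
            return i;           /* reply arrived in this slice */
        case WB_PENDING:
            idle = 0;           /* progress reported: restart countdown */
            break;
        case WB_SILENT:
            if (++idle >= idle_limit)
                return -1;      /* no progress for too long */
            break;
        }
    }
    return -1;                  /* hard limit reached, or no reply seen */
}
```

With idle_limit set to what the current 30 second cutoff amounts to, a
slow but progressing idmap child would keep the client waiting, while a
hung one would still fail after the same idle period as today.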

 

Is the above area being looked at currently or is there a plan to
enhance the above in the future?

 

Thanks for any information/pointers to open bug ids. 

 

Regards.

-s


