winbindd timeouts against Samba 3.0.3 PDC
John P Janosik
jpjanosi at us.ibm.com
Thu Apr 29 19:59:44 GMT 2004
I have run into some performance problems trying to migrate an NT 4 domain
to a Samba 3.0.3 domain with the LDAP backend. I opened this in
Bugzilla (#1237) a few weeks ago. Now that I have some time to look at it
again, I have a few questions about the right approach to work around it.
Info on our setup:
NT 4.0 domain with ~8700 user/machine accounts and ~800 groups.
Domain member servers are RedHat Linux 9 with Samba 2.2.8a built from
source from samba.org. They run winbindd to map domain users/groups.
Our test Samba DC is a dual 2GHz Xeon running RedHat EL 3, OpenLDAP 2.1.29
with the bdb backend, and Samba 3.0.3 built from source from samba.org. I
think I have all LDAP indexes configured correctly, and the bdb cache is
large enough to hold the entire LDAP database in memory.
If I increase the number of users/groups in the domain much further, winbindd
times out while making a query_dom_info level 2 RPC against the Samba DC. The
problem does not occur against an NT 4 DC.
What I have found so far:
Winbindd makes the query_dom_info level 2 call to obtain the SAM sequence
number, but the reply also carries the user and group counts for the
SAM. Samba calculates the user count by iterating through the SAM with
pdb_getsampwent. On my original test server, a dual 500MHz Pentium 2, the
LDAP search took ~10 seconds and the iteration through the SAM entries took
another ~12 seconds. Winbindd appears to time out after 20 seconds, so the
~22 second total is too long. The Samba DC is CPU bound the entire time.
With the tdb backend things are much faster. The ldap search is eliminated
and the code in pdb_tdb.c that loads each SAM entry is more efficient. It
took only ~1 second to iterate through all the same SAM entries in a tdb.
I originally worked around the problem on my slower test server by creating
two new functions in pdb_ldap.c, pdb_getusercount and pdb_getgroupcount.
For the LDAP backend these only perform the LDAP search, so the query_dom_info
level 2 RPC would return after ~10 seconds. I now see that this won't fix
the problem for larger domains, because the LDAP search time seems to grow
faster than the time Samba takes to iterate through the SAM. On
my new test DC, a dual 2GHz Xeon, the query_dom_info RPC returns in ~6
seconds, with the time split about equally between the LDAP search and the
iteration through the SAM. When I increase the number of users to ~18000,
the query_dom_info RPC takes ~46 seconds on the faster machine: the time
smbd takes to iterate the SAM goes up to ~6 seconds, while the LDAP search
goes up to ~40 seconds. I have worked on improving my LDAP user/group count
functions by restricting the attributes retrieved to just uid instead of the
whole Samba schema, and by not returning the values, but this only decreases
the LDAP search time by ~10 seconds with ~18000 users.
Questions for the list:
1. Since Samba just returns the current time as the SAM sequence
number, should I just add an option in winbindd to disable the sequence
number check and always refresh expired entries in the winbindd cache?
2. Is there any other RPC that returns the SAM sequence number that is
less expensive than this one? I haven't been able to find one.
3. Does Windows keep a running tally of the user/group count? Our Windows NT 4
DCs are on old hardware, but they handle the query_dom_info level 2 RPC
quickly. I suppose the Samba DC can't do this, since such a tally would get
out of sync if people modify the LDAP backend directly.