Reducing LDAP delays with unreachable DCs
Klinger, John (N-CSC)
john.klinger at lmco.com
Wed Jun 23 15:30:08 GMT 2004
I'm having the exact same problem. I like your suggestions, and would
be interested in seeing your patches, if you can make them available.
I have also been considering one other enhancement. The most recently
reachable DC can be cached. Then, when a list of DCs is given, a
connection can be attempted to this cached DC first if it is still in
the DC list. This scheme would only hit the "connect" timeout when
a DC first fails. Subsequent "connect"s would go to the surviving DC.
The one thing I dislike about this proposal is that when smb.conf
specifies an explicit "password server" list, the DCs should be checked
in the order specified, and this change breaks that. If the first DC
fails, this mod will prevent Samba from ever going back to that DC unless
Samba is restarted or the second DC goes down.
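To make the idea concrete, here is a minimal sketch of the reordering step.
It is not Samba code: prefer_cached_dc() and its arguments are hypothetical
names, and the DC list is assumed to be a plain array of strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Samba: if the cached DC is still in
 * the candidate list, move it to the front. The relative order of the
 * other entries is preserved. Returns 1 if the list was reordered,
 * 0 otherwise.
 */
int prefer_cached_dc(const char **dcs, size_t n, const char *cached)
{
    size_t i;

    assert(dcs != NULL || n == 0);

    if (cached == NULL)
        return 0;

    for (i = 0; i < n; i++) {
        if (strcmp(dcs[i], cached) == 0) {
            const char *hit = dcs[i];

            /* Shift entries 0..i-1 down one slot, keeping their order. */
            memmove(&dcs[1], &dcs[0], i * sizeof(dcs[0]));
            dcs[0] = hit;
            return i != 0;
        }
    }

    return 0; /* cached DC is no longer listed: leave the order alone */
}
```

One way to soften the ordering concern would be to skip this call when an
explicit "password server" list is configured, or to forget the cached DC
after some interval so the configured order gets retried. Both are
untested ideas, not something the patches do.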
> The company that I work for uses samba in an enterprise environment. We
> have encountered situations where winbindd has, in its DC list, one or more
> DCs that are unreachable which really bogs down the server. I've made some
> tweaks that seem to have helped quite a bit, at least in my contrived test
> scenario which I will describe below. I'm still evaluating the
> effectiveness and robustness of these changes and I thought I'd send this to
> the list in case anyone has any insights into whether these changes are a
> good thing and what the potential side-effects might be.
> SAMBA VERSION: 3.0rc1
> PROBLEM ENVIRONMENT: winbindd gets the IP address of one or more DCs that
> cannot be reached. This might be because of routing problems, incorrect
> hosts/lmhosts settings or bad DNS entries.
> TEST SITUATION: I have found that the easiest way to simulate this problem
> is by adding bogus IP entries into 'smb.conf:password server='. For example
> "password server = 192.168.100.1, *" with the first address being
> non-existent on the network. I have a valid Win2k DC on this network as
> well and am able to join its domain without any problems under normal
> circumstances.
> With the added bogus entry the main problem I found was in the function
> ads_try_connect() where the call to ldap_open() takes three minutes to time
> out. Setting the LDAP timeout option doesn't help; it seems to limit the
> search time but has no effect on the connect timeout. This test setup is
> faked but it seems to simulate the behavior that I have seen when winbindd
> has unreachable DCs in its DC list.
> I found a function called open_ldap_with_timeout() in winbindd_rpc.c. This
> is a static function so I pasted a copy of it into libads/ldap.c and
> replaced the call to ldap_open (in ads_try_connect()) with
> open_ldap_with_timeout() (which uses an alarm to cancel the connect
> request). I also added a parameter, ldap_timeout, to smb.conf to make it
> easy to try different timeout values. With this option I can define the
> ldap connection timeout to a certain number of seconds.
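For readers who have not seen the alarm technique, a rough sketch of it
applied to a plain connect() call follows. This is not the Samba function
mentioned above; connect_with_alarm() and on_alarm() are hypothetical names,
and the sketch assumes a single-threaded caller that is not using SIGALRM
for anything else.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_fired;

static void on_alarm(int sig)
{
    (void)sig;
    alarm_fired = 1; /* only record that the timer expired */
}

/*
 * Hypothetical helper: connect() that gives up after timeout_secs.
 * Returns 0 on success and -1 on failure. If the alarm interrupted the
 * call, errno is set to ETIMEDOUT and the caller should close fd.
 * A timeout of 0 means "no timeout", because alarm(0) arms nothing.
 */
int connect_with_alarm(int fd, const struct sockaddr *addr, socklen_t len,
                       unsigned int timeout_secs)
{
    struct sigaction sa, old_sa;
    int ret, saved_errno;

    assert(addr != NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no SA_RESTART, so connect() fails with EINTR */

    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
        return -1;

    alarm_fired = 0;
    alarm(timeout_secs);
    ret = connect(fd, addr, len);
    saved_errno = errno;

    alarm(0);                          /* cancel any pending alarm */
    sigaction(SIGALRM, &old_sa, NULL); /* restore the previous handler */

    if (ret != 0 && saved_errno == EINTR && alarm_fired)
        saved_errno = ETIMEDOUT;
    errno = saved_errno;
    return ret;
}
```

The same pattern wraps a library call such as ldap_open() instead of
connect(): arm the alarm, make the blocking call, then disarm. The caveat
is that the library has to treat the interrupted connect as a failure
rather than retrying it.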
> This change produced a marked improvement. Whereas before even one bad IP
> address would have a severe impact, now using four bad IP addresses has
> only a small impact on the initial time it takes to join the domain.
> Looking at Ethereal traces, the behavior is exactly the same as before, only
> it goes through the list of DCs much faster.
> As an additional measure I also modified the function get_dc_list() where it
> goes through the list of DCs, copying them to the user buffer. Before copying
> an entry I added a call to check_negative_conn_cache(), and if the DC is in
> the failed connection cache it is not added to the DC list. These cache entries
> go stale after 30 seconds so a DC should have a chance to 'redeem' itself if
> it was only temporarily unavailable.
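The filtering step described here might look roughly like the sketch below.
It is a standalone illustration, not Samba's negative connection cache: the
neg_cache_* functions, filter_dc_list(), the fixed-size table and the
explicit "now" argument are all assumptions made to keep the example small.
Only the 30 second lifetime comes from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define NEG_CACHE_TTL   30 /* seconds before a failed DC gets another chance */
#define NEG_CACHE_SLOTS 16

struct neg_entry {
    char   name[64];
    time_t failed_at;
};

static struct neg_entry neg_cache[NEG_CACHE_SLOTS];

/* Record a failed connection; reuses the DC's slot or the oldest one. */
void neg_cache_add(const char *dc, time_t now)
{
    size_t i, victim = 0;

    assert(dc != NULL);

    for (i = 0; i < NEG_CACHE_SLOTS; i++) {
        if (strcmp(neg_cache[i].name, dc) == 0) {
            victim = i;
            break;
        }
        if (neg_cache[i].failed_at < neg_cache[victim].failed_at)
            victim = i;
    }

    strncpy(neg_cache[victim].name, dc, sizeof(neg_cache[victim].name) - 1);
    neg_cache[victim].name[sizeof(neg_cache[victim].name) - 1] = '\0';
    neg_cache[victim].failed_at = now;
}

/* Returns 1 if the DC failed less than NEG_CACHE_TTL seconds ago. */
int neg_cache_is_bad(const char *dc, time_t now)
{
    size_t i;

    for (i = 0; i < NEG_CACHE_SLOTS; i++) {
        if (neg_cache[i].name[0] != '\0' &&
            strcmp(neg_cache[i].name, dc) == 0)
            return difftime(now, neg_cache[i].failed_at) < NEG_CACHE_TTL;
    }
    return 0;
}

/* Copy every DC that is not currently marked bad; returns how many. */
size_t filter_dc_list(const char **in, size_t n, const char **out, time_t now)
{
    size_t i, kept = 0;

    for (i = 0; i < n; i++) {
        if (!neg_cache_is_bad(in[i], now))
            out[kept++] = in[i];
    }
    return kept;
}
```

One case a real implementation would have to decide is what to do when
every DC is in the cache and the filtered list comes back empty: fail
immediately, or fall back to the unfiltered list.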
> Early results are encouraging, but we are awaiting more realistic testing
> by our QA dept. I just wanted to float this out there in case this is of any
> use to anyone, or if anyone knows of a better solution or sees trouble with
> this one.
> Thanks all!
> Joe Meadows
> Snap Appliance