Reducing LDAP delays with unreachable DCs
jameadows at webopolis.com
Wed Jun 23 16:43:05 GMT 2004
Here's the patch I made to our 3.0RC1 build for these changes. So far our
testing has been successful: one of our QA engineers pounded on it pretty
hard for a while, but the formal QA process is still underway.
One thing that might be worth cleaning up is that I simply copied the
open_ldap_with_timeout() function and related pieces from winbindd_rpc.c
because they were static in that module. It might ultimately be cleaner to
make the version in winbindd_rpc global instead of duplicating it in ldap.c.
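For readers who have not seen the alarm technique that the original message
attributes to open_ldap_with_timeout(), here is a minimal sketch of the
general idea. It is not the Samba code. The names call_with_timeout() and
read_one_byte() are hypothetical, and a blocking read on a pipe stands in
for the slow LDAP connect. The sketch assumes a POSIX system where a signal
handler installed without SA_RESTART makes a blocked system call fail with
EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the SIGALRM handler so the caller can tell a timeout from
 * any other failure. */
static volatile sig_atomic_t got_alarm = 0;

static void on_alarm(int signo)
{
    (void)signo;
    got_alarm = 1;
}

typedef int (*blocking_fn)(void *arg);

/* Hypothetical helper: run fn(arg) and give up after `seconds`.
 * Returns fn's result, or -1 with errno set to ETIMEDOUT if the
 * alarm fired while fn was still blocked. */
int call_with_timeout(blocking_fn fn, void *arg, unsigned int seconds)
{
    struct sigaction sa, old_sa;
    int rc, saved_errno;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no SA_RESTART: blocked calls return EINTR */

    got_alarm = 0;
    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
        return -1;

    alarm(seconds);          /* arm the timer */
    rc = fn(arg);            /* the potentially slow call */
    saved_errno = errno;
    alarm(0);                /* disarm the timer */
    sigaction(SIGALRM, &old_sa, NULL);

    if (rc < 0 && got_alarm) {
        errno = ETIMEDOUT;
        return -1;
    }
    errno = saved_errno;
    return rc;
}

/* Stand-in for a slow connect: blocks until a byte arrives on the
 * read end of a pipe (arg points at the int[2] filled by pipe()). */
static int read_one_byte(void *arg)
{
    int *fds = arg;
    char c;
    return (int)read(fds[0], &c, 1);
}
```

In the patch the wrapped call would be the LDAP open and the timeout would
come from the new smb.conf parameter. Note that alarm() is process-wide, so
this approach only suits code that does not use SIGALRM for anything else
at the same time.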
I like your idea about caching the connected DC. Are you seeing that Samba
connects to different DCs somewhat randomly while it is joined to a domain?
I notice that there is a function get_sorted_dc_list() and I am wondering
whether it is used to achieve a similar net effect, i.e. Samba tries the
first DC in the list, then the second and so on.
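The cached-DC idea can be sketched roughly as follows. This is only an
illustration of the proposal in the quoted message, not existing Samba code.
The helper name prefer_last_good_dc() and the representation of the DC list
as an array of strings are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if last_good appears in dcs[0..n-1], move it to
 * the front and shift the entries before it down by one, so the
 * relative order of the other DCs is preserved.
 * Returns 1 if the cached DC was found, 0 otherwise. */
int prefer_last_good_dc(const char **dcs, size_t n, const char *last_good)
{
    size_t i;

    if (last_good == NULL)
        return 0;

    for (i = 0; i < n; i++) {
        if (strcmp(dcs[i], last_good) == 0) {
            const char *hit = dcs[i];
            /* shift dcs[0..i-1] to dcs[1..i] */
            memmove(&dcs[1], &dcs[0], i * sizeof(dcs[0]));
            dcs[0] = hit;
            return 1;
        }
    }
    return 0; /* cached DC no longer listed: leave the order alone */
}
```

As the quoted message points out, reordering like this overrides an explicit
"password server" order, so a real implementation would need some way to go
back to the configured first choice.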
Anyhow, I hope you find this patch helpful. I haven't sent the patched
source files because our build is based on a different version and we are
also applying a number of other patches that may not be relevant to your
build. To quickly recap what this patch does:
1) Incorporates open_ldap_with_timeout into ldap.c and calls it from
2) Adds an 'ldap timeout' parameter to smb.conf. The default timeout, if
unspecified, is five seconds.
3) Removes negatively cached DCs from the list returned by get_dc_list().
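To illustrate item 3, here is a rough, self-contained sketch of filtering a
DC list against a negative connection cache whose entries expire after 30
seconds, as described in the original message. It does not use Samba's
check_negative_conn_cache(). The names neg_cache_add(), neg_cache_contains()
and filter_dc_list(), the fixed-size table, and passing the current time as
a parameter are all assumptions made to keep the example small.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define NEG_CACHE_TTL 30  /* seconds before a failed DC may be retried */
#define NEG_CACHE_MAX 16  /* arbitrary table size for this sketch */

struct neg_entry {
    char addr[64];
    time_t failed_at;
};

static struct neg_entry neg_cache[NEG_CACHE_MAX];
static size_t neg_count = 0;

/* Record (or refresh) a failed connection attempt to addr. */
void neg_cache_add(const char *addr, time_t now)
{
    size_t i;

    for (i = 0; i < neg_count; i++) {
        if (strcmp(neg_cache[i].addr, addr) == 0) {
            neg_cache[i].failed_at = now;
            return;
        }
    }
    if (neg_count == NEG_CACHE_MAX)
        return; /* table full: this sketch simply drops the entry */

    snprintf(neg_cache[neg_count].addr, sizeof(neg_cache[neg_count].addr),
             "%s", addr);
    neg_cache[neg_count].failed_at = now;
    neg_count++;
}

/* Return 1 if addr failed less than NEG_CACHE_TTL seconds ago. */
int neg_cache_contains(const char *addr, time_t now)
{
    size_t i;

    for (i = 0; i < neg_count; i++) {
        if (strcmp(neg_cache[i].addr, addr) == 0)
            return (now - neg_cache[i].failed_at) < NEG_CACHE_TTL;
    }
    return 0;
}

/* Copy every DC that is not negatively cached from in[] to out[],
 * preserving order. out must have room for n_in pointers.
 * Returns the number of entries written. */
size_t filter_dc_list(const char **in, size_t n_in,
                      const char **out, time_t now)
{
    size_t i, n_out = 0;

    for (i = 0; i < n_in; i++) {
        if (!neg_cache_contains(in[i], now))
            out[n_out++] = in[i];
    }
    return n_out;
}
```

Because stale entries are ignored rather than purged, a DC that failed more
than 30 seconds ago simply reappears in the filtered list on the next call.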
Of course, I'd be grateful to hear if you spot any errors, potential hitches
Senior software engineer
> -----Original Message-----
> From: Klinger, John (N-CSC) [mailto:john.klinger at lmco.com]
> Sent: Wednesday, June 23, 2004 8:30 AM
> To: samba-technical at lists.samba.org
> Subject: Reducing LDAP delays with unreachable DCs
> I'm having the exact same problem. I like your suggestions, and would
> be interested in seeing your patches, if you can make them available.
> I also have been considering one other enhancement. The previously
> available DC can be cached. Then when a list of DCs is given, a
> connection can be attempted to this cached DC first if it is still in
> the DC list. This scheme would only hit the "connect" Timeout when
> a DC first fails. Subsequent "connect"s will go to the surviving DC.
> The one thing I dislike about this proposal is that if the smb.conf
> specifies an explicit "password server" list, the DCs should be checked
> in the order specified. If the first DC fails, this mod will prevent Samba
> from ever going back to that DC unless Samba is restarted or the
> second DC goes down.
> John Klinger
> > Hello,
> > The company that I work for uses samba in an enterprise environment. We
> > have encountered situations where winbindd has, in its DC list, one or
> > more DCs that are unreachable which really bogs down the server. I've
> > made some tweaks that seem to have helped quite a bit, at least in my
> > contrived test scenario which I will describe below. I'm still
> > evaluating the effectiveness and robustness of these changes and I
> > thought I'd send this to the list in case anyone has any insights into
> > whether these changes are a good thing and what the potential
> > side-effects might be.
> > SAMBA VERSION: 3.0rc1
> > PROBLEM ENVIRONMENT: winbindd gets the ip address of one or more DCs
> > that cannot be reached. This might be because of routing problems,
> > incorrect hosts/lmhosts settings or bad DNS entries.
> > TEST SITUATION: I have found that the easiest way to simulate this
> > problem is by adding bogus IP entries into 'smb.conf:password server='.
> > For example "password server = 192.168.100.1, *" with the first address
> > being non-existent on the network. I have a valid Win2k DC on this
> > network as well and am able to join its domain without any problems
> > under normal circumstances.
> > With the added bogus entry the main problem I found was in the function
> > ads_try_connect() where the call to ldap_open() takes three minutes to
> > time out. Setting the LDAP timeout option doesn't help; this seems to
> > limit the search time but has no effect on the connect timeout. This
> > test setup is faked but it seems to simulate the behavior that I have
> > seen when winbindd has unreachable DCs in its DC list.
> > SOLUTIONS:
> > I found a function called open_ldap_with_timeout() in winbindd_rpc.c.
> > This is a static function so I pasted a copy of it into libads/ldap.c
> > and replaced the call to ldap_open() (in ads_try_connect()) with
> > open_ldap_with_timeout() (which uses an alarm to cancel the connect
> > request). I also added a parameter, 'ldap timeout', to smb.conf to make
> > it easy to try different timeout values. With this option I can set the
> > ldap connection timeout to a certain number of seconds.
> > This change produced a marked improvement. Whereas before even one bad
> > IP address would have a severe impact, now using four bad IP addresses
> > makes only a small impact on the initial time it takes to join the
> > domain. Looking at ethereal traces the behavior is exactly the same as
> > before, only it goes through the list of DCs much faster.
> > As an additional measure I also modified the function get_dc_list as it
> > goes through the list of DCs copying them to the user buffer. Before
> > copying an entry I added a call to check_negative_conn_cache() and if
> > the DC is in the failed connection cache it is not added to the DC
> > list. These cache entries go stale after 30 seconds so a DC should have
> > a chance to 'redeem' itself if it was only temporarily unavailable.
> > CONCLUSION:
> > Early results are encouraging, but awaiting some more authentic testing
> > by our QA dept. I just wanted to float this out there in case this is
> > of any use to anyone, or if anyone knows of a better solution or sees
> > trouble with this one.
> > Thanks all!
> > Joe Meadows
> > Snap Appliance
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 4690 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba-technical/attachments/20040623/481d3726/ldap_connect_timeout.obj