Reducing LDAP delays with unreachable DCs

Joe Meadows jameadows at webopolis.com
Wed Jun 23 16:43:05 GMT 2004


Hi John,

Here's the patch I made to our 3.0RC1 build for these changes.  So far our
testing has been successful: one of our QA engineers pounded on it pretty
hard for a while, but the formal QA process is still underway.

One thing that might be worth cleaning up is that I simply copied the
open_ldap_with_timeout() function and related pieces from winbindd_rpc.c
because they were static in that module.  It might ultimately be cleaner to
make the version in winbindd_rpc.c global instead of duplicating it in
ldap.c.

I like your idea about caching the connected DC.  Are you seeing that Samba
connects to different DCs somewhat randomly while it is joined to a domain?
I notice that there is the function get_sorted_dc_list() and I am wondering
if this function is used to achieve a similar net effect, i.e. Samba tries
the first DC in the list, then the second and so on.
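
On the caching idea, the reordering step could look something like the
sketch below.  This is only an assumption about how one might do it, not
existing Samba code: prefer_cached_dc() is a made-up helper, and it works on
a plain array of address strings rather than whatever get_sorted_dc_list()
actually returns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: if the last known-good DC is still in the list,
 * move it to the front and keep the other entries in their original
 * relative order.  Returns 1 if the cached DC was found, 0 otherwise.
 */
int prefer_cached_dc(const char **dcs, size_t count, const char *cached)
{
    size_t i;

    assert(dcs != NULL || count == 0);
    if (cached == NULL)
        return 0;

    for (i = 0; i < count; i++) {
        if (strcmp(dcs[i], cached) == 0) {
            const char *hit = dcs[i];
            /* Shift entries 0..i-1 up by one slot to free slot 0. */
            memmove(&dcs[1], &dcs[0], i * sizeof(dcs[0]));
            dcs[0] = hit;
            return 1;
        }
    }
    return 0;
}
```

Note that a reordering like this would override the order given in an
explicit "password server" list, which is the drawback you mentioned.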

Anyhow, I hope you find this patch helpful.  I haven't sent the patched
source files because our build is based on a different version and we are
also applying a number of other patches that may not be relevant to your
build.  To quickly recap what this patch does:

1) Incorporates open_ldap_with_timeout() into ldap.c and calls it from
ads_try_connect().
2) Adds an 'ldap timeout' parameter to smb.conf.  The default timeout if
unspecified is five seconds.
3) Removes negatively cached DCs from the list returned by get_dc_list().

Of course, I'd be grateful to hear if you spot any errors, potential hitches
or improvements.
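
In case it helps when reviewing item 3, the filtering can be pictured
roughly as below.  Again this is only a sketch: the neg_cache_*() helpers
and filter_dc_list() are hypothetical stand-ins, not the real
check_negative_conn_cache() or get_dc_list(), and the current time is passed
in explicitly just to keep the example easy to exercise.  The 30 second
expiry is the one from my original post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define NEG_CACHE_TTL 30    /* seconds before a failed entry goes stale */
#define NEG_CACHE_MAX 16    /* arbitrary size chosen for this sketch */

struct neg_entry {
    char addr[64];
    time_t failed_at;
};

static struct neg_entry neg_cache[NEG_CACHE_MAX];
static size_t neg_count;

/* Record a failed connection; when full, slot 0 is simply overwritten. */
void neg_cache_add(const char *addr, time_t now)
{
    size_t i, slot = (neg_count < NEG_CACHE_MAX) ? neg_count : 0;

    assert(addr != NULL);
    for (i = 0; i < neg_count; i++) {
        if (strcmp(neg_cache[i].addr, addr) == 0) {
            slot = i;           /* refresh the existing entry */
            break;
        }
    }
    if (slot == neg_count)
        neg_count++;
    strncpy(neg_cache[slot].addr, addr, sizeof(neg_cache[slot].addr) - 1);
    neg_cache[slot].addr[sizeof(neg_cache[slot].addr) - 1] = '\0';
    neg_cache[slot].failed_at = now;
}

/* Returns 1 if addr failed less than NEG_CACHE_TTL seconds before now. */
int neg_cache_hit(const char *addr, time_t now)
{
    size_t i;

    for (i = 0; i < neg_count; i++) {
        if (strcmp(neg_cache[i].addr, addr) == 0)
            return difftime(now, neg_cache[i].failed_at) < NEG_CACHE_TTL;
    }
    return 0;
}

/* Copy only the DCs that are not negatively cached; returns how many. */
size_t filter_dc_list(const char **in, size_t n, const char **out, time_t now)
{
    size_t i, kept = 0;

    for (i = 0; i < n; i++) {
        if (!neg_cache_hit(in[i], now))
            out[kept++] = in[i];
    }
    return kept;
}
```

A DC that failed five seconds ago is left out of the output list, while
the same DC is passed through again once 30 seconds have elapsed.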

Cheers,
Joe Meadows
Senior software engineer
Snap Appliance


> -----Original Message-----
> From: Klinger, John (N-CSC) [mailto:john.klinger at lmco.com]
> Sent: Wednesday, June 23, 2004 8:30 AM
> To: samba-technical at lists.samba.org
> Subject: Reducing LDAP delays with unreachable DCs
>
>
> Joe,
>
> I'm having the exact same problem. I like your suggestions, and would
> be interested in seeing your patches, if you can make them available.
>
> I also have been considering one other enhancement. The previously
> available DC can be cached. Then when a list of DCs is given, a
> connection can be attempted to this cached DC first if it is still in
> the DC list. This scheme would only hit the "connect" timeout when
> a DC first fails. Subsequent "connect"s will go to the surviving DC.
>
> The one thing I dislike about this proposal is that if the smb.conf
> specifies an explicit "password server" list, the DCs should be checked
> in the order specified. If the first DC fails, this mod will prevent Samba
> from ever going back to that DC unless Samba is restarted or the
> second DC goes down.
>
> John Klinger
>
> > Hello,
> >
> > The company that I work for uses Samba in an enterprise environment.
> > We have encountered situations where winbindd has, in its DC list,
> > one or more DCs that are unreachable, which really bogs down the
> > server.  I've made some tweaks that seem to have helped quite a bit,
> > at least in my contrived test scenario, which I will describe below.
> > I'm still evaluating the effectiveness and robustness of these
> > changes, and I thought I'd send this to the list in case anyone has
> > any insights into whether these changes are a good thing and what the
> > potential side effects might be.
> >
> > SAMBA VERSION: 3.0rc1
> >
> > PROBLEM ENVIRONMENT: winbindd gets the IP address of one or more DCs
> > that cannot be reached.  This might be because of routing problems,
> > incorrect hosts/lmhosts settings or bad DNS entries.
> >
> > TEST SITUATION: I have found that the easiest way to simulate this
> > problem is by adding bogus IP entries into 'smb.conf:password server='.
> > For example, "password server = 192.168.100.1, *" with the first
> > address being non-existent on the network.  I have a valid Win2k DC on
> > this network as well and am able to join its domain without any
> > problems under normal circumstances.
> >
> > With the added bogus entry the main problem I found was in the function
> > ads_try_connect(), where the call to ldap_open() takes three minutes to
> > time out.  Setting the LDAP timeout option doesn't help; this seems to
> > limit the search time but has no effect on the connect timeout.  This
> > test setup is faked but it seems to simulate the behavior that I have
> > seen when winbindd has unreachable DCs in its DC list.
> >
> > SOLUTIONS:
> > I found a function called open_ldap_with_timeout() in winbindd_rpc.c.
> > This is a static function so I pasted a copy of it into libads/ldap.c
> > and replaced the call to ldap_open() (in ads_try_connect()) with
> > open_ldap_with_timeout() (which uses an alarm to cancel the connect
> > request).  I also added a parameter, ldap_timeout, to smb.conf to make
> > it easy to try different timeout values.  With this option I can set
> > the LDAP connection timeout to a given number of seconds.
> >
> > This change produced a marked improvement.  Whereas before even one bad
> > IP address would have a severe impact, now using four bad IP addresses
> > makes only a small impact on the initial time it takes to join the
> > domain.  Looking at ethereal traces the behavior is exactly the same as
> > before, only it goes through the list of DCs much faster.
> >
> > As an additional measure I also modified the function get_dc_list()
> > where it goes through the list of DCs copying them to the user buffer.
> > Before copying an entry I added a call to check_negative_conn_cache(),
> > and if the DC is in the failed connection cache it is not added to the
> > DC list.  These cache entries go stale after 30 seconds so a DC should
> > have a chance to 'redeem' itself if it was only temporarily
> > unavailable.
> >
> > CONCLUSION:
> > Early results are encouraging, but we are awaiting some more authentic
> > testing by our QA dept.  I just wanted to float this out there in case
> > this is of any use to anyone, or if anyone knows of a better solution
> > or sees trouble with this one.
> >
> > Thanks all!
> > Joe Meadows
> > Snap Appliance
> >
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ldap_connect_timeout.patch
Type: application/octet-stream
Size: 4690 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba-technical/attachments/20040623/481d3726/ldap_connect_timeout.obj

