LDAP delay when connected DC goes down

Joe Meadows jameadows at webopolis.com
Mon Jan 10 20:05:52 GMT 2005

Hi all,

A couple of us have been looking into a problem that shows up when a 
Samba server is joined to an ADS domain and the DC that it is connected 
to becomes unreachable.  We were seeing a long delay (sixteen minutes) 
before the server would time out and fail over to another DC.  We 
tracked the delay down to the function ads_do_paged_search(), which 
calls ldap_search_ext_s() without setting a timeout.  By contrast, the 
function ads_do_search() also calls ldap_search_ext_s(), but passes it 
a timeout of ADS_SEARCH_TIMEOUT (10 seconds).

We modified the code so that ads_do_paged_search() also sets a timeout 
when calling ldap_search_ext_s().  In this case the timeout is set to 
lp_ldap_timeout(), which is set by the 'ldap timeout' parameter in 
smb.conf or defaults to 15 seconds.  With this change in place our 
testing shows that the problem is effectively fixed.  Samba times out 
after the specified time and reattaches to the domain using a different 
DC.  Was leaving the timeout out of ads_do_paged_search() intentional, 
and is there any reason why a timeout here would be a bad thing?  Our 
testing is focused on this one problem, and the timeout does seem to 
fix it, but could this change create problems elsewhere?
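
As a rough illustration of the kind of change described above (a sketch 
only, not the actual patch), a timeout in seconds can be converted into 
the struct timeval pointer that ldap_search_ext_s() takes as its timeout 
argument.  The helper name search_timeout() is hypothetical, and 
treating a non-positive value as "no client-side limit" is an assumption 
of this sketch rather than Samba behaviour.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Hypothetical helper, not part of Samba: turn a timeout in seconds
 * (for example the value returned by lp_ldap_timeout()) into the
 * struct timeval pointer expected by ldap_search_ext_s().
 * Assumption of this sketch: a non-positive value means "no limit",
 * so NULL is returned and *tv is left untouched.
 */
static struct timeval *search_timeout(int seconds, struct timeval *tv)
{
    if (seconds <= 0) {
        return NULL;
    }
    tv->tv_sec = seconds;
    tv->tv_usec = 0;
    return tv;
}

/*
 * Intended use inside a search routine (needs <ldap.h>; shown as a
 * comment so the block compiles without libldap):
 *
 *   struct timeval tv;
 *   rc = ldap_search_ext_s(ld, base, scope, filter, attrs, 0,
 *                          serverctrls, NULL,
 *                          search_timeout(lp_ldap_timeout(), &tv),
 *                          LDAP_NO_LIMIT, &res);
 */
```

Keeping the conversion in one place would let both search paths share 
the same rule for what "no timeout" means.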

BTW, the 'ldap timeout' parameter in smb.conf currently only controls 
the *connect* timeout and has no effect on the search timeout.  I also 
tried to set the search timeout using ldap_set_option(), but this did 
not work.  The reason turns out to be that, for ldap_search_ext_s(), the 
timeout set that way only controls the amount of time that the server 
will spend searching; it does not affect the local timeout (the time 
that the client will wait for the results of the search).

This testing was done with Samba running on both an RH9 and an FC2 
system, with the same results.  I'm not doing the actual testing, just making 
the software tweaks, and my partner is testing the results on his 
network.  So far he has found that this change has made a marked 
improvement for this problem but that there is still a problem when 
printing to a CUPS queue on his Samba server.  I have absolutely no 
experience with CUPS myself and haven't looked at this yet.  I don't 
know what the relationship between CUPS and Samba is or if this problem 
might be in CUPS code and not caused by anything that Samba is doing (or 
not doing).  Anyone have any ideas on this part?

Thanks in advance,
Joe Meadows

More information about the samba-technical mailing list