LDAP delay when connected DC goes down
jameadows at webopolis.com
Mon Jan 10 20:05:52 GMT 2005

A couple of us have been looking into a problem that shows up when a
Samba server is joined to an ADS domain and the DC that it is connected
to becomes unreachable. We were seeing a long delay (sixteen minutes)
before the server would time out and fail over to another DC. We tracked
the delay down to ads_do_paged_search(), which calls
ldap_search_ext_s() without setting a timeout. I've noticed that
ads_do_search() also calls ldap_search_ext_s(), but in that
case it passes a timeout of ADS_SEARCH_TIMEOUT (10 seconds).
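
To illustrate why a missing timeout can turn into an open-ended wait, here is a minimal, self-contained sketch that uses POSIX select() instead of the LDAP library. It relies on the same NULL-versus-bounded convention that, as we understand it, applies to the timeout argument of ldap_search_ext_s(): a NULL struct timeval pointer puts no client-side limit on the wait, while a non-NULL one bounds it. wait_for_reply() is a hypothetical helper, not Samba or OpenLDAP code, and a socket that never receives data stands in for an unreachable DC.

```c
#include <stddef.h>       /* NULL */
#include <sys/select.h>   /* select(), fd_set */
#include <sys/socket.h>   /* socketpair(), for trying the helper out */
#include <sys/time.h>     /* struct timeval */
#include <unistd.h>       /* write(), close(), for trying the helper out */

/*
 * Hypothetical helper: wait until fd has data to read.
 *   timeout == NULL : no client-side limit; block until data arrives
 *   timeout != NULL : give up once the interval has elapsed
 * Returns 1 if fd became readable, 0 on timeout, -1 on error.
 */
static int wait_for_reply(int fd, const struct timeval *timeout)
{
    fd_set readfds;
    struct timeval tv;
    struct timeval *tvp = NULL;
    int rc;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    if (timeout != NULL) {
        tv = *timeout;  /* select() may overwrite its timeval, so pass a copy */
        tvp = &tv;
    }

    rc = select(fd + 1, &readfds, NULL, NULL, tvp);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

For example, with sv[] filled in by socketpair(AF_UNIX, SOCK_STREAM, 0, sv) and nothing ever written to sv[1], wait_for_reply(sv[0], &tv) with tv set to { 1, 0 } returns 0 after about a second, while wait_for_reply(sv[0], NULL) would not return at all.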

We modified the code so that ads_do_paged_search() also passes a timeout
to ldap_search_ext_s(). In this case the timeout is set to
lp_ldap_timeout(), which comes from the 'ldap timeout' parameter in
smb.conf and defaults to 15 seconds. With this change in place, our
testing shows that the problem is effectively fixed: Samba times out
after the specified time and reattaches to the domain using a different
DC. I am curious whether leaving the timeout out of ads_do_paged_search()
was intentional, and whether there is any reason a timeout here
would be a bad thing. Our testing is focused on the one problem
we're running into, and the timeout does seem to fix it, but could this
change create problems elsewhere?
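
Here is a rough sketch of the change, assuming the configured value arrives in whole seconds the way lp_ldap_timeout() returns it. lp_ldap_timeout_stub() is a hypothetical stand-in that returns the 15-second default, and fill_search_timeout() is a hypothetical helper; neither is the actual patch. Falling back to 10 seconds (the ADS_SEARCH_TIMEOUT value that ads_do_search() uses) for a zero or negative setting is an assumption of this sketch, chosen so the paged search is never left without a timeout.

```c
#include <sys/time.h>   /* struct timeval */

/* 10 seconds, the ADS_SEARCH_TIMEOUT value mentioned above; the name is
 * hypothetical and only serves as this sketch's fallback. */
#define SKETCH_FALLBACK_SECS 10

/* Hypothetical stand-in for lp_ldap_timeout(); returns the 15-second
 * default of the 'ldap timeout' parameter. */
static int lp_ldap_timeout_stub(void)
{
    return 15;
}

/* Hypothetical helper: turn a timeout in seconds into a struct timeval
 * for the timeout argument of ldap_search_ext_s(). A value of zero or
 * less falls back to SKETCH_FALLBACK_SECS, so the result is always a
 * bounded wait. Returns tv for convenience. */
static struct timeval *fill_search_timeout(int secs, struct timeval *tv)
{
    tv->tv_sec = secs > 0 ? secs : SKETCH_FALLBACK_SECS;
    tv->tv_usec = 0;
    return tv;
}

/*
 * In ads_do_paged_search() the idea is then roughly (placeholder names;
 * the timeout comes after the server and client control lists):
 *
 *   struct timeval tv;
 *   rc = ldap_search_ext_s(ld, base, scope, filter, attrs, 0,
 *                          server_ctrls, client_ctrls,
 *                          fill_search_timeout(lp_ldap_timeout(), &tv),
 *                          size_limit, &res);
 */
```

Whether 0 should instead mean "no timeout" is a policy question; the fallback here simply avoids reintroducing the unbounded wait.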

BTW, the 'ldap timeout' parameter in smb.conf currently controls only the
*connect* timeout and has no effect on the search timeout. I also tried
to set the search timeout using ldap_set_option(), but this did not work.
The reason turns out to be that, for ldap_search_ext_s(), the time limit
set that way only controls the amount of time the server will spend
searching; it does not affect the local timeout (the time the client
will wait for the results of the search).

This testing was done with Samba running on RH9 and FC2 systems,
with the same results. I'm not doing the actual testing myself, just making
the code changes; my partner is testing the results on his
network. So far he has found that this change makes a marked
improvement for this problem, but there is still a problem when
printing to a CUPS queue on his Samba server. I have absolutely no
experience with CUPS myself and haven't looked at this yet. I don't
know what the relationship between CUPS and Samba is, or whether this
problem might be in CUPS code rather than caused by anything Samba is
doing (or not doing). Anyone have any ideas on this part?

Thanks in advance,