Massively excessive DNS lookups in ads_XXXX code.

Jeremy Allison jra at samba.org
Thu Jul 23 17:00:47 UTC 2020


On Thu, Jul 23, 2020 at 07:22:57AM +0200, Andreas Schneider wrote:
> On Thursday, 23 July 2020 00:29:45 CEST Jeremy Allison via samba-technical 
> wrote:
> > Hi all,
> 
> Hi Jeremy,
> 
> > Do you concur ?
> 
> Yep :-)
> 
> Have you ever called 'net ads join' in a big network with 200 DCs? Once we get 
> the list with DCs we will resolve *every* name we got to an IP address. Yes, 
> one would be enough. If we run into timeouts (30sec) a 'net ads join' takes 
> about 5-10 minutes to succeed.
> 
> Take a look at the while loop in discover_dc_dns() :-)
> 
> 
> Thanks for looking into this! ;-)

Do I have a patchset for you ! :-) :-).

It isn't actually the large network
(200 DCs) that is the problem; my
customer has more.

The problem is that we aren't doing
the DNS lookups in parallel, and are
waiting for all requests to return
before returning.

I have a 3-element patchset.

1). Preparatory work to clean up a lot of the
libsmb/namequery.c code to modern standards.
This already passes CI and I'll try to get
an MR ready. Happy to add you to the review
list.

2). Add the dns_lookup_list_async()
function. This is working and already
being tested on my customer site. It
parallelizes both A and AAAA record
lookups by using lib/addns/dnsquery.c
(with a few new functions). Currently
it has a hard-coded 10 second timeout
for resolving a list of names, but
I'm planning to add a "dns timeout"
parameter (default value 10 sec) that
allows it to be set in smb.conf.

This is the core of the fix - even
on a network with 300+ DCs it sends
all the name requests in parallel and
collects all it can get back in the
timeout period. It's not an error
to time out. This puts a hard limit
on the total time we spend waiting
for DNS replies, and allows us to
easily pick one at the end.
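
As a rough illustration of the idea only (Python, not the
Samba C code): fire off every lookup at once, collect what
comes back before one overall deadline, and treat the
deadline expiring as normal. The name lookup_list_parallel()
and the injectable resolver argument are hypothetical and
are not part of dns_lookup_list_async() or lib/addns.

```python
import concurrent.futures
import socket


def lookup_list_parallel(names, timeout=10.0, resolver=None):
    """Resolve all names concurrently; return whatever arrived in time.

    Hypothetical helper for illustration, not Samba's implementation.
    Returns a dict mapping each resolved name to its list of addresses.
    """
    if resolver is None:
        # Default: ask the system resolver for both A and AAAA records.
        def resolver(name):
            infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
            return sorted({info[4][0] for info in infos})

    results = {}
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(names)))
    futures = {pool.submit(resolver, name): name for name in names}
    try:
        # One deadline covers the whole list, not each individual name.
        for fut in concurrent.futures.as_completed(futures, timeout=timeout):
            try:
                results[futures[fut]] = fut.result()
            except OSError:
                pass  # a name that fails to resolve is simply skipped
    except concurrent.futures.TimeoutError:
        pass  # running out of time is not an error; keep what we have
    finally:
        # Do not block on stragglers; drop anything not yet started.
        pool.shutdown(wait=False, cancel_futures=True)
    return results
```

With this shape the caller waits at most roughly "timeout"
seconds however many names are in the list, and can then
pick any one of the returned addresses.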

I have this working for the resolve_ads()
code now, and I'm finishing up fixing
it for discover_dc_dns() (these
functions are very similar). Once
this is done I'll submit an MR
and add you to the review.

3). Fix the fact that the way we
use ADS_STRUCT currently drops
information on servers we just looked
up via DNS + CLDAP and for subsequent
operations does the whole DNS + CLDAP
lookups again inside ads_connect().
My patch allows an ADS_STRUCT that
is being re-used that already contains a valid
ads->ldap.ss address to use it without
further DNS or CLDAP lookups. This is
an optimization.
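
The re-use logic can be sketched like this (again Python
for illustration, not the actual patch). AdsHandle,
server_addr and connect() are hypothetical stand-ins for
ADS_STRUCT, ads->ldap.ss and ads_connect(), and the
discovery step is passed in as a callable rather than
modelled.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AdsHandle:
    """Hypothetical stand-in for ADS_STRUCT; only the cached address is modelled."""
    realm: str
    server_addr: Optional[str] = None  # plays the role of ads->ldap.ss


def connect(ads: AdsHandle, discover: Callable[[str], str]) -> str:
    """Return the server address, running discovery only if none is cached."""
    if ads.server_addr is None:
        # Expensive path: stands in for the DNS + CLDAP lookups.
        ads.server_addr = discover(ads.realm)
    # Cheap path: a re-used handle keeps the address it already found.
    return ads.server_addr
```

Calling connect() twice on the same handle runs the
discovery callable only once.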

With all these fixes the 'net ads lookup'
and 'net ads info' commands do *one* set
of DNS lookups, and *one* set of CLDAP
pings - in parallel, and this should
speed up all 'net ads' operations by
an order of magnitude !

Jeremy.


