Guidelines about fixing bugs
uri at samba.org
Thu Feb 2 20:15:06 UTC 2017
On 02/01/2017 12:22 PM, Andreas Schneider wrote:
> Uri has been working in this area lately. I've discovered this issue during a
> 'net ads join' too. We get the list of DCs (e.g. 200) and then we resolve each
> DC name to an ip address. Depending on the network this can take several
> minutes. We only need one IP to resolve, we normally only talk to one in the
> list. Fixing this means to rewrite the whole logic and make sure everything is
> Maybe Uri can give some more pointers, he implemented site support.
Well, hardly working, more like looking, and then caught by other tasks :)
In my offline discussions with Andreas we observed that the main problem
is that it resolves all the names in the SRV records - this is both
wasteful and time consuming. Even if you do it in parallel, it will
remain wasteful. There are orgs with more than 100 DCs in them, and the
SRV query returns all of them (the non-site-aware query). Normally we
don't need all those addresses, because we try the site-local ones first.
Copy+paste of my analysis from then:
- If the SRV query returns the IP addresses as additional records, the
code leverages that and avoids the extra A queries
- Still, looking at packet captures of large enterprises, we don't get
the additional records so this clearly has to be dealt with.
- My general idea is to turn the API from one that returns an IP list
into one that returns an iterator-style, opaque object that you can
extract addresses from until you run out of addresses. In the
implementation, the A queries will thus be done lazily and only if
needed (keep leveraging the SRV query additional records). As we make A
queries we filter answers that have been provided by previous queries.
- I was thinking of re-writing internal_resolve_name() in terms of the
new API, then work my way upwards until nobody uses internal_resolve_name()
- Should be relatively lp_ agnostic and db_ agnostic - only db is the
name cache (with server affinity and negative cache done at a higher
level - this is strictly name resolution).
- Should support supplying IPv4 addresses before IPv6 (compatibility
with today's code)
- Should support adding entries manually to the result set (to support
- Won't be 100% compatible - won't be able to support the current method
of spraying CLDAPs simultaneously to all DCs to see who responds first.
We can work in groups of N each time.
- Should support returning next N addresses instead of next 1. That
would allow us to later enhance the implementation to leverage the async
- Sync vs async - an obvious question is whether to have an async API
below the sync API. The current use case is sync. Beside general added
complexity, the missing piece is async SRV record query - I suppose we
can copy the async getaddrinfo (getaddrinfo_send()) and do the same. The
getaddrinfo_send() is not used anywhere, so we should expect bugs.
- Haven't looked at how to test that - would resolv_wrapper do the job?
(i.e. does it return only SRV records without the associated A records)
I hope that helps,
More information about the samba-technical