Guidelines about fixing bugs

Thu Feb 2 23:46:19 UTC 2017

On 02/02/2017 12:15 PM, Uri Simchoni wrote:
> On 02/01/2017 12:22 PM, Andreas Schneider wrote:
>> Uri has been working in this area lately. I've discovered this issue during a 
>> 'net ads join' too. We get the list of DCs (e.g. 200) and then we resolve each 
>> DC name to an ip address. Depending on the network this can take several 
>> minutes. We only need one IP to resolve, we normally only talk to one in the 
>> list. Fixing this means to rewrite the whole logic and make sure everything is 
>> site-aware.
>>
>> Maybe Uri can give some more pointers, he implemented site support.
> Well, hardly working, more like looking, and then caught by other tasks :)
>
> In my offline discussions with Andreas we observed that the main problem
> is that it resolves all the names in the SRV records - this is both
> wasteful and time consuming. Even if you do it in parallel, it will
> remain wasteful. There are orgs with more than 100 DCs in them, and the
> SRV query returns all of them (the non-site-aware query). Normally we
> don't need all those addresses, because we try the site-local ones first.
>
> Copy+paste of my analysis from then:
>
> - If the SRV query returns the IP addresses as additional records, the
> code leverages that and avoids the extra A queries

> - Still, looking at packet captures of large enterprises, we don't get
> the additional records so this clearly has to be dealt with.
I think it depends on the DNS servers, but it could also depends on the
number of names in the SRV records
> - My general idea is to turn the API from one that returns an IP list
> into one that returns an iterator-style, opaque object that you can
> extract addresses from until you run out of addresses. In the
> implementation, the A queries will thus be done lazily and only if
> needed (keep leveraging the SRV query additional records). As we make A
> queries we filter answers that have been provided by previous queries.
We could change struct ip_service to have the name in addition to
sockaddr_storage, so that if the sockaddr_storage is not populated when
we need it we could still populate it.
> - I was thinking of re-writing internal_resolve_name() in terms of the
> new API, then work my way upwards until nobody uses internal_resolve_name()
> - Should be relatively lp_ agnostic and db_ agnostic - only db is the
> name cache (with server affinity and negative cache done at a higher
> level - this is strictly name resolution).
> - Should support supplying IPv4 addresses before IPv6 (compatibility
> with today's code)
> - Should support adding entries manually to the result set (to support
> server affinity)
> - Won't be 100% compatible - won't be able to support the current method
> of spraying CLDAPs simultaneously to all DCs to see who responds first.
> We can work in groups of N each time.
That's the major challenge, I haven't looked at all the code but just
looking at get_kdc_ip_string we first lookup all the IPs, then create
all the tsocket and then start to spray CLDAPs.
All the code that is dealing with results from internal_resolve_name
needs to be changed to deal with the lazy loading of records.
> - Should support returning next N addresses instead of next 1. That
> would allow us to later enhance the implementation to leverage the async
> geteaddrinfo_send() API.
I'm not sure I get this point.
> - Sync vs async - an obvious question is whether to have an async API
> below the sync API. The current use case is sync. Beside general added
> complexity, the missing piece is async SRV record query - I suppose we
> can copy the async getaddrinfo (getaddrinfo_send()) and do the same. The
> getaddrinfo_send() is not used anywhere, so we should expect bugs.

> - Haven't looked at how to test that - would resolv_wrapper do the job?
> (i.e. does it return only SRV records without the associated A records)
>
> I hope that helps,
> Uri.
>
>
-- 

Matthieu Patou
Samba Team
http://samba.org