[PATCH] libads: fixes to generation of custom krb5.conf

Mon Jun 8 11:31:06 MDT 2015

On Mon, Jun 08, 2015 at 08:59:49AM +0300, Uri Simchoni wrote:
> 
> The internal_resolve_name() call may return a cached result. However,
> the cache key is <domain,name type> tuple, not <site,domain,name
> type>, so calling internal_resolve_name with/without a site may yield
> the same result. "Luckily", in the case of Kerberos, name resolving
> results are NOT cached, so calling internal_resolve_name with a
> kerberos name type always yields a correct result.
> 
> Incidentally - this caching scheme may seem like a bug. It certainly
> allows bugs to creep in (as any caching scheme would - once you
> duplicate state you call for bugs), but notice that in the case of
> LDAP, a bug is avoided by clearing the cache when moving from a
> site-specific search to site-less search (well, maybe not entirely
> avoided - there could be race conditions).
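
To make the cache-key point above concrete, here is a minimal sketch in Python. It is a toy model only, not Samba's code: NameCache, fake_lookup and the include_site_in_key switch are all hypothetical names invented for illustration. It shows how a key of <domain,name type> lets a site-specific result leak into a later site-less lookup, and how either clearing the cache or adding the site to the key avoids that.

```python
# Toy model of a name-resolution cache; hypothetical, not Samba's implementation.
class NameCache:
    def __init__(self, include_site_in_key=False):
        self.include_site_in_key = include_site_in_key
        self._entries = {}

    def _key(self, site, domain, name_type):
        # The thread describes a <domain,name type> key; the alternative
        # <site,domain,name type> key is shown for comparison.
        if self.include_site_in_key:
            return (site, domain, name_type)
        return (domain, name_type)

    def resolve(self, site, domain, name_type, lookup):
        key = self._key(site, domain, name_type)
        if key not in self._entries:
            self._entries[key] = lookup(site, domain, name_type)
        return self._entries[key]

    def clear(self):
        self._entries.clear()


def fake_lookup(site, domain, name_type):
    # Stand-in for a DNS SRV query (assumption): a site-specific query
    # returns only the on-site server, a site-less query returns all servers.
    if site is not None:
        return ["dc1." + domain]
    return ["dc1." + domain, "dc2." + domain]


cache = NameCache()
with_site = cache.resolve("branch", "example.com", "ldap", fake_lookup)
without_site = cache.resolve(None, "example.com", "ldap", fake_lookup)
# Both calls share one cache entry, so the site-less call sees the site result.
print(with_site == without_site)
```

Calling cache.clear() between the two lookups, as described for the LDAP case, makes the second call perform a fresh site-less query.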
> 
> > If it can't really be fixed, I would still prefer that we first get the KDC
> > from the local site and then the other KDCs. I know products that are Kerberos
> > heavy, and I would like to avoid those products querying a KDC on the other
> > side of the globe if one is available nearby.
> >
> Patch #5 ensures that the server from the server affinity cache is always
> first. Usually this is a server from the local site. The rest of the
> servers are the ones which answered first to the CLDAP ping (up to 3
> of them, and with a clear bias towards on-site servers). Hopefully
> their ability to answer first also means they would give best service.
> 
> I have to admit I was focused on branch setups, in which a single
> on-site DC is being backed by off-site DCs. In that case, it's
> important to get the on-site DC first, and also to get some off-site
> DCs as backup, but the order between them is not important. In a HQ
> setup, we might have a bunch of DCs on site so we never want to go
> off-site (i.e. the chance that all on-site servers would fail is so
> small that we don't want any chance of shooting ourselves in the foot
> by going off-site). I'm not entirely sure how to balance those two
> requirements - I hope the current measures are enough:
> 
> 1. The "last known good server" from affinity cache is always first.
> This is intentionally an on-site server (or closest-site server,
> sometimes there's no server on-site).
> 2. The CLDAP pings for on-site servers are sent before pings to
> off-site servers (if more than 3 CLDAP pings in total - there's even a
> 100ms delay after first three).
> 3. On-site servers are supposed to answer more quickly.
> 4. When analyzing the responses, the KDCs which answered are listed in
> krb5.conf according to the DNS query order, which prefers on-site
> servers over off-site.
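
The ordering described in the four points above can be sketched roughly as follows. This is an illustrative Python model under stated assumptions, not the libads code: order_kdcs and its parameters are hypothetical names, and the CLDAP ping itself is reduced to a precomputed set of responders. The assumptions are that the affinity server goes first, that at most three further servers are added, and that those are listed in DNS query order, which already places on-site servers ahead of off-site ones.

```python
def order_kdcs(affinity_server, dns_order, responders, max_extra=3):
    """Return the KDC list in the order it would be written out.

    affinity_server: "last known good" server, or None (hypothetical input)
    dns_order:       servers in DNS query order, on-site entries first
    responders:      set of servers that answered the CLDAP ping
    max_extra:       cap on servers listed after the affinity server
    """
    # Keep only servers that answered, preserving DNS query order and
    # skipping the affinity server so it is not listed twice.
    extras = [s for s in dns_order if s in responders and s != affinity_server]
    head = [affinity_server] if affinity_server is not None else []
    return head + extras[:max_extra]


dns_order = ["dc-local", "dc-site2", "dc-far1", "dc-far2", "dc-far3"]
responders = {"dc-far3", "dc-far2", "dc-far1", "dc-site2", "dc-local"}
print(order_kdcs("dc-local", dns_order, responders))
```

In a branch setup with a single on-site DC, this yields that DC first followed by up to three off-site backups. In an HQ setup with several on-site DCs, the on-site entries fill the extra slots first because they lead the DNS order.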
> 
> To summarize:
> - Old behavior - look only at on-site servers, do not prefer
> session-affinity server over others
> - New behavior - allow off-site servers (with bias towards on-site
> servers), session-affinity server always first.
> 
> What do you think? Does it look like a reasonable balance? Do note
> that the intent to add off-site servers to the list has been there all
> along, it just didn't work properly.

+2 from me. This is the way it was always supposed to work
as I recall.

> > Also, why is the first call to internal_resolve_name (line 3119) not using
> > auto_name_type?
> >
> 
> My mistake - it's dead code (look a few lines above - there's
> absolutely no way for it to execute). I removed it in another patch, but
> did some re-ordering of the patches for submission to the list, so it
> crept in. The fallback to SRV query is made in line 3164, so this can
> be safely deleted.
> 
> Attached is a corrected patch set with an added patch that removes this
> dead code. Nothing else is changed. I didn't add Jeremy's "reviewed
> by" because of this added patch.

Reviewed-by: Jeremy Allison <jra at samba.org>

Matthieu, are you happy with the explanation?

Cheers,

	Jeremy.