Concerning issues others are having with Samba 3.5.10+ and group membership behavior when joined to AD as member servers
jra at samba.org
Mon Apr 16 18:10:30 MDT 2012
On Mon, Apr 16, 2012 at 11:47:35PM +0000, Goldberg, Neil R. wrote:
> Recently we've noticed (and Bart Janssens has written to the list most recently about this, " Re: [Samba] Samba 3.6.4 on Solaris - groups for user Inconsistent") that there are inconsistencies in group membership for users based on context.
> This became evident after the patch accepted in late 2011 that removed the Update Sequence Number as the main discriminator for expiring Winbind cache entries. Now the Winbind cache expires after 5 minutes (the default), and these problems are immediately obvious.
> What we were able to determine was that if a user logs into a system by way of PAM using the winbind module, or through a network logon via smbd, then the group list returned after the Winbind cache expires would be missing Domain Local groups (Aliases).
> But if one were to query the membership of a user who has not logged into the system at all, then the group list returned would always contain all the groups, both Global/Universal and Domain Local.
> But as soon as that user were to log into the system, after the Winbind cache expires the groups become truncated for that user.
> We found that this happens because pam_winbind and the auth subsystem of smbd populate the netsamlogon_cache.tdb with information from netr_saminfo3.
> But this information is only queried by Winbind in answer to either query_user or lookup_groups.
> In particular, the winbind MSRPC and ADS backends specifically check the netsamlogon_cache when the winbind cache fails. And as of this time, netsamlogon_cache entries NEVER EXPIRE. This means that group membership on the domain can change, and a user can log out of a system, and subsequent queries for group membership will still return stale, incomplete information.
> And in particular, the interpretation of the "extra SIDs" from the structure is different when parsed from query_user as opposed to lookup_groups, which is why the group list appears truncated.
> For some reason, some time ago a change was made that added an extra Boolean parameter to "sid_array_from_info3" to determine if groups of type SE_GROUP_RESOURCE (i.e. Domain Local groups) should be added to the SID array returned. And this parameter is only used in lookup_groups, which is ultimately how winbindd responds to NSS winbind requests for initgroups_dyn (i.e. getgroups).
> There was a claim about SID filtering, which doesn't make sense because that's something implemented on Domain Controllers to deal with domains they don't fully trust but have trust relationships with, and it isn't even recommended because it breaks transitive trust.
> In any case, domain controllers running at 2003 functional level always return Domain Local groups as Extra SIDs instead of RIDs in the NETINFO3 response to RPC calls on the SAMR pipe. So filtering by that type just removes those groups entirely, but only in that context.
> Theoretically (according to a comment) it adds them back in by looping back to MSRPC and querying for aliases on the domain, but there is a bug in this logic, as it only checks the machine account and BUILTIN account domains and omits the joined domain entirely. We believe that winbind caching had been masking this problem until recently.
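As a rough illustration of the filtering described above, here is a minimal Python sketch, not Samba's actual code. The `ExtraSid` class, the `collect_sids` function, and the SIDs are hypothetical stand-ins; only the `SE_GROUP_RESOURCE` attribute bit is taken from the real definitions. It shows how a single Boolean can make the same reply yield two different group lists depending on the caller.

```python
from dataclasses import dataclass

SE_GROUP_RESOURCE = 0x20000000  # attribute bit marking a resource (domain local) group


@dataclass(frozen=True)
class ExtraSid:
    """Hypothetical, simplified stand-in for one extra-SID entry in an info3 reply."""
    sid: str
    attributes: int


def collect_sids(extra_sids, include_resource_groups):
    """Return SID strings, optionally skipping entries flagged SE_GROUP_RESOURCE."""
    return [
        e.sid
        for e in extra_sids
        if include_resource_groups or not (e.attributes & SE_GROUP_RESOURCE)
    ]


# Made-up SIDs for illustration only.
reply = [
    ExtraSid("S-1-5-21-1-2-3-1104", 0x7),                      # global/universal group
    ExtraSid("S-1-5-21-1-2-3-1201", 0x7 | SE_GROUP_RESOURCE),  # domain local group
]

filtered = collect_sids(reply, include_resource_groups=False)   # domain local group dropped
unfiltered = collect_sids(reply, include_resource_groups=True)  # both groups kept
```

If nothing adds the dropped SIDs back afterwards, the filtered caller sees a truncated list, which matches the symptom reported above.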
> So we made a patch that does several things.
> 1) Re-add the timeout for the netsamlogon_cache, deleting entries that are stale upon fetch (currently piggy-backing on the winbind timeout parameter). Apparently there was once an issue retrieving group lists as complete as the SAMR RPC calls could provide motivating the non-expiration, but in testing we believe those issues have since been addressed.
> 2) Remove the resource group (domain local group) filtering from the sid_array_from_info3 function, as it was ill-advised and of questionable use.
> 3) Remove the reliance on the ADS path of winbind on netsamlogon_cache and have it always query LDAP if the winbind cache is expired.
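The expire-on-fetch idea in item 1 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patch itself: `ExpiringCache` is a hypothetical in-memory stand-in for netsamlogon_cache.tdb, and the 300-second lifetime simply mirrors the 5-minute winbind default mentioned earlier.

```python
import time


class ExpiringCache:
    """Hypothetical in-memory stand-in for a cache such as netsamlogon_cache.tdb."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be exercised without sleeping
        self._entries = {}      # key -> (stored_at, value)

    def store(self, key, value):
        """Record the value together with the time it was stored."""
        self._entries[key] = (self.clock(), value)

    def fetch(self, key):
        """Return the cached value, or None if it is missing or older than the TTL."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._entries[key]  # stale entry: delete it on fetch
            return None
        return value


# Usage with a fake clock: the entry is served while fresh, then dropped once stale.
now = [0.0]
cache = ExpiringCache(ttl_seconds=300, clock=lambda: now[0])
cache.store("user-sid", ["group-a", "group-b"])
fresh = cache.fetch("user-sid")
now[0] = 301.0
stale = cache.fetch("user-sid")
```

A caller that gets None back falls through to a live query, so membership changes made on the domain become visible after at most one lifetime.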
> It fixed our problem. Now our group lists are always the same, whether they hit the Winbind cache or not.
> We did not address the potential bug (not correctly querying the domain for aliases) in the lookup_aliases portion of the MSRPC path in Winbind, because we do not understand it well enough.
> We have a patch for samba-3.5.10 (which is of interest only to users of RHEL 5.x) which we can provide, or we can apply the changes to any current clean release.
> We welcome any feedback about our approach, assumptions, and conclusions.
Wow ! Very comprehensive analysis of the problem !
We would love to see your patches with explanations,
to analyze for a future Samba release.
Thanks very much for spending the time to do this !