patch set for kcc topology comparison

tridge
Thu Oct 13 00:07:47 MDT 2011

Hi Dave,

 > 1) End user can analyse the ldif records to ensure it doesn't
 >     contain anything deemed secret or confidential to the user.
 >     Given the records we are looking for that probably isn't a
 >     concern but since the records are readable by the user that
 >     gives the end-user extra confidence.


 > 2) Developer can utilize records to rerun algorithms to debug, and
 >    furthermore include records that produced algorithm anomalies in
 >    a testcase that we can have for future reference.


 > We may want to consider a slightly more expansive view of this
 > problem/solution as we probably have similar issues with algorithms
 > beyond the KCC.  In general I think it might be good to have one
 > (or multiple distinct) scripts that extract a simplified set of
 > ldif records from the customer database.  In the KCC algorithm
 > those records are noted below.

ok, good point

 > Furthermore instead of going directly from ldif to ldb_result (and
 > having to modify the KCC to take ldb_result messages in its core
 > algorithm) we could just reproduce a simplified sam ldb database
 > from the end-user's ldif files.  In that manner the KCC code would
 > not really have to change that much and could continue to perform
 > ldb_search() and the like within it.  We'd probably need a simple
 > rootDSE record as well since the ldb_context wants to save the
 > schema/config/default dn in its opaque attributes when this
 > simplified samdb would be opened.
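To illustrate the idea, here is a minimal pure-Python sketch of reading
dumped ldif records back into memory. This is only an illustration (no
base64 values, no line folding); the real approach would feed the
records into a fresh tdb-backed ldb with ldb_add().

```python
# Hypothetical sketch: parse a simple ldif dump into a list of dicts.
# Real code would add each record to a fresh ldb via the ldb API.

def parse_ldif(text):
    """Parse a very simple ldif dump (no base64, no line folding)
    into a list of dicts mapping attribute name -> list of values."""
    records, rec = [], {}
    for line in text.splitlines():
        if not line.strip():
            # blank line ends the current record
            if rec:
                records.append(rec)
                rec = {}
            continue
        name, _, value = line.partition(': ')
        rec.setdefault(name, []).append(value)
    if rec:
        records.append(rec)
    return records
```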


 > I ran a quick test on tridge's records to see if I could produce a
 > samdb that was significantly abbreviated but would work.  I
 > attached the example python hack to this note.

I'm glad to see you doing python code! another convert to the
wonderful world of python? ;-)

 > So here's what I think are inputs to the KCC algorithm:
 > Each naming context is taken in turn and connections / reps are
 > calculated individually.  The algorithms may prune connections if
 > there are too many, but at least one run per algorithm (intra /
 > inter) is devoted to each individual NC.  Thus:


 > 1) need all the naming contexts (including application NCs)
 > So to determine if you have NCs for other domains or application
 > NCs, you need to understand if the NC has an objectSid (application
 > NCs don't have one).  Note that application NCs have to be
 > identified in the algorithms because they have modified topology
 > calculation rules.

I think the easiest way to work out if the NC has an objectSID is to
fetch the link DNs from the NTDSDSA objects using the extended_dn
control. That will give you the DN strings for all the NCs, plus it
will give you their GUIDs and SIDs (that is why kcc_dump uses the
extended_dn control).

so just use ldb_dn_get_extended_component(dn, "SID") and see if that
returns NULL or not.
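In C that is just ldb_dn_get_extended_component(); the same test can be
sketched in pure Python. The DN strings and the parsing below are
simplified assumptions for illustration only; real code should use the
ldb bindings rather than string-splitting.

```python
# Illustrative only: real code would call
# ldb_dn_get_extended_component(dn, "SID") via the ldb API.
# Extended DNs returned with the extended_dn control look like:
#   <GUID=...>;<SID=...>;DC=ad1,DC=example,DC=net
# and application NCs lack the <SID=...> component.

def get_extended_component(extended_dn, name):
    """Return the value of an extended DN component, or None."""
    for part in extended_dn.split(';'):
        if part.startswith('<%s=' % name) and part.endswith('>'):
            return part[len(name) + 2:-1]
    return None

def is_domain_nc(extended_dn):
    """Domain NCs carry a SID; application NCs do not."""
    return get_extended_component(extended_dn, "SID") is not None

# hypothetical example DNs (values abbreviated)
domain_nc = "<GUID=e99f225d>;<SID=S-1-5-21-111-222-333>;DC=ad1,DC=example,DC=net"
app_nc = "<GUID=1b1d5c09>;DC=DomainDnsZones,DC=ad1,DC=example,DC=net"
```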

 > 2) Then for each NC we need to do something like
 > ldbsearch -H ldap://n1lin1 -U Administrator%p@ssw0rd -b
 > "DC=ad1,DC=wimberosa,DC=net" -s base
 > for each NC

I think we need to query the individual partitions for the existing
repsFrom attributes, but I don't think we need to look there for any
other attributes. The reason I think it may be better to get the
information from the NTDSDSA object is that we know this will always
be available, whereas the NC partitions themselves may not be
instantiated yet (or the target DC may not be a global catalog, so may
never get this partition).

 > In addition to the objectSid we'll also need the "repsFrom" and
 > "repsTo" attributes and blobs, so we may as well get the whole
 > object under the dn if easy.

I'd recommend just grabbing the repsFrom and (if needed) repsTo. The
domain partitions are the ones that often contain potentially
company-sensitive information; even the company's password policy
(stored in those objects) could be considered sensitive. So I think we
should get the minimum from those NCs.

 > 3) Next we will need all the nTDSDSA objects as we will need
 >    "objectGUID", "options", "invocationID", "msDS-hasMasterNCs",
 >    "hasMasterNCs", "msDS-isRODC", "msDS-hasFullReplicaNCs",
 >    "hasPartialReplicaNCs", "msDS-HasInstantiatedNCs".  So getting
 >    the whole object for each nTDSDSA is important.

yep, probably getting the whole NTDSDSA is OK
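As a checklist, a small hypothetical helper can verify that a dumped
nTDSDSA record carries everything the algorithm needs (the attribute
names are taken from the list above; the record layout is an
assumption, a plain dict as an ldif dump might produce):

```python
# Attribute set the KCC needs from each nTDSDSA object.
NTDSDSA_ATTRS = [
    "objectGUID", "options", "invocationID",
    "msDS-hasMasterNCs", "hasMasterNCs", "msDS-isRODC",
    "msDS-hasFullReplicaNCs", "hasPartialReplicaNCs",
    "msDS-HasInstantiatedNCs",
]

def missing_ntdsdsa_attrs(record):
    """Return the required attributes absent from a dumped nTDSDSA
    record (a dict of attribute name -> values).  Names are compared
    case-insensitively, as in LDAP."""
    present = {name.lower() for name in record}
    return [a for a in NTDSDSA_ATTRS if a.lower() not in present]
```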

 > 4) Under each NC dn we will also need the crossRef object under it.
 >    We have to compare the "nCName" attributes of the crossRef
 >    object to see if it is a cross reference for each NC we are
 >    examining.  If it is then we consult the
 >    "msDS-NC-Replica-Locations" and "msDS-NC-RO-Replica-Locations"
 >    attributes for the crossRef to see if it enumerates the nTDSDSA
 >    we are currently computing a topology for.  If it matches then
 >    we should have a replica for the NC on the DC we are computing
 >    topology for.  Bottom line: we need the crossRef objects under
 >    each partition dn.

yep, getting the crossRef objects from the configuration partition is
fine I think
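The matching rule in (4) is easy to express as a standalone function.
This is a hypothetical sketch over plain dicts, not the actual KCC
code:

```python
# Sketch of the crossRef check: does any crossRef for this NC list
# the given nTDSDSA DN as a (writable or read-only) replica location?

def dc_should_have_replica(cross_refs, nc_dn, ntdsdsa_dn):
    """True if a crossRef whose nCName is nc_dn enumerates ntdsdsa_dn
    in its replica-location attributes."""
    for cr in cross_refs:
        # only consider the crossRef for the NC we are examining
        if cr.get("nCName") != [nc_dn]:
            continue
        locations = (cr.get("msDS-NC-Replica-Locations", []) +
                     cr.get("msDS-NC-RO-Replica-Locations", []))
        if ntdsdsa_dn in locations:
            return True
    return False
```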

 > 5) Then under our nTDSDSA object (identified by dsServiceName from
 >    (1)) we need the nTDSConnection objects.  We'll need the
 >    objectGUID, fromServer and options minimally, but should get the
 >    whole set of connection objects if easy.

ok, those should be easy to add to kcc_dump

I also talked a bit more about this problem with Andrew Bartlett
today. As we discussed earlier, there seem to be 4 different
approaches we have thought of:

1) the "input is a ldb_result" option that I previously suggested. I
   agree with you that this doesn't generalise very well

2) the "schema-less ldb" approach. This is the one you are currently
   looking at. This should work, but we may hit issues with some
   searches behaving strangely. For example, without a schema the ldb
   won't know what attributes are case sensitive. We can work around
   that by adding some @ATTRIBUTES entries in the fake SAM database,
   so I don't think that is a big problem, but we may run across other
   issues along these lines. We may end up deciding to enable a subset
   of the samdb modules we normally run for a full SAM. The
   extended_dn_in and extended_dn_out modules could be handy, for
   example, to make those controls work correctly. 
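For reference, such an @ATTRIBUTES record is just another ldif entry in
the fake SAM; a minimal example (the attribute names here are chosen
purely for illustration) might look like:

```
dn: @ATTRIBUTES
cn: CASE_INSENSITIVE
ou: CASE_INSENSITIVE
dc: CASE_INSENSITIVE
```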

3) the "create a full SAM" approach. This would try and create a full
   SAM database, with schema and all modules, using the records
   gathered from the ldif dump. I think this may be quite tricky to
   do. It is probably possible, but probably better to try the
   schema-less approach first

4) do the kcc as a python script (samba_kcc), much like we do
   samba_dnsupdate and samba_spnupdate at the moment. Then incorporate
   the schema-less ldb testing approach above with this script, using
   python hooks to hide any differences that may arise from it not
   being a full SAM.

btw, I know you are more comfortable with C, but if you do feel like a
python adventure then trying the samba_kcc python approach I think
would give us the most maintainable result, as I think python is
perfect for the sorts of topology algorithms we need, and it makes it
really easy to develop and test (just run samba_kcc on the command
line). It is up to you though, and if you really don't feel
comfortable with doing it in python then option 2 above is probably
the best choice.

Cheers, Tridge
