[PATCH] Finally run bind9_dlz spnego test, fix drs delete behaviour

Andrew Bartlett abartlet at samba.org
Mon Jun 10 03:40:53 MDT 2013


On Mon, 2013-06-10 at 10:55 +0200, Stefan (metze) Metzmacher wrote:
> Am 10.06.2013 10:33, schrieb Andrew Bartlett:
> > On Mon, 2013-06-10 at 16:42 +1000, Andrew Bartlett wrote:
> >> On Sun, 2013-06-09 at 20:28 +1000, Andrew Bartlett wrote:
> >>> On Sat, 2013-06-08 at 22:20 +1000, Andrew Bartlett wrote:
> >>>> On Tue, 2013-06-04 at 22:03 +1000, Andrew Bartlett wrote:
> >>>>> On Tue, 2013-06-04 at 16:39 +1000, Andrew Bartlett wrote:
> >>>>>> On Mon, 2013-06-03 at 22:27 +1000, Andrew Bartlett wrote:
> >>>>>>> On Sun, 2013-06-02 at 23:05 +1000, Andrew Bartlett wrote:
> >>>>>>>> I've been frustrated for over 6 months trying to work out why adding
> >>>>>>>> some 'simple' tests, to confirm that some of the crypto in the
> >>>>>>>> bind9_dlz code works, suddenly broke make test, particularly dbcheck.
> >>>>>>>>
> >>>>>>>> The attached patches just passed a private autobuild.  They add the
> >>>>>>>> 'problem' tests, but first we fix the behaviour of DRS-initiated object
> >>>>>>>> deletes.
> >>>>>>>>
> >>>>>>>> Please review/push/comment (this patch series includes the usnChanged
> >>>>>>>> series I posted a few days ago). 
> >>>>>>>>
> >>>>>>>> From here, I would like to continue to improve the tests - the tests in
> >>>>>>>> source4/torture/drs/python/delete_object.py could be trivially extended
> >>>>>>>> to add a 'description' and 'memberOf' element that we should ensure gets
> >>>>>>>> deleted on both hosts, for example.  We could also watch usnChanged
> >>>>>>>> values to ensure we delete the right stuff, but for now I'm simply
> >>>>>>>> stunned that this could ever have worked with this incorrect!
> >>>>>>>
> >>>>>>> Just as a heads-up I'm continuing to work on these patches.  The point
> >>>>>>> tests I added (rather than just waiting for the dbcheck) show the issue
> >>>>>>> isn't totally resolved, but is better.  (I somehow found a
> >>>>>>> member/memberOf link left over...).
> >>>>>>>
> >>>>>>> Review of this much would be helpful, but expect additional changes as
> >>>>>>> we finally start to get this right.  
> >>>>>>
> >>>>>> I've not finished the patch yet, but what seems clear is that the issue
> >>>>>> comes from processing (rather than dropping/ignoring, as we should)
> >>>>>> linked attributes and to deleted objects. 
> >>>>>
> >>>>> I'm almost shocked to finally have this finished, given how long this
> >>>>> problem has dogged me.  The patches are in my fix-drs-testing-14 branch,
> >>>>> and attached.
> >>>>>
> >>>>> Not only does this open up the chance to do more DRS testing, and more
> >>>>> unrelated fixes to DRS replication (now that adding tests does not
> >>>>> suddenly cause 'unrelated' breakages), it also allows us to resume
> >>>>> adding tests of the bind9 DLZ module, which stalled out when adding
> >>>>> bind9 tests broke stuff.
> >>>>>
> >>>>> The patches handle both normal and linked attributes, following all the
> >>>>> special rules for deleted objects. 
> >>>>
> >>>> I've worked with metze on this, and I'm up to patch set #19...
> >>>>
> >>>> fix-drs-testing-19 is the patch set I'm happy with.  It passed an
> >>>> autobuild for me except for the unexpected success of a dbcheck test.
> >>>> I've now squashed in the knownfail removal, and am doing two autobuilds
> >>>> to see how it goes. 
> >>>
> >>> These failed, but I think it's again 'in a good way'.  That is, the
> >>> failure is because we now replicate more things properly, and so we have
> >>> a repsFrom that points to a deleted DC.  The server code gives an error
> >>> in this case, as it can't find a non-deleted DN for the GUID.  We should
> >>> either return success with a NULL DN, or show the deleted DN.  I'll try
> >>> and induce the same behaviour and check with Windows.
> >>
> >> G'Day Metze,
> >>
> >> I've fixed that issue, and I've got one successful private autobuild.
> >> I'm running a second one now.
> >>
> >> The branch that I think finally is ready for master is
> >> fix-drs-testing-19.  It is naturally based on the branch we were working
> >> together on and which you were running your own private builds on.
> > 
> > I think we have some serious issues with our schema handling, which we
> > expose more of as we fix the other issues.  
> > 
> > The latest build on that branch failed with the error below, but it also
> > worries me that we get this earlier:
> > 
> > Failed to convert objects:
> > WERR_DS_DRA_SCHEMA_MISMATCH/NT_STATUS_INVALID_NETWORK_RESPONSE
> > Failed to apply records: linked_attributes_add: attribute instanceType
> > is not a valid attribute in schema: Object class violation
> > 
> > The idea that instanceType can ever not be in the schema is implausible,
> > and suggests our schema handling is quite broken.
> > 
> > [1576/1582 in 1h41m6s] samba4.blackbox.samba_tool_demote(promoted_dc)
> > Using localdc as partner server for the demotion
> > Desactivating inbound replication
> > Asking partner server localdc to synchronize from us
> > Error while demoting, re-enabling inbound replication
> > ERROR(<class 'samba.drs_utils.drsException'>): Error while sending a
> > DsReplicaSync for partion
> > CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com - drsException:
> > DsReplicaSync failed (8442, 'WERR_DS_DRA_INTERNAL_ERROR') 
> >   File "bin/python/samba/netcmd/domain.py", line 647, in run
> >     sendDsReplicaSync(drsuapiBind, drsuapi_handle, ntds_guid, str(part),
> > drsuapi.DRSUAPI_DRS_WRIT_REP)
> >   File "bin/python/samba/drs_utils.py", line 83, in sendDsReplicaSync
> >     raise drsException("DsReplicaSync failed %s" % estr)
> > UNEXPECTED(failure):
> > samba4.blackbox.samba_tool_demote(promoted_dc).demote(promoted_dc)
> > REASON: _StringException: _StringException: No reason specified
> > 
> > FAILED (1 failures, 0 errors and 0 unexpected successes in 0 testsuites)
> > 
> > A summary with detailed information can be found in:
> >   ./bin/ab/summary
> > 
> > ==> samba.stderr <==
> > ../source4/rpc_server/drsuapi/getncchanges.c:259: Failed to find
> > attribute in schema for attrid 5963789 mentioned in replPropertyMetaData
> > of
> > CN=DrsReplSchema-1370848125-3-cls-B,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
> 
> That's the flaky test that started with commit
> e24fe5705e3c4d33705ebb584ea2009bb4a1a82c.
> 
> I read through the patches again and found 2 problems, which should be
> fixed with the following commits:
> 
> dsdb: use the correct talloc parent in dsdb_repl_merge_working_schema()
> https://gitweb.samba.org/?p=metze/samba/wip.git;a=commitdiff;h=e6d427afa63532169c0068e06e5d99686f43b7e3
> and
> dsdb: reset schema->{classes,attributes}_to_remove_size to 0
> https://gitweb.samba.org/?p=metze/samba/wip.git;a=commitdiff;h=15da0df73fb99b2d52094e8165e3c941e7b23034
> 
> I think both could explain a corrupted schema.

I certainly think they could.  You can consider both
Reviewed-by: Andrew Bartlett <abartlet at samba.org> at the appropriate
time.  

Thank you very much for your patience and persistence here.  Each time
we fix one bug we find another, and I was dreading the task of digging
into our schema handling just to get these merged. 

Andrew Bartlett

-- 
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org
