[PATCH] Finally run bind9_dlz spnego test, fix drs delete behaviour

Stefan (metze) Metzmacher metze at samba.org
Mon Jun 10 02:55:10 MDT 2013


On 10.06.2013 10:33, Andrew Bartlett wrote:
> On Mon, 2013-06-10 at 16:42 +1000, Andrew Bartlett wrote:
>> On Sun, 2013-06-09 at 20:28 +1000, Andrew Bartlett wrote:
>>> On Sat, 2013-06-08 at 22:20 +1000, Andrew Bartlett wrote:
>>>> On Tue, 2013-06-04 at 22:03 +1000, Andrew Bartlett wrote:
>>>>> On Tue, 2013-06-04 at 16:39 +1000, Andrew Bartlett wrote:
>>>>>> On Mon, 2013-06-03 at 22:27 +1000, Andrew Bartlett wrote:
>>>>>>> On Sun, 2013-06-02 at 23:05 +1000, Andrew Bartlett wrote:
>>>>>>>> I've been frustrated for over 6 months trying to work out why adding
>>>>>>>> some 'simple' tests, to confirm that some of the crypto in the bind9_dlz
>>>>>>>> code works, suddenly broke make test, particularly dbcheck.
>>>>>>>>
>>>>>>>> The attached patches just passed a private autobuild.  They add the
>>>>>>>> 'problem' tests, but first we fix the behaviour of DRS-initiated object
>>>>>>>> deletes.
>>>>>>>>
>>>>>>>> Please review/push/comment (this patch series includes the usnChanged
>>>>>>>> series I posted a few days ago). 
>>>>>>>>
>>>>>>>> From here, I would like to continue to improve the tests - the tests in
>>>>>>>> source4/torture/drs/python/delete_object.py could be trivially extended
>>>>>>>> to add a 'description' and 'memberOf' element that we should ensure gets
>>>>>>>> deleted on both hosts, for example.  We could also watch usnChanged
>>>>>>>> values to ensure we delete the right stuff, but for now I'm simply
>>>>>>>> stunned that this could ever have worked with this incorrect!
>>>>>>>
>>>>>>> Just as a heads-up I'm continuing to work on these patches.  The point
>>>>>>> tests I added (rather than just waiting for the dbcheck) show the issue
>>>>>>> isn't totally resolved, but is better.  (I somehow found a
>>>>>>> member/memberOf link left over...).
>>>>>>>
>>>>>>> Review of this much would be helpful, but expect additional changes as
>>>>>>> we finally start to get this right.  
>>>>>>
>>>>>> I've not finished the patch yet, but what seems clear is that the issue
>>>>>> comes from processing (rather than dropping/ignoring, as we should)
>>>>>> linked attributes and to deleted objects. 
>>>>>
>>>>> I'm almost shocked to finally have this finished, given how long this
>>>>> problem has dogged me.  The patches are in my fix-drs-testing-14 branch,
>>>>> and attached.
>>>>>
>>>>> Not only does this open up the chance to do more DRS testing, and more
>>>>> unrelated fixes to DRS replication (now that adding tests does not
>>>>> suddenly cause 'unrelated' breakages), it also allows us to resume
>>>>> adding tests of the bind9 DLZ module, which stalled out when adding
>>>>> bind9 tests broke stuff.
>>>>>
>>>>> The patches handle both normal and linked attributes, following all the
>>>>> special rules for deleted objects. 
>>>>
>>>> I've worked with metze on this, and I'm up to patch set #19...
>>>>
>>>> fix-drs-testing-19 is the patch set I'm happy with.  It passed an
>>>> autobuild for me except for the unexpected success of a dbcheck test.
>>>> I've now squashed in the knownfail removal, and am doing two autobuilds
>>>> to see how it goes. 
>>>
>>> These failed, but I think it's again 'in a good way'.  That is, the
>>> failure is because we now replicate more things properly, and so we have
>>> a repsFrom that points to a deleted DC.  The server code gives an error
>>> in this case, as it can't find a non-deleted DN for the GUID.  We should
>>> either return success with a NULL DN, or show the deleted DN.  I'll try
>>> to induce the same behaviour and check with Windows.
>>
>> G'Day Metze,
>>
>> I've fixed that issue, and I've got one successful private autobuild.
>> I'm running a second one now.
>>
>> The branch that I think finally is ready for master is
>> fix-drs-testing-19.  It is naturally based on the branch we were working
>> together on and which you were running your own private builds on.
> 
> I think we have some serious issues with our schema handling, which we
> expose more of as we fix the other issues.  
> 
> The latest build on that branch failed with the error below, but it also
> worries me that we get this earlier:
> 
> Failed to convert objects:
> WERR_DS_DRA_SCHEMA_MISMATCH/NT_STATUS_INVALID_NETWORK_RESPONSE
> Failed to apply records: linked_attributes_add: attribute instanceType
> is not a valid attribute in schema: Object class violation
> 
> The idea that instanceType can ever not be in the schema is implausible,
> and suggests our schema handling is quite broken.
> 
> [1576/1582 in 1h41m6s] samba4.blackbox.samba_tool_demote(promoted_dc)
> Using localdc as partner server for the demotion
> Desactivating inbound replication
> Asking partner server localdc to synchronize from us
> Error while demoting, re-enabling inbound replication
> ERROR(<class 'samba.drs_utils.drsException'>): Error while sending a
> DsReplicaSync for partion
> CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com - drsException:
> DsReplicaSync failed (8442, 'WERR_DS_DRA_INTERNAL_ERROR') 
>   File "bin/python/samba/netcmd/domain.py", line 647, in run
>     sendDsReplicaSync(drsuapiBind, drsuapi_handle, ntds_guid, str(part),
> drsuapi.DRSUAPI_DRS_WRIT_REP)
>   File "bin/python/samba/drs_utils.py", line 83, in sendDsReplicaSync
>     raise drsException("DsReplicaSync failed %s" % estr)
> UNEXPECTED(failure):
> samba4.blackbox.samba_tool_demote(promoted_dc).demote(promoted_dc)
> REASON: _StringException: _StringException: No reason specified
> 
> FAILED (1 failures, 0 errors and 0 unexpected successes in 0 testsuites)
> 
> A summary with detailed information can be found in:
>   ./bin/ab/summary
> 
> ==> samba.stderr <==
> ../source4/rpc_server/drsuapi/getncchanges.c:259: Failed to find
> attribute in schema for attrid 5963789 mentioned in replPropertyMetaData
> of
> CN=DrsReplSchema-1370848125-3-cls-B,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com

That's the flaky test that started with commit
e24fe5705e3c4d33705ebb584ea2009bb4a1a82c.

I read through the patches again and found 2 problems, which should be
fixed with the following commits:

dsdb: use the correct talloc parent in dsdb_repl_merge_working_schema()
https://gitweb.samba.org/?p=metze/samba/wip.git;a=commitdiff;h=e6d427afa63532169c0068e06e5d99686f43b7e3
and
dsdb: reset schema->{classes,attributes}_to_remove_size to 0
https://gitweb.samba.org/?p=metze/samba/wip.git;a=commitdiff;h=15da0df73fb99b2d52094e8165e3c941e7b23034

I think both could explain a corrupted schema.
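
As a rough illustration of the second kind of bug (this is not the
actual Samba code; the struct and function names below are hypothetical
and plain malloc/free stands in for talloc): when an array is released
but its separate size counter is not reset to 0, later code either walks
stale entries or appends at a stale index into a fresh allocation.
Clearing the pointer and the counter together avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a schema that tracks entries to remove. */
struct pending_removals {
	int *items;   /* heap array of entries queued for removal */
	size_t size;  /* number of valid entries in items */
};

/* Queue one entry; returns 0 on success, -1 on allocation failure. */
static int pending_add(struct pending_removals *p, int item)
{
	int *grown = realloc(p->items, (p->size + 1) * sizeof(*grown));
	if (grown == NULL) {
		return -1;
	}
	grown[p->size] = item;
	p->items = grown;
	p->size++;
	return 0;
}

/*
 * Drop all queued entries. The counter is reset together with the
 * pointer; if size kept its old value, the next pending_add() would
 * write at the stale index and leave the earlier slots uninitialised.
 */
static void pending_clear(struct pending_removals *p)
{
	free(p->items);
	p->items = NULL;
	p->size = 0;
}

/* Walk the queue; only valid when items and size agree. */
static long pending_sum(const struct pending_removals *p)
{
	long total = 0;
	size_t i;

	assert(p->size == 0 || p->items != NULL);
	for (i = 0; i < p->size; i++) {
		total += p->items[i];
	}
	return total;
}
```

In this sketch, reusing the struct after pending_clear() is safe only
because size is back at 0 when the next entry is added.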

metze
