autobuild failure due to python replication tests - why ?

Stefan Metzmacher metze at samba.org
Thu Aug 4 06:46:27 UTC 2016


Hi Andrew,

I think I got it.

I the testenv I'm doing:

for i in $(seq 1 2); do
        echo $i;
        PYTHONPATH=bin/python:source4/torture/drs/python DC1=$DC_SERVER
DC2=$VAMPIRE_DC_SERVER source4/scripting/bin/subunitrun repl_schema
-U$DOMAIN/$DC_USERNAME%$DC_PASSWORD || {
                echo "FAIL DC1 DC2: $i"; break;
        } ;
        PYTHONPATH=bin/python:source4/torture/drs/python DC2=$DC_SERVER
DC1=$VAMPIRE_DC_SERVER source4/scripting/bin/subunitrun repl_schema
-U$DOMAIN/$DC_USERNAME%$DC_PASSWORD || {
                echo "FAIL DC2 DC1: $i"; break;
        } ;
        echo "OK: $i";
done

I added some debug messages to dsdb_schema_info_cmp().

I got a loop like this on one dc:

remote[FF00000011E040FDD29D832940A43153F1711A0E3A]
local[FF00000012ED817478D89B2B46AB425900C898C97F]
ts_last_change[Thu Aug  4 08:34:20 2016 CEST], metadata_usn[8]
Can't replicate DC=samba,DC=example,DC=com because remote schema has
changed since we last replicated the schema
Wrong schema when applying reply GetNCChanges, retrying
Replicated 0 objects (0 linked attributes) for
CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
dsdb_schema_info_cmp(WERR_DS_DRA_SCHEMA_MISMATCH):
remote[FF00000011E040FDD29D832940A43153F1711A0E3A]
local[FF00000012ED817478D89B2B46AB425900C898C97F]
ts_last_change[Thu Aug  4 08:34:20 2016 CEST], metadata_usn[8]
Can't replicate DC=samba,DC=example,DC=com because remote schema has
changed since we last replicated the schema
Wrong schema when applying reply GetNCChanges, retrying
Replicated 0 objects (0 linked attributes) for
CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
dsdb_schema_info_cmp(WERR_DS_DRA_SCHEMA_MISMATCH):
remote[FF00000011E040FDD29D832940A43153F1711A0E3A]
local[FF00000012ED817478D89B2B46AB425900C898C97F]
ts_last_change[Thu Aug  4 08:34:20 2016 CEST], metadata_usn[8]
Can't replicate DC=samba,DC=example,DC=com because remote schema has
changed since we last replicated the schema
Wrong schema when applying reply GetNCChanges, retrying
Replicated 0 objects (0 linked attributes) for
CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
dsdb_schema_info_cmp(WERR_DS_DRA_SCHEMA_MISMATCH):
remote[FF00000011E040FDD29D832940A43153F1711A0E3A]
local[FF00000012ED817478D89B2B46AB425900C898C97F]
ts_last_change[Thu Aug  4 08:34:20 2016 CEST], metadata_usn[8]
Can't replicate DC=samba,DC=example,DC=com because remote schema has
changed since we last replicated the schema
Wrong schema when applying reply GetNCChanges, retrying
Replicated 0 objects (0 linked attributes) for
CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
dsdb_schema_info_cmp(WERR_DS_DRA_SCHEMA_MISMATCH):
remote[FF00000011E040FDD29D832940A43153F1711A0E3A]
local[FF00000012ED817478D89B2B46AB425900C898C97F]
ts_last_change[Thu Aug  4 08:34:20 2016 CEST], metadata_usn[8]
Can't replicate DC=samba,DC=example,DC=com because remote schema has
changed since we last replicated the schema
Wrong schema when applying reply GetNCChanges, retrying
Replicated 0 objects (0 linked attributes) for
CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com

I'll try to change dsdb_schema_info_cmp() to check the schema revision,
00000011 vs. 00000012 in this case and only return
WERR_DS_DRA_SCHEMA_MISMATCH
is the remote revision is higher.

I'm also wondering why the schema master changed between revision
00000011 and
00000012, does the repl_schema test transfer the schema master role?

metze

Am 04.08.2016 um 08:12 schrieb Stefan Metzmacher:
> Am 04.08.2016 um 01:48 schrieb Andrew Bartlett:
>> On Wed, 2016-08-03 at 15:46 +0200, Stefan Metzmacher wrote:
>>> Hi Andrew,
>>>
>>>> On a positive note, I appear to have an env and cut-down test that
>>>> runs
>>>> in 30min that consistently fails.  I have early positive vibes for
>>>> this
>>>> set of patches:
>>>>
>>>> http://git.catalyst.net.nz/gw?p=samba.git;a=shortlog;h=refs/heads/v
>>>> 4-5-test-drs-fix
>>>
>>> http://git.catalyst.net.nz/gw?p=samba.git;a=commitdiff;h=77b85f3ccd36
>>> ba54c58a3d9347c05a6e768fddd9
>>> Can you catch the attid mismatch for linked attributes in
>>> dsdb_replicated_objects_convert() please?
>>>
>>> I think that's also that place where we should convert the remote
>>> attids to
>>> the local once.
>>
>> I understand the principle.  Currently, the server-supplied data
>> structures are simply attached to the end of the transaction. 
>>
>> If you don't mind, I want to use those patches as-is, because that is
>> what I am gathering some data to prove, that is I don't want to further
>> rework the DRS code until we have this stable.
>>
>> Sadly the data isn't looking good, so we may be back to square one. 
>>
>>>> I'll know more when I see the full set of builds back, and then try
>>>> again just to be sure.  I'll of course file bugs for back-porting
>>>> per
>>>> my last mail, unless you would like to take that honour :-)
>>>
>>> I still don't understand how this could explain the timeouts.
>>
>> These are not timeouts, but un-called callbacks.  The callback to send
>> the IRPC is never fired, as far as I can tell.  That then appears to be
>> a timeout to the caller. 
> 
> I was able to reproduce this...
> 
> On LOCALVAMPIREDC loops like this:
> 
> DsGetNCChanges with uSNChanged >= 3642 flags 0x02000064 on
> <GUID=5b9b30d6-2401-49ba-ad4f-39eb65d637ee>;CN=Schema,CN=Configuration,DC=samba,DC=example,D[34/22555]
> 0 objects (done 0/0) 0 links (done 0/0 (as
> S-1-5-21-839407532-240565878-3380129014-1000))
> ../source4/rpc_server/drsuapi/getncchanges.c:1411: getncchanges on
> DC=samba,DC=example,DC=com using filter (uSNChanged>=3640)
> UpdateRefs on getncchanges for c493362c-d4cc-420a-9338-f383b4f94195
> DsReplicaUpdateRefs for host
> 'c493362c-d4cc-420a-9338-f383b4f94195._msdcs.samba.example.com' with
> GUID c493362c-d4cc-420a-9338-f383b4f94195 options 0x00000007 nc
> =<GUID=360fe8b4-b99f-466a-836c-7d0246f08977>;<SID=S-1-5-21-839407532-240565878-3380129014>;DC=samba,DC=example,DC=com
> DsGetNCChanges with uSNChanged >= 3640 flags 0x02000064 on
> <GUID=360fe8b4-b99f-466a-836c-7d0246f08977>;<SID=S-1-5-21-839407532-240565878-3380129014>;DC=samba,DC=example,DC=com
> gave 0 objects (done 0/0) 0 links (done 0/0 (as
> S-1-5-21-839407532-240565878-3380129014-1000))
> ../source4/rpc_server/drsuapi/getncchanges.c:1411: getncchanges on
> CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com using filter
> (uSNChanged>=3642)
> UpdateRefs on getncchanges for c493362c-d4cc-420a-9338-f383b4f94195
> DsReplicaUpdateRefs for host
> 'c493362c-d4cc-420a-9338-f383b4f94195._msdcs.samba.example.com' with
> GUID c493362c-d4cc-420a-9338-f383b4f94195 options 0x00000007 nc
> =<GUID=5b9b30d6-2401-49ba-ad4f-39eb65d637ee>;CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
> DsGetNCChanges with uSNChanged >= 3642 flags 0x02000064 on
> <GUID=5b9b30d6-2401-49ba-ad4f-39eb65d637ee>;CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
> gave 0 objects (done 0/0) 0 links (done 0/0 (as
> S-1-5-21-839407532-240565878-3380129014-1000))
> ../source4/rpc_server/drsuapi/getncchanges.c:1411: getncchanges on
> DC=samba,DC=example,DC=com using filter (uSNChanged>=3640)
> UpdateRefs on getncchanges for c493362c-d4cc-420a-9338-f383b4f94195
> DsReplicaUpdateRefs for host
> 'c493362c-d4cc-420a-9338-f383b4f94195._msdcs.samba.example.com' with
> GUID c493362c-d4cc-420a-9338-f383b4f94195 options 0x00000007 nc
> =<GUID=360fe8b4-b99f-466a-836c-7d0246f08977>;<SID=S-1-5-21-839407532-240565878-3380129014>;DC=samba,DC=example,DC=com
> DsGetNCChanges with uSNChanged >= 3640 flags 0x02000064 on
> <GUID=360fe8b4-b99f-466a-836c-7d0246f08977>;<SID=S-1-5-21-839407532-240565878-3380129014>;DC=samba,DC=example,DC=com
> gave 0 objects (done 0/0) 0 links (done 0/0 (as
> S-1-5-21-839407532-240565878-3380129014-1000))
> ../source4/rpc_server/drsuapi/getncchanges.c:1411: getncchanges on
> CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com using filter
> (uSNChanged>=3642)
> UpdateRefs on getncchanges for c493362c-d4cc-420a-9338-f383b4f94195
> DsReplicaUpdateRefs for host
> 'c493362c-d4cc-420a-9338-f383b4f94195._msdcs.samba.example.com' with
> GUID c493362c-d4cc-420a-9338-f383b4f94195 options 0x00000007 nc
> =<GUID=5b9b30d6-2401-49ba-ad4f-39eb65d637ee>;CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
> DsGetNCChanges with uSNChanged >= 3642 flags 0x02000064 on
> <GUID=5b9b30d6-2401-49ba-ad4f-39eb65d637ee>;CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
> gave 0 objects (done 0/0) 0 links (done 0/0 (as
> S-1-5-21-839407532-240565878-3380129014-1000))
> ../source4/rpc_server/drsuapi/getncchanges.c:1411: getncchanges on
> DC=samba,DC=example,DC=com using filter (uSNChanged>=3640)
> UpdateRefs on getncchanges for c493362c-d4cc-420a-9338-f383b4f94195
> DsReplicaUpdateRefs for host
> 'c493362c-d4cc-420a-9338-f383b4f94195._msdcs.samba.example.com' with
> GUID c493362c-d4cc-420a-9338-f383b4f94195 options 0x00000007 nc
> =<GUID=360fe8b4-b99f-466a-836c-7d0246f08977>;<SID=S-1-5-21-839407532-240565878-3380129014>;DC=samba,DC=example,DC=com
> DsGetNCChanges with uSNChanged >= 3640 flags 0x02000064 on
> <GUID=360fe8b4-b99f-466a-836c-7d0246f08977>;<SID=S-1-5-21-839407532-240565878-3380129014>;DC=samba,DC=example,DC=com
> gave 0 objects (done 0/0) 0 links (done 0/0 (as
> S-1-5-21-839407532-240565878-3380129014-1000))
> ../source4/rpc_server/drsuapi/getncchanges.c:1411: getncchanges on
> CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com using filter
> (uSNChanged>=3642)
> UpdateRefs on getncchanges for c493362c-d4cc-420a-9338-f383b4f94195
> DsReplicaUpdateRefs for host
> 'c493362c-d4cc-420a-9338-f383b4f94195._msdcs.samba.example.com' with
> GUID c493362c-d4cc-420a-9338-f383b4f94195 options 0x00000007 nc
> =<GUID=5b9b30d6-2401-49ba-ad4f-39eb65d637ee>;CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
> 
> On LOCALDC like this:
> 
> Replicated 0 objects (0 linked attributes) for
> CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
>                                [48/23720]
> Can't replicate DC=samba,DC=example,DC=com because remote schema has
> changed since we last replicated the schema
> Wrong schema when applying reply GetNCChanges, retrying
> linked_attributes_count=0
> DRS replication uptodate modify message:
> dn: CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
> changetype: modify
> replace: replUpToDateVector
> replUpToDateVector::
> AgAAAAAAAAABAAAAAAAAAOvw/JfaT09MuypQYfTe6W49DgAAAAAAAACAP
>  tXesZ0B
> -
> replace: repsFrom
> repsFrom::
> AQAAAAAAAAASAQAAAAAAAK1psw0DAAAArWmzDQMAAAAAAAAA0AAAAEIAAABkAAAAERE
>  RERERERERERERERERERERERERERERERERERERERERERERERERERERERERERERERERERERERERERER
>  ERERERERERERERERERERERERERERERERAAAAADkOAAAAAAAAAAAAAAAAAAA5DgAAAAAAAFH+1HcQq
>  klLow0QmkGRl17r8PyX2k9PTLsqUGH03uluAAAAAAAAAAAAAAAAAAAAAD4AAAA3N2Q0ZmU1MS1hYT
>  EwLTRiNDktYTMwZC0xMDlhNDE5MTk3NWUuX21zZGNzLnNhbWJhLmV4YW1wbGUuY29tAA==
> -
> 
> 
> Replicated 0 objects (0 linked attributes) for
> CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
> Can't replicate DC=samba,DC=example,DC=com because remote schema has
> changed since we last replicated the schema
> Wrong schema when applying reply GetNCChanges, retrying
> linked_attributes_count=0
> DRS replication uptodate modify message:
> dn: CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
> changetype: modify
> replace: replUpToDateVector
> replUpToDateVector::
> AgAAAAAAAAABAAAAAAAAAOvw/JfaT09MuypQYfTe6W49DgAAAAAAAACAP
>  tXesZ0B
> -
> 
> The loop is triggered by the pull side, if it blocks in the debug write
> because I'm looking at the tmux buffer, the other dc becomes idle.
> Once I leave the tmux buffer view it spins again.
> 
> I'm running TDB_NO_FSYNC=1 make -j testenv SELFTEST_TESTENV=vampire_dc
> SCREEN=1 SAMBA_OPTIONS="-d4"
> 
> In the shell I used
> PYTHONPATH=bin/python:source4/torture/drs/python DC1=$DC_SERVER
> DC2=$VAMPIRE_DC_SERVER source4/scripting/bin/subunitrun repl_schema
> -U$DOMAIN/$DC_USERNAME%$DC_PASSWORD
> 
> a few times.
> 
> The I changed to
> PYTHONPATH=bin/python:source4/torture/drs/python DC2=$DC_SERVER
> DC1=$VAMPIRE_DC_SERVER source4/scripting/bin/subunitrun repl_schema
> -U$DOMAIN/$DC_USERNAME%$DC_PASSWORD
> 
> (change DC1 and DC2)
> and got:
> time: 2016-08-04 05:45:58.912320Z
> test: repl_schema.DrsReplSchemaTestCase.test_all
> time: 2016-08-04 05:46:03.079107Z
> failure: repl_schema.DrsReplSchemaTestCase.test_all [
> Traceback (most recent call last):
>   File
> "/home/metze/devel/samba/3.X/master3/source4/torture/drs/python/repl_schema.py",
> line 336, in test_all
>     (c_ldn, c_dn) = self._schema_new_class(self.ldb_dc1, "cls-A", 9)
>   File
> "/home/metze/devel/samba/3.X/master3/source4/torture/drs/python/repl_schema.py",
> line 136, in _schema_new_class
>     self.fail("Adding record failed with %d/%s" % (enum, estr))
> AssertionError: Adding record failed with 53/LDAP error 53
> LDAP_UNWILLING_TO_PERFORM -  <00002035: schema_data_add: we are not
> master: reject request
>> <>
> ]
> time: 2016-08-04 05:46:04.401755Z
> time: 2016-08-04 05:46:04.401909Z
> test: repl_schema.DrsReplSchemaTestCase.test_attribute
> time: 2016-08-04 05:51:06.141438Z
> error: repl_schema.DrsReplSchemaTestCase.test_attribute [
> Traceback (most recent call last):
>   File
> "/home/metze/devel/samba/3.X/master3/source4/torture/drs/python/repl_schema.py",
> line 94, in setUp
>     self._net_drs_replicate(DC=self.dnsname_dc2,
> fromDC=self.dnsname_dc1, forced=True)
>   File
> "/home/metze/devel/samba/3.X/master3/source4/torture/drs/python/drs_base.py",
> line 119, in _net_drs_replicate
>     return self.check_output(cmd_line)
>   File "bin/python/samba/tests/__init__.py", line 804, in check_output
>     raise BlackboxProcessError(retcode, line, p.stdout.read(),
> p.stderr.read())
> BlackboxProcessError: Command
> '/home/metze/devel/samba/3.X/master3/bin/samba-tool drs replicate
> -USAMBADOMAIN/Administrator%locDCpass1 --sync-forced localdc.samb
> a.example.com localvampiredc.samba.example.com
> DC=samba,DC=example,DC=com'; exit status 255; stdout: ''; stderr:
> 'ERROR(<class 'samba.drs_utils.drsException'>): DsReplicaSync failed -
> drsException: DsReplicaSync failed (-1073741643, '{Device Timeout} The
> specified I/O operation on %hs was not completed before the time-ou
> t period expired.')
>   File "bin/python/samba/netcmd/drs.py", line 368, in run
>     drs_utils.sendDsReplicaSync(server_bind, server_bind_handle,
> source_dsa_guid, NC, req_options)
>   File "bin/python/samba/drs_utils.py", line 83, in sendDsReplicaSync
>     raise drsException("DsReplicaSync failed %s" % estr)
> '
> ]
> 
> I hope this helps tracking down the problem.
> The problem seems to be that the schema master wants to replicate
> from another dc, that itself haven't received the schema updates.
> 
> metze
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20160804/64263654/signature.sig>


More information about the samba-technical mailing list