autobuild failure due to python replication tests - why ?

Andrew Bartlett abartlet at samba.org
Tue Aug 2 01:49:34 UTC 2016


On Mon, 2016-08-01 at 17:28 -0700, Jeremy Allison wrote:
> Hi Andrew,
> 
> I just tried to push a very simple cleanup patch
> and the autobuild failed with the following after
> 213 minutes:
> 
> 1922(13349)/1951 at 3h1m8s]
> samba4.drs.repl_move.python(promoted_dc)(promoted_dc)
> UNEXPECTED(error):
> samba4.drs.repl_move.python(promoted_dc).repl_move.DrsMoveBetweenTree
> OfObjectTestCase.test_ReplicateMoveInTree3b(pr
> REASON: Exception: Exception: Traceback (most recent call last):
>   File
> "/memdisk/jra/a/b940082/samba/source4/torture/drs/python/repl_move.py
> ", line 1838, in setUp
>     self._net_drs_replicate(DC=self.dnsname_dc1,
> fromDC=self.dnsname_dc2, forced=True)
>   File
> "/memdisk/jra/a/b940082/samba/source4/torture/drs/python/drs_base.py"
> , line 119, in _net_drs_replicate
>     return self.check_output(cmd_line)
>   File "bin/python/samba/tests/__init__.py", line 804, in
> check_output
>     raise BlackboxProcessError(retcode, line, p.stdout.read(),
> p.stderr.read())
> BlackboxProcessError: Command
> '/memdisk/jra/a/b940082/samba/bin/samba-tool drs replicate
> -USAMBADOMAIN/Administrator%locDCpass1 --sync
>   File "bin/python/samba/netcmd/drs.py", line 368, in run
>     drs_utils.sendDsReplicaSync(server_bind, server_bind_handle,
> source_dsa_guid, NC, req_options)
>   File "bin/python/samba/drs_utils.py", line 83, in sendDsReplicaSync
>     raise drsException("DsReplicaSync failed %s" % estr)
> '
> 
> Why is this happening ? It's very frustrating to
> try and get simple code changes in and find that
> they're stymied by unrelated issues like this.

I do understand your frustration!  We have tried a number of
approaches and invested ourselves in a number of different proposed
resolutions that turned out not to help.

The problem is that the underlying issue appears to be historical, not
a regression, and these tests fail in a load-dependent manner.

They almost never failed when tested in isolated VMs with 4 CPUs
allocated.  When I was finishing the last big DRS replication test, I
had 10 tests passing or failing on unrelated issues in the Catalyst
Cloud.

Additionally, at the time that they were added, they were reliable on
sn-devel (otherwise I wouldn't have added them), at least as much as
the rest of the test suite was reliable at that time.  (I may have
seen one failure.)

Finally, we have addressed some of the other flapping tests in the file
server and elsewhere in the AD DC, so now we progress to the
replication tests.  Attached is a fix for another such flapping test. 

This brings us to the current situation: 

These replication tests and a number of CTDB tests reliably fail under
load.  As we rushed to get 4.5 branched, for example, that seems to
have become a particular issue.

I've now been able to reproduce these failures (keeping Amitay and
Martin busy on the CTDB end) reliably using a 2-CPU VM.  I hope for
some useful information soon.

As to your proposal, I think Garming expressed it best.

To mark the DRS tests as flapping would require removing all testing
from all our replication and schema code.  The issue isn't with any
one test but with the underlying server: no single test is triggering
this, but the increase in the number of tests means a full run is more
likely to fail.
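
A rough way to see why more tests make a full run more likely to fail,
even when no individual test is at fault: if each test has some small
chance of hitting the server-side problem, the chance that at least one
of them hits it grows with the number of tests. The sketch below is
only an illustration. It assumes the tests fail independently, and the
per-test failure rate and test counts are made-up numbers, not
measurements from sn-devel or the autobuild.

```python
def full_run_failure_probability(per_test_failure: float, num_tests: int) -> float:
    """Chance that at least one of num_tests tests fails.

    Assumes (hypothetically) that every test fails independently with
    the same probability per_test_failure.
    """
    if not 0.0 <= per_test_failure <= 1.0:
        raise ValueError("per_test_failure must be between 0 and 1")
    if num_tests < 0:
        raise ValueError("num_tests must not be negative")
    # The run passes only if every single test passes.
    return 1.0 - (1.0 - per_test_failure) ** num_tests


if __name__ == "__main__":
    # Hypothetical numbers: a 0.1% per-test failure rate under load.
    for count in (10, 50, 200):
        chance = full_run_failure_probability(0.001, count)
        print(f"{count} tests: {chance:.2%} chance the full run fails")
```

Under these assumptions the per-test rate never changes, yet the
full-run failure rate climbs as tests are added, which matches the
pattern described above.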

We did have a number of these tests disabled as flapping in the past.
 The net result of this was that simple typos and missing parameters in
'samba-tool domain demote' were not detected until our users attempted
to use the tools. 

We may still need to mark them as flapping, but I hope for some more
positive news soon.

Thanks,

Andrew Bartlett

-- 
Andrew Bartlett
https://samba.org/~abartlet/
Authentication Developer, Samba Team         https://samba.org
Samba Development and Support, Catalyst IT   
https://catalyst.net.nz/services/samba



-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-torture-backupkey-Allow-WERR_INVALID_ACCESS-WERR_INV.patch
Type: text/x-patch
Size: 1442 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20160802/3b5e0411/0001-torture-backupkey-Allow-WERR_INVALID_ACCESS-WERR_INV.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-selftest-Merge-alternate-error-codes-into-backupkey-.patch
Type: text/x-patch
Size: 1364 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20160802/3b5e0411/0002-selftest-Merge-alternate-error-codes-into-backupkey-.bin>

