autobuild failure due to python replication tests - why ?

Garming Sam garming at catalyst.net.nz
Tue Aug 2 01:39:50 UTC 2016


Not to disagree, but I think there are a number of points to raise about
the approach.

Turning off the tests, particularly in this case, probably means turning
off over 50 tests, each of which may or may not actually trigger this
particular failure. I don't think blanket bans on tests necessarily do
anyone any good, especially since not all tests are created equal. Some
are definitely more important than others, and some are definitely not as
easy to individually knownfail (or rather, move to flapping). The test is
partly to blame, but it's not necessarily easy to write such a
targeted test.
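
To illustrate the difference between a blanket entry and a targeted one, here
is a minimal sketch. It assumes the flapping list is a set of regular
expressions matched against full test names. The test names, the patterns and
the matched() helper below are all hypothetical and exist only for this
example; they are not the real names from the DRS suite.

```python
import re

# Hypothetical test names, for illustration only.
tests = [
    "example.drs.repl_move.python(dc).TestCaseA.test_move_1",
    "example.drs.repl_move.python(dc).TestCaseA.test_move_2",
    "example.drs.repl_move.python(dc).TestCaseB.test_move_3",
    "example.drs.other.python(dc).TestCaseC.test_other",
]


def matched(pattern, names):
    """Return the names that a flapping-style regex would cover."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]


# A blanket entry covers every test in the suite, failing or not.
blanket = r"^example\.drs\.repl_move\.python"

# A targeted entry covers exactly one test, but needs its full name.
targeted = r"^example\.drs\.repl_move\.python\(dc\)\.TestCaseB\.test_move_3$"

print(len(matched(blanket, tests)), "tests covered by the blanket entry")
print(len(matched(targeted, tests)), "test covered by the targeted entry")
```

The blanket pattern is quick to write but also hides the tests that never
fail; the targeted one only helps if the failure can be pinned to a single
test name.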

I think it matters who actually switches off the test in
the end, and that someone should be doing it manually. I think it's
safe to say the flapping file, and the tests in it, are quite easily
forgotten. You could send an email to the maintainer, and they might
just miss it and not realize their test was moved, and then it won't be
fixed. Even if they're the one who turned it off, people move on to
other things and have new priorities. As much as I hate to see
intermittent failures, they're still conscious reminders to actually fix
something, or to acknowledge that something needs to be fixed.

I think there are some quite serious bugs hidden behind the tests in the
flapping file, and we have already found and fixed some of them. And if
we're going to put more things into it, we need the list of these tests
to have a higher profile.


Cheers,

Garming

On 2/08/2016 12:28 p.m., Jeremy Allison wrote:
> Hi Andrew,
>
> I just tried to push a very simple cleanup patch
> and the autobuild failed with the following after
> 213 minutes:
>
> 1922(13349)/1951 at 3h1m8s] samba4.drs.repl_move.python(promoted_dc)(promoted_dc)
> UNEXPECTED(error): samba4.drs.repl_move.python(promoted_dc).repl_move.DrsMoveBetweenTreeOfObjectTestCase.test_ReplicateMoveInTree3b(pr
> REASON: Exception: Exception: Traceback (most recent call last):
>    File "/memdisk/jra/a/b940082/samba/source4/torture/drs/python/repl_move.py", line 1838, in setUp
>      self._net_drs_replicate(DC=self.dnsname_dc1, fromDC=self.dnsname_dc2, forced=True)
>    File "/memdisk/jra/a/b940082/samba/source4/torture/drs/python/drs_base.py", line 119, in _net_drs_replicate
>      return self.check_output(cmd_line)
>    File "bin/python/samba/tests/__init__.py", line 804, in check_output
>      raise BlackboxProcessError(retcode, line, p.stdout.read(), p.stderr.read())
> BlackboxProcessError: Command '/memdisk/jra/a/b940082/samba/bin/samba-tool drs replicate -USAMBADOMAIN/Administrator%locDCpass1 --sync
>    File "bin/python/samba/netcmd/drs.py", line 368, in run
>      drs_utils.sendDsReplicaSync(server_bind, server_bind_handle, source_dsa_guid, NC, req_options)
>    File "bin/python/samba/drs_utils.py", line 83, in sendDsReplicaSync
>      raise drsException("DsReplicaSync failed %s" % estr)
> '
>
> Why is this happening ? It's very frustrating to
> try and get simple code changes in and find that
> they're stymied by unrelated issues like this.
>
> I think we need to have some clear guidelines
> regarding the tests:
>
> 1). If a make test passes on a local machine
> but fails when pushed to autobuild due to an unrelated issue
> then the failed test should have (1) black
> mark added to it.
>
> 2). After (2) black marks (or should this be
> 3 or more ?) the test should automatically
> be added to the flaky tests and failures
> ignored.
>
> 3). If more work is done on the tests that
> should fix it, it is re-enabled and the
> process started again.
>
> The unreliability of the python DC tests
> is becoming a serious problem in our development
> schedule, and something *MUST* be done about
> this.
>
> Thoughts anyone ?
>
> Jeremy.
>
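
The three-step black-mark policy proposed in the quoted message could be
sketched roughly as follows. This is only an illustration of the bookkeeping,
not existing autobuild code: the FlakyTestTracker class, its method names and
the BLACK_MARK_LIMIT constant are hypothetical, and the limit of 2 is taken
from the proposal, which itself leaves open whether it should be 3 or more.

```python
from collections import Counter

# Hypothetical threshold; the proposal says 2 and asks whether 3 is better.
BLACK_MARK_LIMIT = 2


class FlakyTestTracker:
    """Count unrelated autobuild failures per test (illustrative only)."""

    def __init__(self, limit=BLACK_MARK_LIMIT):
        self.limit = limit
        self.black_marks = Counter()
        self.flapping = set()

    def record_unrelated_failure(self, test_name):
        """Step 1: add one black mark. Step 2: flag as flapping at the limit.

        Returns True if the test is now considered flapping.
        """
        self.black_marks[test_name] += 1
        if self.black_marks[test_name] >= self.limit:
            self.flapping.add(test_name)
        return test_name in self.flapping

    def reenable(self, test_name):
        """Step 3: the test was reworked, so clear its marks and start again."""
        self.black_marks.pop(test_name, None)
        self.flapping.discard(test_name)


tracker = FlakyTestTracker()
tracker.record_unrelated_failure("example.suite.test_a")  # first black mark
tracker.record_unrelated_failure("example.suite.test_a")  # reaches the limit
print(sorted(tracker.flapping))
```

Whether the move to flapping happens automatically or only produces a list for
someone to act on by hand is a separate choice; the sketch just tracks the
counts.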



