The responsibility for flapping tests

Thu May 28 10:49:15 MDT 2015

On 05/28/2015 04:40 AM, Kamen Mazdrashki wrote:
> On Thu, May 28, 2015 at 6:10 AM, Andrew Bartlett <abartlet at samba.org> wrote:
>
>> On Thu, 2015-05-28 at 01:21 +0300, Kamen Mazdrashki wrote:
>>> On Wed, May 27, 2015 at 5:56 AM, Andrew Bartlett <abartlet at samba.org>
>> wrote:
>>>> On Wed, 2015-05-27 at 05:43 +0300, Kamen Mazdrashki wrote:
>>>>>
>>>>> On Tue, May 26, 2015 at 4:53 PM, Andrew Bartlett <abartlet at samba.org
>>>>> wrote:
>>>>>          On Tue, 2015-05-26 at 05:46 -0700, Matthieu Patou wrote:
>>>>>          > On 05/25/2015 09:54 PM, Andrew Bartlett wrote:
>>>>>          > > On Mon, 2015-05-25 at 21:50 -0700, Matthieu Patou wrote:
>>>>>          > >> Hello All,
>>>>>          > >>
>>>>>          > >> Please find attached 2 patches that reduce the time of
>>>>>          dbcheck by 10%,
>>>>>          > >> they avoid calling getwellknowobjects when not needed
>>>>>          (ie. when dealing
>>>>>          > >> with the schema NC) and also cache the result of the
>>>>>          function as well
>>>>>          > >> known object didn't change.
>>>>>          > >>
>>>>>          > >> Pass make test.
>>>>>          > > As we got into the unfortunate situation where dbcheck
>> was
>>>>>          marked as
>>>>>          > > flapping, sadly this means this code is untested in 'make
>>>>>          test'.  :-(
>>>>>          > >
>>>>>          > > Andrew Bartlett
>>>>>          > >
>>>>>          > Ok but still it doesn't mean that those changes are wrong.
>>>>>          > One of those is to cache the wellknownobjects when found;
>> as
>>>>>          the name
>>>>>          > imply the wellknownobjects doesn't change. Also I think I
>>>>>          did several
>>>>>          > rounds of make tests without getting an error so if let's
>>>>>          say 3 rounds
>>>>>          > of make test didn't show any regression we can treat this
>> as
>>>>>          a reliable
>>>>>          > patch no ?
>>>>>
>>>>>          No, because tests in the flapping file are essentially
>> skipped
>>>>>          - we have
>>>>>          no idea if they succeeded or failed, or if they continue to.
>>>>>
>>>>>          Before we get this in, we have to revert that unfortunate
>>>>>          addition, and
>>>>>          to do that we probably need to revert whatever made it flap
>>>>>          (presumably
>>>>>          the tombstone reanimation work).
>>>>>
>>>>>          I realise this is totally unfair.  Untested code is really
>>>>>          bad, but
>>>>>          changing untested code is even worse.
>>>>>
>>>>>          The only way forward for this in the long term would be to
>>>>>          declare that
>>>>>          we will remove, after a timeout of two months, each entry
>> from
>>>>>          flapping,
>>>>>          and that as part of this we will review those items to decide
>>>>>          that they
>>>>>          are either pointless (and so remove the pretence that they
>>>>>          have value),
>>>>>          or fix them not to flap.  That, and better pre-commit tests
>>>>>          that make
>>>>>          flapping tests harder to get in, could be a better compromise
>>>>>          to release
>>>>>          the pressure of failed autobuilds in the short-term without
>>>>>          abandoning
>>>>>          tests for the long term.
>>>>>
>>>>>          That is, the current process where tests get added to
>> flapping
>>>>>          'one
>>>>>          way', just leaves the costs of the underlying failure on our
>>>>>          users and
>>>>>          never on us as developers.
>>>>>
>>>>>          Like portability, it will only change if work is invested.
>>>>>
>>>>>          When I'm over jet-lag, I will propose to remove tombstone
>>>>>          reanimation as
>>>>>          a supported feature (by removing the module from the default
>>>>>          stack), and
>>>>>          knownfail or skip the associated tests.
>>>>>
>>>>> I am afraid that just removing the module won't remove all changes
>>>>> introduced
>>>>> for tombstone reanimation :). Funny thing is, as Matthieu mentioned
>> he
>>>>> did for his
>>>>> optimization work, that I have run the test suite for tombstone
>>>>> reanimation for so
>>>>> many times without a failure. Which leads me to believe that there is
>>>>> a subtle
>>>>> interference between tests/environments. afaik, we don't run every
>>>>> test in
>>>>> a cleanly provisioned environment, do we?
>>>> Correct, the tests run in a common environment, and the interaction
>>>> between the various DCs that are running at the time can impact on the
>>>> results.  I would suggest that for tests like this one, that we should
>>>> actually force that, by having it run against promoted_dc or
>> vampire_dc,
>>>> as well as a standalone server such as fl2008dc so that the replication
>>>> behaviours and standalone behaviours are captured.
>>>
>>> This would be great - after all, tombstone reanimation has nothing to do
>>> with replication test and we have different set of tests to show deleted
>>> objects are replicated correctly.
>>> How can I achieve that? Should I move Tombstone reanimation test
>>> earlier in tests.py?
>>>
>>>
>>>
>>>
>>>> But regarding this test and the functionality it was intended to prove,
>>>> what would you suggest?  The cleanest solution would be to revert the
>>>> lot until we understand it, but that also seems rather drastic.
>>>>
>>>> It seems reasonable to at least remove the module and test, even if
>> that
>>>> won't be a full revert, and then put it back when the full make test is
>>>> shown not to flap.
>>>>
>>> You are totally right. I thought you want to revert the whole patch set,
>>> sorry!
>>> Disabling just this module is *very* reasonable approach indeed.
>> I've found myself a bit busy, can you knock up the patch?
>>
> I will out of town, getting back tomorrow and i will post a patch.
> Hopefully I can play
> with reordering the tests to be executed in a 'cleaner' environment (I
> count on your
> help heavily there thought :)
>
BTW one step of the plan to speed up the speed of make test is to start 
using random order testing to find implicit dependencies between tests.

Matthieu


-- 
Matthieu Patou
Samba Team
http://samba.org