The responsibility for flapping tests

Sun May 31 19:23:30 MDT 2015

Is it the root cause of the dbcheck failure ?

Matthieu.
On 05/30/2015 04:12 PM, Kamen Mazdrashki wrote:
> Hi Andrew,
>
> Could you please take a look at attached patch - it disables
> tombstone_reanimation module.
> It is still compiled, but skips registration and I have also removed it
> from the list of dsdb modules.
> Running tombstone_reanimation test fails nicely :)
>
> Cheers,
> kamen
>
> On Thu, May 28, 2015 at 2:40 PM, Kamen Mazdrashki <kamenim at samba.org> wrote:
>
>>
>> On Thu, May 28, 2015 at 6:10 AM, Andrew Bartlett <abartlet at samba.org>
>> wrote:
>>
>>> On Thu, 2015-05-28 at 01:21 +0300, Kamen Mazdrashki wrote:
>>>> On Wed, May 27, 2015 at 5:56 AM, Andrew Bartlett <abartlet at samba.org>
>>> wrote:
>>>>> On Wed, 2015-05-27 at 05:43 +0300, Kamen Mazdrashki wrote:
>>>>>>
>>>>>> On Tue, May 26, 2015 at 4:53 PM, Andrew Bartlett <
>>> abartlet at samba.org>
>>>>>> wrote:
>>>>>>          On Tue, 2015-05-26 at 05:46 -0700, Matthieu Patou wrote:
>>>>>>          > On 05/25/2015 09:54 PM, Andrew Bartlett wrote:
>>>>>>          > > On Mon, 2015-05-25 at 21:50 -0700, Matthieu Patou wrote:
>>>>>>          > >> Hello All,
>>>>>>          > >>
>>>>>>          > >> Please find attached 2 patches that reduce the time of
>>>>>>          dbcheck by 10%,
>>>>>>          > >> they avoid calling getwellknowobjects when not needed
>>>>>>          (ie. when dealing
>>>>>>          > >> with the schema NC) and also cache the result of the
>>>>>>          function as well
>>>>>>          > >> known object didn't change.
>>>>>>          > >>
>>>>>>          > >> Pass make test.
>>>>>>          > > As we got into the unfortunate situation where dbcheck
>>> was
>>>>>>          marked as
>>>>>>          > > flapping, sadly this means this code is untested in
>>> 'make
>>>>>>          test'.  :-(
>>>>>>          > >
>>>>>>          > > Andrew Bartlett
>>>>>>          > >
>>>>>>          > Ok but still it doesn't mean that those changes are wrong.
>>>>>>          > One of those is to cache the wellknownobjects when found;
>>> as
>>>>>>          the name
>>>>>>          > imply the wellknownobjects doesn't change. Also I think I
>>>>>>          did several
>>>>>>          > rounds of make tests without getting an error so if let's
>>>>>>          say 3 rounds
>>>>>>          > of make test didn't show any regression we can treat this
>>> as
>>>>>>          a reliable
>>>>>>          > patch no ?
>>>>>>
>>>>>>          No, because tests in the flapping file are essentially
>>> skipped
>>>>>>          - we have
>>>>>>          no idea if they succeeded or failed, or if they continue to.
>>>>>>
>>>>>>          Before we get this in, we have to revert that unfortunate
>>>>>>          addition, and
>>>>>>          to do that we probably need to revert whatever made it flap
>>>>>>          (presumably
>>>>>>          the tombstone reanimation work).
>>>>>>
>>>>>>          I realise this is totally unfair.  Untested code is really
>>>>>>          bad, but
>>>>>>          changing untested code is even worse.
>>>>>>
>>>>>>          The only way forward for this in the long term would be to
>>>>>>          declare that
>>>>>>          we will remove, after a timeout of two months, each entry
>>> from
>>>>>>          flapping,
>>>>>>          and that as part of this we will review those items to
>>> decide
>>>>>>          that they
>>>>>>          are either pointless (and so remove the pretence that they
>>>>>>          have value),
>>>>>>          or fix them not to flap.  That, and better pre-commit tests
>>>>>>          that make
>>>>>>          flapping tests harder to get in, could be a better
>>> compromise
>>>>>>          to release
>>>>>>          the pressure of failed autobuilds in the short-term without
>>>>>>          abandoning
>>>>>>          tests for the long term.
>>>>>>
>>>>>>          That is, the current process where tests get added to
>>> flapping
>>>>>>          'one
>>>>>>          way', just leaves the costs of the underlying failure on our
>>>>>>          users and
>>>>>>          never on us as developers.
>>>>>>
>>>>>>          Like portability, it will only change if work is invested.
>>>>>>
>>>>>>          When I'm over jet-lag, I will propose to remove tombstone
>>>>>>          reanimation as
>>>>>>          a supported feature (by removing the module from the default
>>>>>>          stack), and
>>>>>>          knownfail or skip the associated tests.
>>>>>>
>>>>>> I am afraid that just removing the module won't remove all changes
>>>>>> introduced
>>>>>> for tombstone reanimation :). Funny thing is, as Matthieu mentioned
>>> he
>>>>>> did for his
>>>>>> optimization work, that I have run the test suite for tombstone
>>>>>> reanimation for so
>>>>>> many times without a failure. Which leads me to believe that there
>>> is
>>>>>> a subtle
>>>>>> interference between tests/environments. afaik, we don't run every
>>>>>> test in
>>>>>> a cleanly provisioned environment, do we?
>>>>> Correct, the tests run in a common environment, and the interaction
>>>>> between the various DCs that are running at the time can impact on the
>>>>> results.  I would suggest that for tests like this one, that we should
>>>>> actually force that, by having it run against promoted_dc or
>>> vampire_dc,
>>>>> as well as a standalone server such as fl2008dc so that the
>>> replication
>>>>> behaviours and standalone behaviours are captured.
>>>>
>>>> This would be great - after all, tombstone reanimation has nothing to do
>>>> with replication test and we have different set of tests to show deleted
>>>> objects are replicated correctly.
>>>> How can I achieve that? Should I move Tombstone reanimation test
>>>> earlier in tests.py?
>>>>
>>>>
>>>>
>>>>
>>>>> But regarding this test and the functionality it was intended to
>>> prove,
>>>>> what would you suggest?  The cleanest solution would be to revert the
>>>>> lot until we understand it, but that also seems rather drastic.
>>>>>
>>>>> It seems reasonable to at least remove the module and test, even if
>>> that
>>>>> won't be a full revert, and then put it back when the full make test
>>> is
>>>>> shown not to flap.
>>>>>
>>>> You are totally right. I thought you want to revert the whole patch set,
>>>> sorry!
>>>> Disabling just this module is *very* reasonable approach indeed.
>>> I've found myself a bit busy, can you knock up the patch?
>>>
>> I will out of town, getting back tomorrow and i will post a patch.
>> Hopefully I can play
>> with reordering the tests to be executed in a 'cleaner' environment (I
>> count on your
>> help heavily there thought :)
>>
>> Cheers,
>> Kamen
>>
>>
>>> Thanks,
>>>
>>> --
>>> Andrew Bartlett
>>> http://samba.org/~abartlet/
>>> Authentication Developer, Samba Team  http://samba.org
>>> Samba Developer, Catalyst IT
>>> http://catalyst.net.nz/services/samba
>>>
>>>
>>>
>>>
>>>