The responsibility for flapping tests

Mon Jun 1 06:41:35 MDT 2015

Hi Matt,

On Mon, Jun 1, 2015 at 4:23 AM, Matthieu Patou <mat at matws.net> wrote:

> Is it the root cause of the dbcheck failure ?
>
I don't think so.
By "fails nicely", I meant that now that we don't support tombstone
reanimation
by reverting the module, tombstone related tests are failing as expected.

Cheers,
Kamen

>
> Matthieu.
> On 05/30/2015 04:12 PM, Kamen Mazdrashki wrote:
>
>> Hi Andrew,
>>
>> Could you please take a look at attached patch - it disables
>> tombstone_reanimation module.
>> It is still compiled, but skips registration and I have also removed it
>> from the list of dsdb modules.
>> Running tombstone_reanimation test fails nicely :)
>>
>> Cheers,
>> kamen
>>
>>
>> On Thu, May 28, 2015 at 2:40 PM, Kamen Mazdrashki <kamenim at samba.org>
>> wrote:
>>
>>
>>> On Thu, May 28, 2015 at 6:10 AM, Andrew Bartlett <abartlet at samba.org>
>>> wrote:
>>>
>>>  On Thu, 2015-05-28 at 01:21 +0300, Kamen Mazdrashki wrote:
>>>>
>>>>> On Wed, May 27, 2015 at 5:56 AM, Andrew Bartlett <abartlet at samba.org>
>>>>>
>>>> wrote:
>>>>
>>>>> On Wed, 2015-05-27 at 05:43 +0300, Kamen Mazdrashki wrote:
>>>>>>
>>>>>>>
>>>>>>> On Tue, May 26, 2015 at 4:53 PM, Andrew Bartlett <
>>>>>>>
>>>>>> abartlet at samba.org>
>>>>
>>>>> wrote:
>>>>>>>          On Tue, 2015-05-26 at 05:46 -0700, Matthieu Patou wrote:
>>>>>>>          > On 05/25/2015 09:54 PM, Andrew Bartlett wrote:
>>>>>>>          > > On Mon, 2015-05-25 at 21:50 -0700, Matthieu Patou wrote:
>>>>>>>          > >> Hello All,
>>>>>>>          > >>
>>>>>>>          > >> Please find attached 2 patches that reduce the time of
>>>>>>>          dbcheck by 10%,
>>>>>>>          > >> they avoid calling getwellknowobjects when not needed
>>>>>>>          (ie. when dealing
>>>>>>>          > >> with the schema NC) and also cache the result of the
>>>>>>>          function as well
>>>>>>>          > >> known object didn't change.
>>>>>>>          > >>
>>>>>>>          > >> Pass make test.
>>>>>>>          > > As we got into the unfortunate situation where dbcheck
>>>>>>>
>>>>>> was
>>>>
>>>>>          marked as
>>>>>>>          > > flapping, sadly this means this code is untested in
>>>>>>>
>>>>>> 'make
>>>>
>>>>>          test'.  :-(
>>>>>>>          > >
>>>>>>>          > > Andrew Bartlett
>>>>>>>          > >
>>>>>>>          > Ok but still it doesn't mean that those changes are wrong.
>>>>>>>          > One of those is to cache the wellknownobjects when found;
>>>>>>>
>>>>>> as
>>>>
>>>>>          the name
>>>>>>>          > imply the wellknownobjects doesn't change. Also I think I
>>>>>>>          did several
>>>>>>>          > rounds of make tests without getting an error so if let's
>>>>>>>          say 3 rounds
>>>>>>>          > of make test didn't show any regression we can treat this
>>>>>>>
>>>>>> as
>>>>
>>>>>          a reliable
>>>>>>>          > patch no ?
>>>>>>>
>>>>>>>          No, because tests in the flapping file are essentially
>>>>>>>
>>>>>> skipped
>>>>
>>>>>          - we have
>>>>>>>          no idea if they succeeded or failed, or if they continue to.
>>>>>>>
>>>>>>>          Before we get this in, we have to revert that unfortunate
>>>>>>>          addition, and
>>>>>>>          to do that we probably need to revert whatever made it flap
>>>>>>>          (presumably
>>>>>>>          the tombstone reanimation work).
>>>>>>>
>>>>>>>          I realise this is totally unfair.  Untested code is really
>>>>>>>          bad, but
>>>>>>>          changing untested code is even worse.
>>>>>>>
>>>>>>>          The only way forward for this in the long term would be to
>>>>>>>          declare that
>>>>>>>          we will remove, after a timeout of two months, each entry
>>>>>>>
>>>>>> from
>>>>
>>>>>          flapping,
>>>>>>>          and that as part of this we will review those items to
>>>>>>>
>>>>>> decide
>>>>
>>>>>          that they
>>>>>>>          are either pointless (and so remove the pretence that they
>>>>>>>          have value),
>>>>>>>          or fix them not to flap.  That, and better pre-commit tests
>>>>>>>          that make
>>>>>>>          flapping tests harder to get in, could be a better
>>>>>>>
>>>>>> compromise
>>>>
>>>>>          to release
>>>>>>>          the pressure of failed autobuilds in the short-term without
>>>>>>>          abandoning
>>>>>>>          tests for the long term.
>>>>>>>
>>>>>>>          That is, the current process where tests get added to
>>>>>>>
>>>>>> flapping
>>>>
>>>>>          'one
>>>>>>>          way', just leaves the costs of the underlying failure on our
>>>>>>>          users and
>>>>>>>          never on us as developers.
>>>>>>>
>>>>>>>          Like portability, it will only change if work is invested.
>>>>>>>
>>>>>>>          When I'm over jet-lag, I will propose to remove tombstone
>>>>>>>          reanimation as
>>>>>>>          a supported feature (by removing the module from the default
>>>>>>>          stack), and
>>>>>>>          knownfail or skip the associated tests.
>>>>>>>
>>>>>>> I am afraid that just removing the module won't remove all changes
>>>>>>> introduced
>>>>>>> for tombstone reanimation :). Funny thing is, as Matthieu mentioned
>>>>>>>
>>>>>> he
>>>>
>>>>> did for his
>>>>>>> optimization work, that I have run the test suite for tombstone
>>>>>>> reanimation for so
>>>>>>> many times without a failure. Which leads me to believe that there
>>>>>>>
>>>>>> is
>>>>
>>>>> a subtle
>>>>>>> interference between tests/environments. afaik, we don't run every
>>>>>>> test in
>>>>>>> a cleanly provisioned environment, do we?
>>>>>>>
>>>>>> Correct, the tests run in a common environment, and the interaction
>>>>>> between the various DCs that are running at the time can impact on the
>>>>>> results.  I would suggest that for tests like this one, that we should
>>>>>> actually force that, by having it run against promoted_dc or
>>>>>>
>>>>> vampire_dc,
>>>>
>>>>> as well as a standalone server such as fl2008dc so that the
>>>>>>
>>>>> replication
>>>>
>>>>> behaviours and standalone behaviours are captured.
>>>>>>
>>>>>
>>>>> This would be great - after all, tombstone reanimation has nothing to
>>>>> do
>>>>> with replication test and we have different set of tests to show
>>>>> deleted
>>>>> objects are replicated correctly.
>>>>> How can I achieve that? Should I move Tombstone reanimation test
>>>>> earlier in tests.py?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  But regarding this test and the functionality it was intended to
>>>>>>
>>>>> prove,
>>>>
>>>>> what would you suggest?  The cleanest solution would be to revert the
>>>>>> lot until we understand it, but that also seems rather drastic.
>>>>>>
>>>>>> It seems reasonable to at least remove the module and test, even if
>>>>>>
>>>>> that
>>>>
>>>>> won't be a full revert, and then put it back when the full make test
>>>>>>
>>>>> is
>>>>
>>>>> shown not to flap.
>>>>>>
>>>>>>  You are totally right. I thought you want to revert the whole patch
>>>>> set,
>>>>> sorry!
>>>>> Disabling just this module is *very* reasonable approach indeed.
>>>>>
>>>> I've found myself a bit busy, can you knock up the patch?
>>>>
>>>>  I will out of town, getting back tomorrow and i will post a patch.
>>> Hopefully I can play
>>> with reordering the tests to be executed in a 'cleaner' environment (I
>>> count on your
>>> help heavily there thought :)
>>>
>>> Cheers,
>>> Kamen
>>>
>>>
>>>  Thanks,
>>>>
>>>> --
>>>> Andrew Bartlett
>>>> http://samba.org/~abartlet/
>>>> Authentication Developer, Samba Team  http://samba.org
>>>> Samba Developer, Catalyst IT
>>>> http://catalyst.net.nz/services/samba
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>