The responsibility for flapping tests
Kamen Mazdrashki
kamenim at samba.org
Mon Jun 1 06:41:35 MDT 2015
Hi Matt,
On Mon, Jun 1, 2015 at 4:23 AM, Matthieu Patou <mat at matws.net> wrote:
> Is it the root cause of the dbcheck failure ?
>
I don't think so.
By "fails nicely", I meant that now that we don't support tombstone
reanimation
by reverting the module, tombstone related tests are failing as expected.
Cheers,
Kamen
>
> Matthieu.
> On 05/30/2015 04:12 PM, Kamen Mazdrashki wrote:
>
>> Hi Andrew,
>>
>> Could you please take a look at attached patch - it disables
>> tombstone_reanimation module.
>> It is still compiled, but skips registration and I have also removed it
>> from the list of dsdb modules.
>> Running tombstone_reanimation test fails nicely :)
>>
>> Cheers,
>> kamen
>>
>>
>> On Thu, May 28, 2015 at 2:40 PM, Kamen Mazdrashki <kamenim at samba.org>
>> wrote:
>>
>>
>>> On Thu, May 28, 2015 at 6:10 AM, Andrew Bartlett <abartlet at samba.org>
>>> wrote:
>>>
>>> On Thu, 2015-05-28 at 01:21 +0300, Kamen Mazdrashki wrote:
>>>>
>>>>> On Wed, May 27, 2015 at 5:56 AM, Andrew Bartlett <abartlet at samba.org>
>>>>>
>>>> wrote:
>>>>
>>>>> On Wed, 2015-05-27 at 05:43 +0300, Kamen Mazdrashki wrote:
>>>>>>
>>>>>>>
>>>>>>> On Tue, May 26, 2015 at 4:53 PM, Andrew Bartlett <
>>>>>>>
>>>>>> abartlet at samba.org>
>>>>
>>>>> wrote:
>>>>>>> On Tue, 2015-05-26 at 05:46 -0700, Matthieu Patou wrote:
>>>>>>> > On 05/25/2015 09:54 PM, Andrew Bartlett wrote:
>>>>>>> > > On Mon, 2015-05-25 at 21:50 -0700, Matthieu Patou wrote:
>>>>>>> > >> Hello All,
>>>>>>> > >>
>>>>>>> > >> Please find attached 2 patches that reduce the time of
>>>>>>> dbcheck by 10%,
>>>>>>> > >> they avoid calling getwellknowobjects when not needed
>>>>>>> (ie. when dealing
>>>>>>> > >> with the schema NC) and also cache the result of the
>>>>>>> function as well
>>>>>>> > >> known object didn't change.
>>>>>>> > >>
>>>>>>> > >> Pass make test.
>>>>>>> > > As we got into the unfortunate situation where dbcheck
>>>>>>>
>>>>>> was
>>>>
>>>>> marked as
>>>>>>> > > flapping, sadly this means this code is untested in
>>>>>>>
>>>>>> 'make
>>>>
>>>>> test'. :-(
>>>>>>> > >
>>>>>>> > > Andrew Bartlett
>>>>>>> > >
>>>>>>> > Ok but still it doesn't mean that those changes are wrong.
>>>>>>> > One of those is to cache the wellknownobjects when found;
>>>>>>>
>>>>>> as
>>>>
>>>>> the name
>>>>>>> > imply the wellknownobjects doesn't change. Also I think I
>>>>>>> did several
>>>>>>> > rounds of make tests without getting an error so if let's
>>>>>>> say 3 rounds
>>>>>>> > of make test didn't show any regression we can treat this
>>>>>>>
>>>>>> as
>>>>
>>>>> a reliable
>>>>>>> > patch no ?
>>>>>>>
>>>>>>> No, because tests in the flapping file are essentially
>>>>>>>
>>>>>> skipped
>>>>
>>>>> - we have
>>>>>>> no idea if they succeeded or failed, or if they continue to.
>>>>>>>
>>>>>>> Before we get this in, we have to revert that unfortunate
>>>>>>> addition, and
>>>>>>> to do that we probably need to revert whatever made it flap
>>>>>>> (presumably
>>>>>>> the tombstone reanimation work).
>>>>>>>
>>>>>>> I realise this is totally unfair. Untested code is really
>>>>>>> bad, but
>>>>>>> changing untested code is even worse.
>>>>>>>
>>>>>>> The only way forward for this in the long term would be to
>>>>>>> declare that
>>>>>>> we will remove, after a timeout of two months, each entry
>>>>>>>
>>>>>> from
>>>>
>>>>> flapping,
>>>>>>> and that as part of this we will review those items to
>>>>>>>
>>>>>> decide
>>>>
>>>>> that they
>>>>>>> are either pointless (and so remove the pretence that they
>>>>>>> have value),
>>>>>>> or fix them not to flap. That, and better pre-commit tests
>>>>>>> that make
>>>>>>> flapping tests harder to get in, could be a better
>>>>>>>
>>>>>> compromise
>>>>
>>>>> to release
>>>>>>> the pressure of failed autobuilds in the short-term without
>>>>>>> abandoning
>>>>>>> tests for the long term.
>>>>>>>
>>>>>>> That is, the current process where tests get added to
>>>>>>>
>>>>>> flapping
>>>>
>>>>> 'one
>>>>>>> way', just leaves the costs of the underlying failure on our
>>>>>>> users and
>>>>>>> never on us as developers.
>>>>>>>
>>>>>>> Like portability, it will only change if work is invested.
>>>>>>>
>>>>>>> When I'm over jet-lag, I will propose to remove tombstone
>>>>>>> reanimation as
>>>>>>> a supported feature (by removing the module from the default
>>>>>>> stack), and
>>>>>>> knownfail or skip the associated tests.
>>>>>>>
>>>>>>> I am afraid that just removing the module won't remove all changes
>>>>>>> introduced
>>>>>>> for tombstone reanimation :). Funny thing is, as Matthieu mentioned
>>>>>>>
>>>>>> he
>>>>
>>>>> did for his
>>>>>>> optimization work, that I have run the test suite for tombstone
>>>>>>> reanimation for so
>>>>>>> many times without a failure. Which leads me to believe that there
>>>>>>>
>>>>>> is
>>>>
>>>>> a subtle
>>>>>>> interference between tests/environments. afaik, we don't run every
>>>>>>> test in
>>>>>>> a cleanly provisioned environment, do we?
>>>>>>>
>>>>>> Correct, the tests run in a common environment, and the interaction
>>>>>> between the various DCs that are running at the time can impact on the
>>>>>> results. I would suggest that for tests like this one, that we should
>>>>>> actually force that, by having it run against promoted_dc or
>>>>>>
>>>>> vampire_dc,
>>>>
>>>>> as well as a standalone server such as fl2008dc so that the
>>>>>>
>>>>> replication
>>>>
>>>>> behaviours and standalone behaviours are captured.
>>>>>>
>>>>>
>>>>> This would be great - after all, tombstone reanimation has nothing to
>>>>> do
>>>>> with replication test and we have different set of tests to show
>>>>> deleted
>>>>> objects are replicated correctly.
>>>>> How can I achieve that? Should I move Tombstone reanimation test
>>>>> earlier in tests.py?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> But regarding this test and the functionality it was intended to
>>>>>>
>>>>> prove,
>>>>
>>>>> what would you suggest? The cleanest solution would be to revert the
>>>>>> lot until we understand it, but that also seems rather drastic.
>>>>>>
>>>>>> It seems reasonable to at least remove the module and test, even if
>>>>>>
>>>>> that
>>>>
>>>>> won't be a full revert, and then put it back when the full make test
>>>>>>
>>>>> is
>>>>
>>>>> shown not to flap.
>>>>>>
>>>>>> You are totally right. I thought you want to revert the whole patch
>>>>> set,
>>>>> sorry!
>>>>> Disabling just this module is *very* reasonable approach indeed.
>>>>>
>>>> I've found myself a bit busy, can you knock up the patch?
>>>>
>>>> I will out of town, getting back tomorrow and i will post a patch.
>>> Hopefully I can play
>>> with reordering the tests to be executed in a 'cleaner' environment (I
>>> count on your
>>> help heavily there thought :)
>>>
>>> Cheers,
>>> Kamen
>>>
>>>
>>> Thanks,
>>>>
>>>> --
>>>> Andrew Bartlett
>>>> http://samba.org/~abartlet/
>>>> Authentication Developer, Samba Team http://samba.org
>>>> Samba Developer, Catalyst IT
>>>> http://catalyst.net.nz/services/samba
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>
More information about the samba-technical
mailing list