The responsibility for flapping tests (was: Re: [PATCHES] avoid searching for wellknownobjects when not needed, cache results for speed)

Kamen Mazdrashki kamenim at samba.org
Sat May 30 17:12:48 MDT 2015


Hi Andrew,

Could you please take a look at attached patch - it disables
tombstone_reanimation module.
It is still compiled, but skips registration and I have also removed it
from the list of dsdb modules.
Running tombstone_reanimation test fails nicely :)

Cheers,
kamen

On Thu, May 28, 2015 at 2:40 PM, Kamen Mazdrashki <kamenim at samba.org> wrote:

>
>
> On Thu, May 28, 2015 at 6:10 AM, Andrew Bartlett <abartlet at samba.org>
> wrote:
>
>> On Thu, 2015-05-28 at 01:21 +0300, Kamen Mazdrashki wrote:
>> > On Wed, May 27, 2015 at 5:56 AM, Andrew Bartlett <abartlet at samba.org>
>> wrote:
>> >
>> > > On Wed, 2015-05-27 at 05:43 +0300, Kamen Mazdrashki wrote:
>> > > >
>> > > >
>> > > > On Tue, May 26, 2015 at 4:53 PM, Andrew Bartlett <
>> abartlet at samba.org>
>> > > > wrote:
>> > > >         On Tue, 2015-05-26 at 05:46 -0700, Matthieu Patou wrote:
>> > > >         > On 05/25/2015 09:54 PM, Andrew Bartlett wrote:
>> > > >         > > On Mon, 2015-05-25 at 21:50 -0700, Matthieu Patou wrote:
>> > > >         > >> Hello All,
>> > > >         > >>
>> > > >         > >> Please find attached 2 patches that reduce the time of
>> > > >         dbcheck by 10%,
>> > > >         > >> they avoid calling getwellknowobjects when not needed
>> > > >         (ie. when dealing
>> > > >         > >> with the schema NC) and also cache the result of the
>> > > >         function as well
>> > > >         > >> known object didn't change.
>> > > >         > >>
>> > > >         > >> Pass make test.
>> > > >         > > As we got into the unfortunate situation where dbcheck
>> was
>> > > >         marked as
>> > > >         > > flapping, sadly this means this code is untested in
>> 'make
>> > > >         test'.  :-(
>> > > >         > >
>> > > >         > > Andrew Bartlett
>> > > >         > >
>> > > >         > Ok but still it doesn't mean that those changes are wrong.
>> > > >         > One of those is to cache the wellknownobjects when found;
>> as
>> > > >         the name
>> > > >         > imply the wellknownobjects doesn't change. Also I think I
>> > > >         did several
>> > > >         > rounds of make tests without getting an error so if let's
>> > > >         say 3 rounds
>> > > >         > of make test didn't show any regression we can treat this
>> as
>> > > >         a reliable
>> > > >         > patch no ?
>> > > >
>> > > >         No, because tests in the flapping file are essentially
>> skipped
>> > > >         - we have
>> > > >         no idea if they succeeded or failed, or if they continue to.
>> > > >
>> > > >         Before we get this in, we have to revert that unfortunate
>> > > >         addition, and
>> > > >         to do that we probably need to revert whatever made it flap
>> > > >         (presumably
>> > > >         the tombstone reanimation work).
>> > > >
>> > > >         I realise this is totally unfair.  Untested code is really
>> > > >         bad, but
>> > > >         changing untested code is even worse.
>> > > >
>> > > >         The only way forward for this in the long term would be to
>> > > >         declare that
>> > > >         we will remove, after a timeout of two months, each entry
>> from
>> > > >         flapping,
>> > > >         and that as part of this we will review those items to
>> decide
>> > > >         that they
>> > > >         are either pointless (and so remove the pretence that they
>> > > >         have value),
>> > > >         or fix them not to flap.  That, and better pre-commit tests
>> > > >         that make
>> > > >         flapping tests harder to get in, could be a better
>> compromise
>> > > >         to release
>> > > >         the pressure of failed autobuilds in the short-term without
>> > > >         abandoning
>> > > >         tests for the long term.
>> > > >
>> > > >         That is, the current process where tests get added to
>> flapping
>> > > >         'one
>> > > >         way', just leaves the costs of the underlying failure on our
>> > > >         users and
>> > > >         never on us as developers.
>> > > >
>> > > >         Like portability, it will only change if work is invested.
>> > > >
>> > > >         When I'm over jet-lag, I will propose to remove tombstone
>> > > >         reanimation as
>> > > >         a supported feature (by removing the module from the default
>> > > >         stack), and
>> > > >         knownfail or skip the associated tests.
>> > > >
>> > > > I am afraid that just removing the module won't remove all changes
>> > > > introduced
>> > > > for tombstone reanimation :). Funny thing is, as Matthieu mentioned
>> he
>> > > > did for his
>> > > > optimization work, that I have run the test suite for tombstone
>> > > > reanimation for so
>> > > > many times without a failure. Which leads me to believe that there
>> is
>> > > > a subtle
>> > > > interference between tests/environments. afaik, we don't run every
>> > > > test in
>> > > > a cleanly provisioned environment, do we?
>> > >
>> > > Correct, the tests run in a common environment, and the interaction
>> > > between the various DCs that are running at the time can impact on the
>> > > results.  I would suggest that for tests like this one, that we should
>> > > actually force that, by having it run against promoted_dc or
>> vampire_dc,
>> > > as well as a standalone server such as fl2008dc so that the
>> replication
>> > > behaviours and standalone behaviours are captured.
>> >
>> >
>> > This would be great - after all, tombstone reanimation has nothing to do
>> > with replication test and we have different set of tests to show deleted
>> > objects are replicated correctly.
>> > How can I achieve that? Should I move Tombstone reanimation test
>> > earlier in tests.py?
>> >
>> >
>> >
>> >
>> > > But regarding this test and the functionality it was intended to
>> prove,
>> > > what would you suggest?  The cleanest solution would be to revert the
>> > > lot until we understand it, but that also seems rather drastic.
>> > >
>> > > It seems reasonable to at least remove the module and test, even if
>> that
>> > > won't be a full revert, and then put it back when the full make test
>> is
>> > > shown not to flap.
>> > >
>> >
>> > You are totally right. I thought you want to revert the whole patch set,
>> > sorry!
>> > Disabling just this module is *very* reasonable approach indeed.
>>
>> I've found myself a bit busy, can you knock up the patch?
>>
> I will out of town, getting back tomorrow and i will post a patch.
> Hopefully I can play
> with reordering the tests to be executed in a 'cleaner' environment (I
> count on your
> help heavily there thought :)
>
> Cheers,
> Kamen
>
>
>>
>> Thanks,
>>
>> --
>> Andrew Bartlett
>> http://samba.org/~abartlet/
>> Authentication Developer, Samba Team  http://samba.org
>> Samba Developer, Catalyst IT
>> http://catalyst.net.nz/services/samba
>>
>>
>>
>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-dsdb-Disable-tombstone_reanimation-module-until-we-i.patch
Type: text/x-patch
Size: 3259 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20150531/80d548bc/attachment.bin>


More information about the samba-technical mailing list