[PATCH] CTDB test improvements
Martin Schwenke
martin at meltin.net
Wed Feb 7 08:34:15 UTC 2018
On Mon, 5 Feb 2018 15:55:10 +1100, Martin Schwenke via samba-technical
<samba-technical at lists.samba.org> wrote:
> On Mon, 05 Feb 2018 08:05:40 +1300, Andrew Bartlett
> <abartlet at samba.org> wrote:
>
> > On Fri, 2018-02-02 at 15:05 +1100, Martin Schwenke via samba-technical
> > wrote:
> > >
> > > Sorry... :-(
> > >
> > > If I run that by hand on my laptop it takes only 1m44.363s!
> > >
> > > Under valgrind it times out because it would take about 40 minutes
> > > to run! That's annoying... and is why, when I hand test under
> > > valgrind, I always interrupt this test and make it fail.
> > >
> > > In the last autobuild I did on sn-devel it took this long:
> > >
> > > ==========================================================================
> > > TEST PASSED: tests/cunit/protocol_test_002.sh (duration: 125s)
> > > ==========================================================================
> > >
> > > We're doing 1000 iterations with random data to ensure confidence in
> > > our protocol marshalling code.
> > >
> > > Options:
> > >
> > > * We can increase the timeout.
> > >
> > > This timeout is meant to be a public service to avoid indefinitely
> > > hanging autobuilds due to unexpected programming errors in
> > > tests. It seems to have backfired. :-(
> > >
> > > How long does it usually take to run in your cloud autobuilds?
> > >
> > > If you don't have any old logs showing this then "git grep
> > > TEST_TIMEOUT" will show you where the default of 600s is set.
> > > Please try adding a patch on top to increase it and see how long it
> > > takes. ctdb/tests/run_tests.sh ctdb/tests/cunit/protocol_test_002.sh
> > > will let you run that test on its own.
> > >
> > > We could increase this timeout to an hour if we need to.
> >
> > ==========================================================================
> > TEST PASSED: tests/cunit/protocol_test_002.sh (duration: 953s)
> > ==========================================================================
>
> So, this can currently take more than 15 minutes. If you reduce memory
> in future then there might be some slowdown (swapping?), so we need some
> wiggle room here...
>
> > > * We could consider reducing the run-time of the test by doing fewer
> > > iterations. However, that obviously makes the test less useful.
> > >
> > > * We can insist that Samba autobuild is run with a realistic amount of
> > > CPU power! ;-)
> > >
> > > I'm semi-serious here. I wonder what you're doing that makes this
> > > test run for more than 10 minutes. The test uses a single
> > > process/thread so it just needs a single CPU thread.
> > >
> > > I think I understand that you're running autobuilds in some sort of
> > > constrained manner to make races more obvious, but there has to be a
> > > lower bound on the resources needed to run autobuild.
> >
> > The environment we are running autobuild.py on is currently a 2-CPU,
> > 16 GB RAM VM on the Catalyst Cloud. We are trying to reduce this back
> > to 8 GB and hope to eventually use 4 GB.
> >
> > The primary constraint is cost. Because Samba tests flap too often and
> > take too long overall, we regularly run them in parallel in order
> > to get results we can use. (More CPUs cost more, naturally.)
> >
> > > We clearly need a solution... I'm happy with the first as long as you
> > > can give me a number, so we're not playing whack-a-mole by continually
> > > patching the timeout upwards. Will an hour do the trick?
> >
> > I'm more than happy to get you a cloud VM to play with.
>
> No need. There's no mystery here - the slowest test is just running
> slower in your environment - so no analysis to do. We just need to
> increase the timeout. Patch attached! :-)
>
> Please review and maybe push...
Please note that I'm not being silly and grumpy here. This is the
actual fix!
Any takers? :-)
peace & happiness,
martin