[PATCH] CTDB test improvements

Martin Schwenke martin at meltin.net
Mon Feb 5 04:55:10 UTC 2018


On Mon, 05 Feb 2018 08:05:40 +1300, Andrew Bartlett
<abartlet at samba.org> wrote:

> On Fri, 2018-02-02 at 15:05 +1100, Martin Schwenke via samba-technical
> wrote:
> > 
> > Sorry...  :-(
> > 
> > If I run that by hand on my laptop it takes only 1m44.363s!
> > 
> > Under valgrind it times out because it would take about 40 minutes
> > to run!  That's annoying... and is why, when I hand test under
> > valgrind, I always interrupt this test and make it fail.
> > 
> > In the last autobuild I did on sn-devel it took this long:
> > 
> > ==========================================================================
> > TEST PASSED: tests/cunit/protocol_test_002.sh (duration: 125s)
> > ==========================================================================
> > 
> > We're doing 1000 iterations with random data to ensure confidence in
> > our protocol marshalling code.
> > 
> > Options:
> > 
> > * We can increase the timeout.
> > 
> >   This timeout is meant to be a public service to avoid indefinitely
> >   hanging autobuilds due to unexpected programming errors in
> >   tests. It seems to have backfired.  :-(
> > 
> >   How long does it usually take to run in your cloud autobuilds?
> > 
> >   If you don't have any old logs showing this then "git grep
> >   TEST_TIMEOUT" will show you where the default of 600s is set.
> >   Please try adding a patch on top to increase it and see how long it
> >   takes. ctdb/tests/run_tests.sh ctdb/tests/cunit/protocol_test_002.sh
> >   will let you run that test on its own.
> > 
> >   We could increase this timeout to an hour if we need to.  
> 
> ==========================================================================
> TEST PASSED: tests/cunit/protocol_test_002.sh (duration: 953s)
> ==========================================================================

So, this can currently take more than 15 minutes (953s), well over the
600s default timeout.  If you reduce memory in future then there might
be some slowdown (swapping?), so we need some wiggle room here...
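
To put rough numbers on the wiggle room, here is a minimal sketch that
compares the durations quoted in this thread (125s on sn-devel, 953s on
the cloud VM) against the 600s default and a proposed 3600s timeout.
The helper name "headroom" is made up for illustration and is not part
of the CTDB test scripts.

```shell
#!/bin/sh
# Hypothetical helper (not from the CTDB tree): report how much slack a
# timeout leaves over an observed test duration, both in seconds.
headroom ()
{
	_timeout="$1"
	_observed="$2"
	if [ "$_observed" -ge "$_timeout" ] ; then
		echo "TIMEOUT: ${_observed}s >= ${_timeout}s"
		return 1
	fi
	echo "OK: $((_timeout - _observed))s spare"
}

headroom 600 125          # sn-devel run against the 600s default
headroom 600 953 || true  # cloud VM run against the 600s default
headroom 3600 953         # cloud VM run against a one hour timeout
```

With a one hour timeout the 953s run could slow down by more than a
factor of three before it would time out again.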

> > * We could consider reducing the run-time of the test by doing fewer
> >   iterations.  However, that obviously makes the test less useful.
> > 
> > * We can insist that Samba autobuild is run with a realistic amount of
> >   CPU power!  ;-)
> > 
> >   I'm semi-serious here.  I wonder what you're doing that makes this
> >   test run for more than 10 minutes.  The test uses a single
> >   process/thread so it just needs a single CPU thread.
> > 
> >   I think I understand that you're running autobuilds in some sort of
> >   constrained manner to make races more obvious, but there has to be a
> >   lower bound on the resources needed to run autobuild.  
> 
> The environment we are running autobuild.py on is currently a 2-CPU,
> 16 GB RAM VM on the Catalyst Cloud.  We are trying to reduce this back
> to 8GB and hope to eventually use 4GB.
> 
> The primary constraint is cost.  As Samba tests flap just too often
> and overall take just too long, we regularly run them in parallel in
> order to get results we can use.  (More CPUs cost more, naturally.)
> 
> > We clearly need a solution...  I'm happy with the first as long as you
> > can give me a number, so we're not playing whack-a-mole by continually
> > patching the timeout upwards.  Will an hour do the trick?  
> 
> I'm more than happy to get you a cloud VM to play with.

No need.  There's no mystery here - the slowest test is just running
slower in your environment - so no analysis to do.  We just need to
increase the timeout.  Patch attached!  :-)
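
For anyone unfamiliar with how this kind of per-test timeout behaves,
the following is a rough sketch using the coreutils "timeout" command,
which exits with status 124 when the time limit is hit.  It is not the
code in the CTDB test scripts or in the attached patch: the function
name "run_with_timeout" and the "TIMED OUT"/"FAILED" message formats
are assumptions for illustration, and only the TEST_TIMEOUT name and
the "TEST PASSED" line format come from this thread.

```shell
#!/bin/sh
# Sketch only: not the CTDB implementation.
# One hour default, overridable from the environment.
TEST_TIMEOUT="${TEST_TIMEOUT:-3600}"

# Hypothetical wrapper: run a command under the timeout and report
# the outcome along with the elapsed time in seconds.
run_with_timeout ()
{
	_start=$(date +%s)
	_status=0
	timeout "$TEST_TIMEOUT" "$@" || _status=$?
	_duration=$(( $(date +%s) - _start ))

	case "$_status" in
	0)   echo "TEST PASSED: $* (duration: ${_duration}s)" ;;
	124) echo "TEST TIMED OUT: $* (after ${TEST_TIMEOUT}s)" ;;
	*)   echo "TEST FAILED: $* (status ${_status}, duration: ${_duration}s)" ;;
	esac

	return "$_status"
}

# Example (commented out, the path depends on your checkout):
# run_with_timeout ./tests/cunit/protocol_test_002.sh
```

The point of the wrapper is only to stop a hung test from blocking an
autobuild forever, so a generous limit costs nothing when tests behave.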

Please review and maybe push...

peace & happiness,
martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ctdb-tests-Set-test-timeout-to-an-hour.patch
Type: text/x-patch
Size: 1051 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20180205/cd288d35/0001-ctdb-tests-Set-test-timeout-to-an-hour.bin>

