[PATCH] CTDB test improvements

Sun Feb 4 19:05:40 UTC 2018

On Fri, 2018-02-02 at 15:05 +1100, Martin Schwenke via samba-technical
wrote:
> 
> Sorry...  :-(
> 
> If I run that by hand on my laptop it takes only 1m44.363s!
> 
> Under valgrind it times out because it would take about 40 minutes to run!  That's annoying... and is why, when I hand test under valgrind, I always interrupt this test and make it fail.
> 
> In the last autobuild I did on sn-devel it took this long:
> 
> ==========================================================================
> TEST PASSED: tests/cunit/protocol_test_002.sh (duration: 125s)
> ==========================================================================
> 
> We're doing 1000 iterations with random data to ensure confidence in
> our protocol marshalling code.
> 
> Options:
> 
> * We can increase the timeout.
> 
>   This timeout is meant to be a public service to avoid indefinitely
>   hanging autobuilds due to unexpected programming errors in tests.
>   It seems to have backfired.  :-(
> 
>   How long does it usually take to run in your cloud autobuilds?
> 
>   If you don't have any old logs showing this then "git grep
>   TEST_TIMEOUT" will show you where the default of 600s is set. Please
>   try adding a patch on top to increase it and see how long it takes.
>   ctdb/tests/run_tests.sh ctdb/tests/cunit/protocol_test_002.sh will
>   let you run that test on its own.
> 
>   We could increase this timeout to an hour if we need to.

==========================================================================
TEST PASSED: tests/cunit/protocol_test_002.sh (duration: 953s)
==========================================================================
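
For illustration only, here is a minimal sketch of how a per-test time
limit of this kind behaves, and why a 953s run cannot pass under the
600s default mentioned above. It assumes GNU coreutils `timeout` is
available. The helper name run_with_timeout is hypothetical, and this
is not how ctdb/tests/run_tests.sh implements TEST_TIMEOUT.

```shell
# Hypothetical helper, not taken from ctdb/tests/run_tests.sh.
# Runs a command under a time limit (in seconds) and reports the outcome.
# Assumes GNU coreutils `timeout`, which exits with status 124 when the
# limit is reached.
run_with_timeout() {
    limit="$1"
    shift
    start=$(date +%s)
    timeout "$limit" "$@"
    status=$?
    end=$(date +%s)
    duration=$((end - start))
    if [ "$status" -eq 124 ]; then
        echo "TEST TIMED OUT: $* (limit: ${limit}s)"
    elif [ "$status" -eq 0 ]; then
        echo "TEST PASSED: $* (duration: ${duration}s)"
    else
        echo "TEST FAILED: $* (status: ${status})"
    fi
    return "$status"
}
```

With a wrapper like this, something such as "run_with_timeout 3600
ctdb/tests/run_tests.sh ctdb/tests/cunit/protocol_test_002.sh" would
allow the one-hour limit suggested above, while a limit of 600 would
cut the run off before it finished on this VM.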

> * We could consider reducing the run-time of the test by doing fewer
>   iterations.  However, that obviously makes the test less useful.
> 
> * We can insist that Samba autobuild is run with a realistic amount of
>   CPU power!  ;-)
> 
>   I'm semi-serious here.  I wonder what you're doing that makes this
>   test run for more than 10 minutes.  The test uses a single
>   process/thread so it just needs a single CPU thread.
> 
>   I think I understand that you're running autobuilds in some sort of
>   constrained manner to make races more obvious, but there has to be a
>   lower bound on the resources needed to run autobuild.

The environment we are running autobuild.py on is currently a 2-CPU,
16 GB RAM VM on the Catalyst Cloud.  We are trying to reduce this back
to 8 GB and hope to eventually use 4 GB.

The primary constraint is cost.  Because Samba tests flap too often
and take too long overall, we regularly run them in parallel in order
to get results we can use.  (More CPUs cost more, naturally.)

> We clearly need a solution...  I'm happy with the first as long as you
> can give me a number, so we're not playing whack-a-mole by continually
> patching the timeout upwards.  Will an hour do the trick?

I'm more than happy to get you a cloud VM to play with.

Thanks,

Andrew Bartlett
-- 
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba