autobuild[master] failure on sn-devel-144 for task ctdb during test

Martin Schwenke martin at meltin.net
Mon Aug 22 11:11:21 UTC 2016


Hi Metze,

On Mon, 22 Aug 2016 11:10:43 +0200, Stefan Metzmacher
<metze at samba.org> wrote:

> Hi Amitay and Martin,
> 
> I got the following failure on master (which is just a WHATSNEW.txt change)
> 
> Can you have a look?
> 
> TEST PASSED: tests/simple/77_ctdb_db_recovery.sh (duration: 38s)
> ==========================================================================
> --==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--
> Running test tests/simple/78_ctdb_large_db_recovery.sh (15:14:11)
> --==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--
> Cluster is HEALTHY
> create persistent test database large_persistent_db.tdb
> wipe test database large_persistent_db.tdb
> creating dummy record data
> 1+0 records in
> 1+0 records out
> 10240 bytes (10 kB) copied, 0.00140118 s, 7.3 MB/s
> Adding 345 records
> Failed to execute "ctdb pstore large_persistent_db.tdb record185
> /tmp/tmp.kai1PmB3jR" on node(s) "0"
> connect() failed, errno=111
> Failed to connect to CTDB daemon
> (/memdisk/metze/a/b538487/ctdb/ctdb/tests/var/sock.0)
> *** TEST COMPLETED (RC=1) AT 2016-08-21 15:14:15, CLEANING UP...
> Restarting CTDB (scheduled)...
> Attempting to politely shutdown daemons...
> connect() failed, errno=111
> Failed to connect to CTDB daemon
> (/memdisk/metze/a/b538487/ctdb/ctdb/tests/var/sock.0)
> connect() failed, errno=111
> Failed to connect to CTDB daemon
> (/memdisk/metze/a/b538487/ctdb/ctdb/tests/var/sock.1)
> connect() failed, errno=111
> Failed to connect to CTDB daemon
> (/memdisk/metze/a/b538487/ctdb/ctdb/tests/var/sock.2)
> Sleeping for a while...
> =1|.|
> Killing remaining daemons...
> Starting 3 ctdb daemons...
> Node 2 will have no public IPs.
> Waiting for cluster to become ready...
> <120|...........|11|
> OK
> Setting RerecoveryTimeout to 1
> Forcing a recovery...
> =2|..|
> Doing a sync...
> ctdb is ready
> ==========================================================================
> TEST FAILED: tests/simple/78_ctdb_large_db_recovery.sh (status 1)
> (duration: 29s)
> ==========================================================================

It looks like all of the ctdbd daemons died unexpectedly: errno=111 is
ECONNREFUSED on Linux, so nothing was listening on the sockets.  Without
the contents of /memdisk/metze/a/b538487/ctdb/ctdb/tests/var/daemon.*.log
it will be impossible to know why.  :-(
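
As a rough sketch of how those logs could be kept around after a
failure, something like the following would copy them out of the test
var directory before cleanup.  The function name and the idea of a
separate destination directory are assumptions for illustration, not
something the autobuild or the test scripts currently do; only the
daemon.*.log naming comes from the path above.

```shell
# Hypothetical helper: copy daemon.*.log files from a test var
# directory to a destination directory so they outlive cleanup.
save_daemon_logs ()
{
    _var_dir="$1"
    _dest_dir="$2"

    mkdir -p "$_dest_dir" || return 1

    _count=0
    for _log in "$_var_dir"/daemon.*.log ; do
        # An unmatched glob is left as the literal pattern; skip it
        [ -f "$_log" ] || continue
        cp -p "$_log" "$_dest_dir/" || return 1
        _count=$((_count + 1))
    done

    echo "saved $_count daemon log(s) to $_dest_dir"
}
```

It would be called as, for example, save_daemon_logs "$var_dir"
"$some_dir_outside_the_build_tree", where both variables are
placeholders.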

I see a lot of cases like this in
https://git.samba.org/metze/samba-autobuild/ctdb.stdout, but most of
them occur in restarts after a test result has already been decided.

We're not seeing this in our local overnight tests... I've done a quick
grep through recent results.

Were you running another autobuild (private, some other branch?) at
the same time?  If so, it could be due to
ctdb/tests/simple/scripts/local_daemons.bash:daemons_stop() killing
daemons from a parallel test run.  This isn't new and shouldn't
really come into play, since the daemons should respond to "ctdb
shutdown".  However, I should obviously fix it, now that I've noticed
it!  Will try to do that tomorrow... too tired now.
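
One possible shape for such a fix, as a minimal sketch only: stop just
the daemons whose PIDs were recorded under this run's own var
directory, instead of matching processes by name.  The pidfile naming
(ctdbd.N.pid) and the function name are assumptions for illustration;
this is not what daemons_stop() does today, and the real fix may well
look different.

```shell
# Hypothetical sketch: signal only daemons recorded in pidfiles under
# the given var directory, so daemons belonging to a parallel test run
# are left alone.  The ctdbd.*.pid naming is assumed.
stop_own_daemons ()
{
    _var_dir="$1"

    for _pidfile in "$_var_dir"/ctdbd.*.pid ; do
        # An unmatched glob is left as the literal pattern; skip it
        [ -f "$_pidfile" ] || continue

        _pid=$(cat "$_pidfile")
        # Ignore empty or non-numeric pidfile contents
        case "$_pid" in
        ''|*[!0-9]*) continue ;;
        esac

        # The daemon may already have exited; that is not an error
        kill "$_pid" 2>/dev/null || true
        rm -f "$_pidfile"
    done
}
```

The point of the design is that nothing outside the given directory
can be selected, so two autobuilds on the same host cannot stop each
other's daemons this way.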

Apart from that, tomorrow I will check the logs from the overnight tests
to see whether this has started appearing in our results too.  I can't
see how it could have been introduced by anything we've done in recent
days.  I can also try to reproduce it tomorrow.

Sorry about any flakiness...  we've been pretty good until
recently... :-(

peace & happiness,
martin