autobuild[master] failure on sn-devel-144 for task ctdb during test
Martin Schwenke
martin at meltin.net
Mon Aug 22 11:11:21 UTC 2016
Hi Metze,
On Mon, 22 Aug 2016 11:10:43 +0200, Stefan Metzmacher
<metze at samba.org> wrote:
> Hi Amitay and Martin,
>
> I got the following failure on master (which is just a WHATSNEW.txt change)
>
> Can you have a look?
>
> TEST PASSED: tests/simple/77_ctdb_db_recovery.sh (duration: 38s)
> ==========================================================================
> --==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--
> Running test tests/simple/78_ctdb_large_db_recovery.sh (15:14:11)
> --==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--
> Cluster is HEALTHY
> create persistent test database large_persistent_db.tdb
> wipe test database large_persistent_db.tdb
> creating dummy record data
> 1+0 records in
> 1+0 records out
> 10240 bytes (10 kB) copied, 0.00140118 s, 7.3 MB/s
> Adding 345 records
> Failed to execute "ctdb pstore large_persistent_db.tdb record185
> /tmp/tmp.kai1PmB3jR" on node(s) "0"
> connect() failed, errno=111
> Failed to connect to CTDB daemon
> (/memdisk/metze/a/b538487/ctdb/ctdb/tests/var/sock.0)
> *** TEST COMPLETED (RC=1) AT 2016-08-21 15:14:15, CLEANING UP...
> Restarting CTDB (scheduled)...
> Attempting to politely shutdown daemons...
> connect() failed, errno=111
> Failed to connect to CTDB daemon
> (/memdisk/metze/a/b538487/ctdb/ctdb/tests/var/sock.0)
> connect() failed, errno=111
> Failed to connect to CTDB daemon
> (/memdisk/metze/a/b538487/ctdb/ctdb/tests/var/sock.1)
> connect() failed, errno=111
> Failed to connect to CTDB daemon
> (/memdisk/metze/a/b538487/ctdb/ctdb/tests/var/sock.2)
> Sleeping for a while...
> =1|.|
> Killing remaining daemons...
> Starting 3 ctdb daemons...
> Node 2 will have no public IPs.
> Waiting for cluster to become ready...
> <120|...........|11|
> OK
> Setting RerecoveryTimeout to 1
> Forcing a recovery...
> =2|..|
> Doing a sync...
> ctdb is ready
> ==========================================================================
> TEST FAILED: tests/simple/78_ctdb_large_db_recovery.sh (status 1)
> (duration: 29s)
> ==========================================================================
It looks like all of the ctdbd processes died unexpectedly. Without the
contents of /memdisk/metze/a/b538487/ctdb/ctdb/tests/var/daemon.*.log it
will be impossible to know why. :-(
I see a lot of cases like this in
https://git.samba.org/metze/samba-autobuild/ctdb.stdout but most are in
restarts after a test result has been decided.
We're not seeing this in our local overnight tests... I've done a quick
grep through recent results.
Were you running another autobuild (private, some other branch?) at
the same time? If so, it could be due to
ctdb/tests/simple/scripts/local_daemons.bash:daemons_stop() killing
daemons from a parallel test run. This isn't new and shouldn't
really come into play, since the daemons should respond to "ctdb
shutdown". However, I should obviously fix it, now that I've noticed
it! Will try to do that tomorrow... too tired now.
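One way to keep a cleanup step from touching a parallel run is to select only the daemons whose command line refers to this run's own test directory. The following is a minimal Python sketch of that selection logic, not the code in local_daemons.bash. It assumes each ctdbd is started with an argument that contains a path under the run's tests/var directory; the function name, the sample PIDs, the sample paths and the option shown in the sample command lines are all illustrative.

```python
from typing import Iterable, List, Tuple


def pids_for_this_run(processes: Iterable[Tuple[int, str]], var_dir: str) -> List[int]:
    """Return PIDs of ctdbd processes whose arguments mention var_dir.

    `processes` is a list of (pid, command line) pairs, however obtained.
    Assumption: a daemon belonging to this run has at least one argument
    containing a path below var_dir.
    """
    prefix = var_dir.rstrip("/") + "/"
    selected = []
    for pid, cmdline in processes:
        args = cmdline.split()
        if not args:
            continue
        # Match on the executable's base name so other programs are skipped.
        is_ctdbd = args[0].rsplit("/", 1)[-1] == "ctdbd"
        if is_ctdbd and any(prefix in arg for arg in args[1:]):
            selected.append(pid)
    return selected


# Hypothetical process table: two runs in different directories plus noise.
sample = [
    (101, "/usr/sbin/ctdbd --socket=/memdisk/a/tests/var/sock.0"),
    (102, "/usr/sbin/ctdbd --socket=/memdisk/b/tests/var/sock.0"),
    (103, "/bin/sleep 60"),
]
print(pids_for_this_run(sample, "/memdisk/a/tests/var"))  # [101]
```

Only the PIDs returned here would then be signalled, so a daemon from another checkout running on the same host is left alone.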
Apart from that, tomorrow I will check the logs from overnight tests to
see whether it has started appearing in our results too. I can't see how
this could have been introduced by anything we've done in recent days. I
can also try to reproduce it tomorrow.
Sorry about any flakiness... we've been pretty good until
recently... :-(
peace & happiness,
martin