[Samba] CTDB problems

Martin Schwenke martin at meltin.net
Thu Apr 20 02:19:45 UTC 2017


On Wed, 19 Apr 2017 12:55:45 +0100, Alex Crow via samba
<samba at lists.samba.org> wrote:

> This morning our CTDB-managed cluster took a nosedive. We had member
> machines with hung smbd tasks, which caused them to reboot, and the
> cluster did not come back up consistently. We eventually got it more or
> less stable with two nodes out of the 3, but we're still seeing worrying
> messages, e.g. we've just noticed:
> 
> [...]

> 2017/04/19 12:37:19.547636 [vacuum-locking.tdb: 3790]: tdb(/var/lib/ctdb/locking.tdb.2): tdb_oob len 541213780 beyond eof at 55386112
> 2017/04/19 12:37:19.547694 [vacuum-locking.tdb: 3790]: tdb(/var/lib/ctdb/locking.tdb.2): tdb_free: left offset read failed at 541213776
> 2017/04/19 12:37:19.547709 [vacuum-locking.tdb: 3790]: tdb(/var/lib/ctdb/locking.tdb.2): tdb_oob len 541213784 beyond eof at 55386112

No solid guesses on this.  Those messages come from deep in TDB.
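
For a sense of scale, the numbers in those messages can be compared
directly. The following is only a rough sketch, not part of any TDB
tooling. It assumes the first number in a tdb_oob message is the end of
the access TDB attempted and the second is the current size of the
file; parse_tdb_oob() is a hypothetical helper name.

```python
import re

# Matches the tdb_oob message text as it appears in the log excerpt above.
OOB_RE = re.compile(r"tdb_oob len (\d+) beyond eof at (\d+)")


def parse_tdb_oob(line):
    """Return (requested_end, file_size) from a tdb_oob log line, or None."""
    m = OOB_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


# First tdb_oob line from the quoted log.
line = ("2017/04/19 12:37:19.547636 [vacuum-locking.tdb: 3790]: "
        "tdb(/var/lib/ctdb/locking.tdb.2): "
        "tdb_oob len 541213780 beyond eof at 55386112")

requested_end, file_size = parse_tdb_oob(line)
overshoot = requested_end - file_size
print(f"access ends {overshoot} bytes past a {file_size}-byte file")
```

Under that assumption the access lands roughly 486 MB past the end of a
file that is about 55 MB long.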

Could the filesystem be full?
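
A quick way to check, sketched in Python: this reports free space on
the filesystem that holds the database directory. The path
/var/lib/ctdb is taken from the log lines above; the 5% warning
threshold and the free_fraction() helper are arbitrary choices made
for illustration.

```python
import os
import shutil


def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


# Directory from the quoted log; fall back to the current directory so
# the sketch still runs on a machine without CTDB installed.
db_dir = "/var/lib/ctdb" if os.path.isdir("/var/lib/ctdb") else "."

fraction = free_fraction(db_dir)
print(f"{db_dir}: {fraction:.1%} free")
if fraction < 0.05:  # illustrative threshold, not a CTDB setting
    print("warning: filesystem is nearly full")
```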

> [...]

> Here are some logs from earlier, where we think we had a stuck smbd task:
> 
> 28657 /usr/sbin/smbd locking.tdb.2 9848 9848 W
> 28687 /usr/sbin/smbd locking.tdb.2 186860 186860 W
> 18214 /usr/libexec/ctdb/ctdb_lock_helper locking.tdb.2 216548 216550 W
> 30945 /usr/sbin/smbd brlock.tdb.2.20170419.102626.697770650.corrupt 

> [...]

> ----- Stack trace for PID=30945 -----
> ----- Process in D state, printing kernel stack only
> [<ffffffffa05b253d>] __fuse_request_send+0x13d/0x2c0 [fuse]
> [<ffffffffa05b26d2>] fuse_request_send+0x12/0x20 [fuse]
> [<ffffffffa05bb66c>] fuse_setlk+0x16c/0x1a0 [fuse]
> [<ffffffffa05bc40f>] fuse_file_lock+0x5f/0x210 [fuse]
> [<ffffffff81253a73>] vfs_lock_file+0x23/0x40
> [<ffffffff81255069>] fcntl_setlk+0x159/0x310
> [<ffffffff81210fe1>] SyS_fcntl+0x3e1/0x610
> [<ffffffff816968c9>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff

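So this tells you that smbd was wedged in the cluster filesystem: PID
30945 is in D state, blocked inside an fcntl() lock request that FUSE
had passed on to the filesystem.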
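
To spot other processes stuck the same way, the state field of
/proc/<pid>/stat can be checked for "D" (uninterruptible sleep). A
minimal Linux-only sketch follows; the helper names are hypothetical
and the sample line is a made-up, truncated stat line, not real output
from the affected node.

```python
def proc_state(stat_line):
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the last ')' instead of on whitespace.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def is_uninterruptible(stat_line):
    """True if the process is in D state (uninterruptible sleep)."""
    return proc_state(stat_line) == "D"


def read_stat(pid):
    """Read /proc/<pid>/stat for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return f.read()


# Hypothetical, truncated stat line for illustration only.
sample = "30945 (smbd) D 1 30945 30945 0 -1 4194560"
print(is_uninterruptible(sample))
```

On a live node, is_uninterruptible(read_stat(pid)) would apply the same
check to a real process.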

> [...]

> It does look like we have some database corruption.
>
> What may have caused this, and is there any way to resolve it?

The good news is that you're only seeing it in vacuuming and you're
not actually seeing TDB errors in smbd.

Still, it isn't something we've seen.  If we figure out anything then
we'll definitely let you know...

peace & happiness,
martin


