[Samba] CTDB potential locking issue

Martin Schwenke martin at meltin.net
Wed Sep 19 04:19:35 UTC 2018


Hi David,

On Tue, 18 Sep 2018 19:34:25 +0100, David C via samba
<samba at lists.samba.org> wrote:

> I have a newly implemented two node CTDB cluster running on CentOS 7, Samba
> 4.7.1
> 
> The node network is a direct 1Gb link
> 
> Storage is Cephfs
> 
> ctdb status is OK
> 
> It seems to be running well so far but I'm frequently seeing the following
> in my log.smbd:
> 
> [2018/09/18 19:16:15.897742,  0]
> > ../source3/lib/dbwrap/dbwrap_ctdb.c:1207(fetch_locked_internal)
> >   db_ctdb_fetch_locked for /var/lib/ctdb/locking.tdb.1 key
> > DE0726567AF1EAFD4A741403000100000000000000000000, chain 3642 needed 16
> > attempts, 315 milliseconds, chainlock: 78.340000 ms, CTDB 236.511000 ms
> > [2018/09/18 19:16:15.958368,  0]
> > ../source3/lib/dbwrap/dbwrap_ctdb.c:1207(fetch_locked_internal)
> >   db_ctdb_fetch_locked for /var/lib/ctdb/locking.tdb.1 key
> > DE0726567AF1EAFD4A741403000100000000000000000000, chain 3642 needed 15
> > attempts, 297 milliseconds, chainlock: 58.532000 ms, CTDB 239.124000 ms
> > [2018/09/18 19:16:18.139443,  0]
> > ../source3/lib/dbwrap/dbwrap_ctdb.c:1207(fetch_locked_internal)
> >   db_ctdb_fetch_locked for /var/lib/ctdb/locking.tdb.1 key
> > DE0726567AF1EAFD4A741403000100000000000000000000, chain 3642 needed 11
> > attempts, 128 milliseconds, chainlock: 27.141000 ms, CTDB 101.450000 ms

> Can someone advise what this means and if it's something to be concerned
> about?

As SMB clients perform operations on files, ctdbd's main role is to
migrate metadata about those files, such as locking/share-mode info,
between nodes of the cluster.

The above messages are telling you that ctdbd took longer than a
pre-defined threshold to migrate a record.  This probably means that
there is contention between nodes for the file or directory represented
by the given key.  If this is the case then I would expect to see
similar messages in the log on each node.  If the numbers get much
higher then I would expect to see a performance impact.

Is it always the same key?  A small group of keys?  That is likely
to mean contention.  If migrations for many different keys are taking
longer than the threshold then ctdbd might just be overloaded.
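One way to answer that question is to tally the messages per key.  The
following is only a rough sketch, not something shipped with Samba or
CTDB: it assumes the message layout shown in the log excerpt above, as it
appears in the raw log.smbd (without the mail quoting), and the names
SLOW_FETCH and summarize_slow_fetches are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed message layout, taken from the log excerpt quoted above:
#   db_ctdb_fetch_locked for <db> key <hex>, chain <n> needed <n> attempts,
#   <n> milliseconds, ...
# \s+ is used between fields so a message wrapped over lines still matches.
SLOW_FETCH = re.compile(
    r"db_ctdb_fetch_locked for (?P<db>\S+)\s+key\s+(?P<key>[0-9A-Fa-f]+),"
    r"\s+chain\s+\d+\s+needed\s+(?P<attempts>\d+)\s+attempts,"
    r"\s+(?P<ms>\d+)\s+milliseconds"
)


def summarize_slow_fetches(log_text):
    """Return {(db, key): [message_count, worst_total_ms]} for the log text."""
    stats = defaultdict(lambda: [0, 0])
    for match in SLOW_FETCH.finditer(log_text):
        entry = stats[(match.group("db"), match.group("key"))]
        entry[0] += 1                                       # messages seen for this key
        entry[1] = max(entry[1], int(match.group("ms")))    # slowest migration seen
    return dict(stats)
```

Reading log.smbd on each node and passing its contents to
summarize_slow_fetches() gives a per-key count.  One or two keys
dominating the result would point towards contention; many keys with
similar counts would point towards ctdbd being overloaded.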

You may be able to use the "net tdb locking" command to find out more
about the key in question.  You'll need to run the command while
clients are accessing the file represented by the key.  If it is
constantly and heavily accessed then that shouldn't be a problem.  ;-)

If the contention is for the root directory of a share, and you don't
actually need lock coherency there, then you could think about using the

  fileid:algorithm = fsname_norootdir

option.  However, I note you're using "fileid:algorithm = fsid".  If
that is needed for Cephfs then the fsname_norootdir option might not be
appropriate.

You could also consider using the fileid:nolockinode hack if it is
appropriate.

You should definitely read vfs_fileid(8) before using either of these
options.

Although clustering has obvious benefits, it doesn't come for
free.  Dealing with contention can be tricky...  :-)

peace & happiness,
martin


