[Samba] CTDB potential locking issue
David C
dcsysengineer at gmail.com
Wed Sep 19 18:00:03 UTC 2018
Hi Martin
Many thanks for the detailed response. A few follow-ups inline:
On Wed, Sep 19, 2018 at 5:19 AM Martin Schwenke <martin at meltin.net> wrote:
> Hi David,
>
> On Tue, 18 Sep 2018 19:34:25 +0100, David C via samba
> <samba at lists.samba.org> wrote:
>
> > I have a newly implemented two node CTDB cluster running on CentOS 7,
> > Samba 4.7.1
> >
> > The node network is a direct 1Gb link
> >
> > Storage is Cephfs
> >
> > ctdb status is OK
> >
> > It seems to be running well so far but I'm frequently seeing the
> > following in my log.smbd:
> >
> > > [2018/09/18 19:16:15.897742, 0]
> > > ../source3/lib/dbwrap/dbwrap_ctdb.c:1207(fetch_locked_internal)
> > > db_ctdb_fetch_locked for /var/lib/ctdb/locking.tdb.1 key
> > > DE0726567AF1EAFD4A741403000100000000000000000000, chain 3642 needed 16
> > > attempts, 315 milliseconds, chainlock: 78.340000 ms, CTDB 236.511000 ms
> > > [2018/09/18 19:16:15.958368, 0]
> > > ../source3/lib/dbwrap/dbwrap_ctdb.c:1207(fetch_locked_internal)
> > > db_ctdb_fetch_locked for /var/lib/ctdb/locking.tdb.1 key
> > > DE0726567AF1EAFD4A741403000100000000000000000000, chain 3642 needed 15
> > > attempts, 297 milliseconds, chainlock: 58.532000 ms, CTDB 239.124000 ms
> > > [2018/09/18 19:16:18.139443, 0]
> > > ../source3/lib/dbwrap/dbwrap_ctdb.c:1207(fetch_locked_internal)
> > > db_ctdb_fetch_locked for /var/lib/ctdb/locking.tdb.1 key
> > > DE0726567AF1EAFD4A741403000100000000000000000000, chain 3642 needed 11
> > > attempts, 128 milliseconds, chainlock: 27.141000 ms, CTDB 101.450000 ms
>
> > Can someone advise what this means and if it's something to be concerned
> > about?
>
> As SMB clients perform operations on files, ctdbd's main role is to
> migrate metadata about those files, such as locking/share-mode info,
> between nodes of the cluster.
>
> The above messages are telling you that ctdbd took longer than a
> pre-defined threshold to migrate a record. This probably means that
> there is contention between nodes for the file or directory represented
> by the given key. If this is the case then I would expect to see
> similar messages in the log on each node. If the numbers get much
> higher then I would expect to see a performance impact.
>
> Is it always the same key? A small group of keys? That is likely
> to mean contention. If migrations for many different keys are taking
> longer than the threshold then ctdbd might just be overloaded.
>
Confirmed: it's always the same key, which I suppose is good news?
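For reference, I checked by counting the keys in the warnings with
something along these lines (adjust the log path to suit):

    # Count how often each key appears in the fetch_locked warnings
    grep 'db_ctdb_fetch_locked' /var/log/samba/log.smbd \
        | sed -n 's/.* key \([0-9A-Fa-f]*\),.*/\1/p' \
        | sort | uniq -c | sort -rn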
> You may be able to use the "net tdb locking" command to find out more
> about the key in question. You'll need to run the command while
> clients are accessing the file represented by the key. If it is being
> accessed constantly and heavily then that shouldn't be a problem. ;-)
>
Currently reporting: "Record with key
DE0726567AF1EAFD4A741403000100000000000000000000 not found."
So I guess clients aren't currently accessing it. The messages are fairly
frequent, though, so I should be able to catch it; I may just run that
command in a loop until it does.
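Something along these lines, perhaps (untested; I'm assuming the
"not found" message is the only thing that needs filtering out):

    KEY=DE0726567AF1EAFD4A741403000100000000000000000000
    # Keep polling; capture each attempt so a short-lived record isn't lost
    while out=$(net tdb locking "$KEY" 2>&1); echo "$out" | grep -q 'not found'; do
        sleep 1
    done
    echo "$out"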
Is there any other way of translating the key to the inode?
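My naive guess, in case there's no proper tool for it: the key looks
like a packed file_id, i.e. three 64-bit values (devid, inode, extid)
in host byte order, so on x86_64 something like this might pull the
numbers out (pure guesswork on my part, untested):

    # Guesswork: decode the 24-byte key as three little-endian uint64s
    # (devid, inode, extid); xxd ships in vim-common on CentOS
    echo DE0726567AF1EAFD4A741403000100000000000000000000 \
        | xxd -r -p | od -An -tu8

and then map the inode back to a path with something like
"find /path/to/cephfs -inum <inode>".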
>
> If the contention is for the root directory of a share, and you don't
> actually need lock coherency there, then you could think about using the
>
> fileid:algorithm = fsname_norootdir
>
> option. However, I note you're using "fileid:algorithm = fsid". If
> that is needed for Cephfs then the fsname_norootdir option might not be
> appropriate.
>
This was a leftover from a short-lived experiment with OCFS2 where I think
it was required. I think CephFS should be fine with fsname.
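i.e. after reading vfs_fileid(8), I expect the change to look roughly
like this (untested on our setup; fsname_norootdir instead, if we decide
we don't need lock coherency on the share roots):

    [global]
        vfs objects = fileid
        fileid:algorithm = fsname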
>
> You could also consider using the fileid:nolockinode hack if it is
> appropriate.
>
> You should definitely read vfs_fileid(8) before using either of these
> options.
>
I'll have a read. Thanks again for your assistance.
>
> Although clustering has obvious benefits, it doesn't come for
> free. Dealing with contention can be tricky... :-)
>
> peace & happiness,
> martin
>