[Samba] CTDB potential locking issue

David C dcsysengineer at gmail.com
Wed Sep 19 18:44:20 UTC 2018


Hi Martin

I just found the file. It's a config file that about 250 hosts read every
half an hour, so it makes sense that it's getting some contention. However,
the strange thing is that the share the file is in is read-only:

[dist$]
        comment = Windows dist
        path = /path/to/share
        wide links = Yes
        browseable = Yes
        read only = Yes
        guest only = Yes
        guest ok = Yes
        public = Yes
        hide dot files = yes
        hide files = /$*/
        hide special files = yes

The share is accessed by the Windows machines to install software, read
configs, etc. I would have thought that the share being read-only would
preclude this type of locking behaviour?

Do I need to explicitly disable locking in the share definition?
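
If so, I'm guessing it would be something like the following added to the
share definition (just a sketch, I haven't tested whether these are the
right knobs for this):

[dist$]
        ...
        locking = No
        oplocks = No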

I suppose I could still use fileid:nolockinode for this file. Do I just add
fileid:nolockinode = <inode number> to the global section of my smb.conf?
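
i.e. something like this, if I'm reading it right (sketch only, placeholder
inode number, and assuming the fileid VFS module is what the option belongs
to, per vfs_fileid(8)):

[global]
        ...
        vfs objects = fileid
        fileid:algorithm = fsname
        # placeholder - the real value would be the inode number of the hot file
        fileid:nolockinode = 123456789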

Thanks,
David

On Wed, Sep 19, 2018 at 7:00 PM David C <dcsysengineer at gmail.com> wrote:

> Hi Martin
>
> Many thanks for the detailed response. A few follow-ups inline:
>
> On Wed, Sep 19, 2018 at 5:19 AM Martin Schwenke <martin at meltin.net> wrote:
>
>> Hi David,
>>
>> On Tue, 18 Sep 2018 19:34:25 +0100, David C via samba
>> <samba at lists.samba.org> wrote:
>>
>> > I have a newly implemented two node CTDB cluster running on CentOS 7,
>> > Samba 4.7.1
>> >
>> > The node network is a direct 1Gb link
>> >
>> > Storage is Cephfs
>> >
>> > ctdb status is OK
>> >
>> > It seems to be running well so far but I'm frequently seeing the
>> > following in my log.smbd:
>> >
>> > > [2018/09/18 19:16:15.897742,  0] ../source3/lib/dbwrap/dbwrap_ctdb.c:1207(fetch_locked_internal)
>> > >   db_ctdb_fetch_locked for /var/lib/ctdb/locking.tdb.1 key DE0726567AF1EAFD4A741403000100000000000000000000, chain 3642 needed 16 attempts, 315 milliseconds, chainlock: 78.340000 ms, CTDB 236.511000 ms
>> > > [2018/09/18 19:16:15.958368,  0] ../source3/lib/dbwrap/dbwrap_ctdb.c:1207(fetch_locked_internal)
>> > >   db_ctdb_fetch_locked for /var/lib/ctdb/locking.tdb.1 key DE0726567AF1EAFD4A741403000100000000000000000000, chain 3642 needed 15 attempts, 297 milliseconds, chainlock: 58.532000 ms, CTDB 239.124000 ms
>> > > [2018/09/18 19:16:18.139443,  0] ../source3/lib/dbwrap/dbwrap_ctdb.c:1207(fetch_locked_internal)
>> > >   db_ctdb_fetch_locked for /var/lib/ctdb/locking.tdb.1 key DE0726567AF1EAFD4A741403000100000000000000000000, chain 3642 needed 11 attempts, 128 milliseconds, chainlock: 27.141000 ms, CTDB 101.450000 ms
>>
>> > Can someone advise what this means and if it's something to be concerned
>> > about?
>>
>> As SMB clients perform operations on files, ctdbd's main role is to
>> migrate metadata about those files, such as locking/share-mode info,
>> between nodes of the cluster.
>>
>> The above messages are telling you that ctdbd took more than a
>> pre-defined threshold to migrate a record.  This probably means that
>> there is contention between nodes for the file or directory represented
>> by the given key.  If this is the case then I would expect to see
>> similar messages in the log on each node.  If the numbers get much
>> higher then I would expect to see a performance impact.
>>
>> Is it always the same key?  A small group of keys?  That is likely
>> to mean contention.  If migrations for many different keys are taking
>> longer than the threshold then ctdbd might just be overloaded.
>>
>
> Confirmed always the same key which I suppose is good news?
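>
> For what it's worth, something like this against the log should show whether
> any other keys are cropping up (log path is just an example):
>
>     grep db_ctdb_fetch_locked /var/log/samba/log.smbd | \
>         grep -Eo 'key [0-9A-F]+' | sort | uniq -c | sort -rn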
>
>
>> You may be able to use the "net tdb locking" command to find out more
>> about the key in question.  You'll need to run the command while
>> clients are accessing the file represented by the key.  If it is being
>> accessed constantly and heavily then that shouldn't be a problem.  ;-)
>>
>
> Currently reporting: "Record with key
> DE0726567AF1EAFD4A741403000100000000000000000000 not found."
>
> So I guess clients aren't currently accessing it. The messages are fairly
> frequent, so I should be able to catch it; I may just run that command in a
> loop until it catches it.
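>
> Something along these lines, probably (just a sketch, assuming net tdb
> locking takes the hex key exactly as printed in the log):
>
>     # poll until the record exists, i.e. until a client has the file open
>     while true; do
>         net tdb locking DE0726567AF1EAFD4A741403000100000000000000000000
>         sleep 1
>     done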
>
> Is there any other way of translating the key to the inode?
>
>>
>> If the contention is for the root directory of a share, and you don't
>> actually need lock coherency there, then you could think about using the
>>
>>   fileid:algorithm = fsname_norootdir
>>
>> option.  However, I note you're using "fileid:algorithm = fsid".  If
>> that is needed for Cephfs then the fsname_norootdir option might not be
>> appropriate.
>>
>
> This was a leftover from a short-lived experiment with OCFS2 where I think
> it was required. I think CephFS should be fine with fsname.
>
>>
>> You could also consider using the fileid:nolockinode hack if it is
>> appropriate.
>>
>> You should definitely read vfs_fileid(8) before using either of these
>> options.
>>
>
> I'll have a read. Thanks again for your assistance.
>
>>
>> Although clustering has obvious benefits, it doesn't come for
>> free.  Dealing with contention can be tricky...  :-)
>>
>> peace & happiness,
>> martin
>>
>
