Very frequent : Vacuuming child process timed out for db locking.tdb. Bottleneck?
Nicolas Ecarnot
nicolas at ecarnot.net
Mon May 26 13:30:56 MDT 2014
On 26/05/2014 21:24, Nicolas Ecarnot wrote:
> On 26/05/2014 10:03, Amitay Isaacs wrote:
>> Hi Nicolas,
>>
>> What version of CTDB and Samba are you using?
>
> samba-3.6.9-168.el6_5.x86_64
>
> ctdb-1.0.114.5-3.el6.x86_64
>
> on
>
> Oracle Linux Server release 6.5
>
> with
>
> 3.8.13-26.2.4.el6uek.x86_64
>
>>
>>
>> On Mon, May 19, 2014 at 7:51 PM, Nicolas Ecarnot <nicolas at ecarnot.net
>> <mailto:nicolas at ecarnot.net>> wrote:
>>
>> Hi,
>>
>> In our two-node ctdb setup, with an iscsi qdisk lun, the ctdb log
>> files are showing frequent messages such as:
>>
>> Vacuuming child process timed out for db locking.tdb
>>
>> Frequency is around every 3 minutes.
Could this be because I did not set up a dedicated private LAN for ctdb
data exchange, so that it shares the user-data LAN?
I don't think so, because I took the time to look at the sizes of the
files and the number of records, and both are ridiculously small.
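
For anyone who wants to repeat the file-size check, here is a minimal
sketch. It only lists the sizes of the local database files and does not
count records. The directory /var/ctdb and the helper name tdb_sizes are
assumptions for illustration; adjust the path to wherever your ctdb
keeps its databases.

```python
import os


def tdb_sizes(db_dir):
    """Return {filename: size_in_bytes} for files in db_dir whose name contains '.tdb'."""
    sizes = {}
    for name in sorted(os.listdir(db_dir)):
        path = os.path.join(db_dir, name)
        # Match names such as "locking.tdb" or "locking.tdb.0".
        if ".tdb" in name and os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes


if __name__ == "__main__":
    db_dir = "/var/ctdb"  # assumed location, not confirmed in this thread
    if os.path.isdir(db_dir):
        for name, size in tdb_sizes(db_dir).items():
            print(f"{size:>12d}  {name}")
```

Running it periodically and comparing the output would show whether
locking.tdb keeps growing between runs.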
>>
>>
>> This usually means that there is meta-data intensive activity happening
>> in Samba. For example if lots of files are opened and closed from
>> Samba, there will be lots of locking records created and deleted. These
>> records are removed cluster-wide via vacuuming. If vacuuming times out,
>> it means that the vacuuming process did not finish processing empty
>> records and it will process them in the next vacuuming cycle.
>>
>> I read it may be due to too many locks to "balance/sync" between
>> the nodes (did I read right?) and taking too much time.
>> This seems odd to me because we have around 300 users, doing basic
>> office work, with no particularly intensive activity. This seems
>> typical to me.
>>
>>
>> This issue may not be related to contention at all, but may be caused by
>> meta-data intensive workload.
>>
>>
>> Our iscsi network is dedicated, and not much loaded.
>>
>> My two questions are:
>> - Could those error messages mean this ctdb setup is LOSING some
>> locks, so that two users could open the same file read+write (and
>> then corrupt it)?
>>
>>
>> No. Problems in vacuuming will not cause Samba to corrupt files.
>>
>> Vacuuming is required to remove the deleted records from the cluster. It
>> does not affect the proper working of Samba. Only once Samba has
>> released the locks do the locking records become empty, and CTDB then
>> has to vacuum them. If vacuuming fails, it usually does not matter.
>> Vacuuming is triggered every 10 seconds for every database. So if one
>> run fails, subsequent runs should continue working. If vacuuming
>> consistently fails every time, then it will cause the database sizes to
>> grow very large and that can become a concern.
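
A toy model of the behaviour described above, to make the "one failed
run is harmless, constant failure is not" point concrete. This is not
CTDB's algorithm: the function name simulate_backlog and the idea of a
fixed per-cycle capacity are assumptions made purely for illustration.

```python
def simulate_backlog(created_per_cycle, capacity_per_cycle, cycles):
    """Toy model of empty records left over after each vacuuming cycle.

    Each cycle, created_per_cycle new empty records appear and at most
    capacity_per_cycle of the pending ones are removed before the run
    stops; the remainder waits for the next cycle.
    """
    backlog = 0
    history = []
    for _ in range(cycles):
        backlog += created_per_cycle
        backlog -= min(backlog, capacity_per_cycle)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Vacuuming keeps up: nothing is left over.
    print(simulate_backlog(100, 150, 3))
    # Vacuuming always falls short: leftovers grow every cycle.
    print(simulate_backlog(100, 60, 3))
```

In this model the backlog stays at zero as long as each run can clear
what was created since the last one, and grows without bound only when
every run falls short.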
>>
>> - What do you advise me to look at, or what should I benchmark?
>>
>>
>> In the latest version of CTDB, there have been significant changes to
>> improve vacuuming performance. So if possible, I would recommend using
>> the latest CTDB.
>>
>> Amitay.
>
>
--
Nicolas Ecarnot