Very frequent : Vacuuming child process timed out for db locking.tdb. Bottleneck?

Mon May 26 13:30:56 MDT 2014

On 26/05/2014 21:24, Nicolas Ecarnot wrote:
> On 26/05/2014 10:03, Amitay Isaacs wrote:
>> Hi Nicolas,
>>
>> What version of CTDB and Samba are you using?
>
> samba-3.6.9-168.el6_5.x86_64
>
> ctdb-1.0.114.5-3.el6.x86_64
>
> on
>
> Oracle Linux Server release 6.5
>
> with
>
> 3.8.13-26.2.4.el6uek.x86_64
>
>>
>>
>> On Mon, May 19, 2014 at 7:51 PM, Nicolas Ecarnot <nicolas at ecarnot.net
>> <mailto:nicolas at ecarnot.net>> wrote:
>>
>>     Hi,
>>
>>     In our two-node CTDB setup, with an iSCSI qdisk LUN, the CTDB log
>>     files show frequent messages such as:
>>
>>     Vacuuming child process timed out for db locking.tdb
>>
>>     It appears roughly every 3 minutes.

Could this be because I did not set up a dedicated private LAN for CTDB
data exchange, so that it shares the user-data LAN?

I don't think so: I checked the sizes of the database files and the
number of records, and both are very small.
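
To put a number on the "every 3 minutes" figure quoted above, the gaps between timeout messages can be measured from the log. The following is only a minimal sketch: the log line layout (timestamp, bracketed PID, message) is an assumption about how the CTDB log looks on this setup, and the sample lines and the function name timeout_intervals are made up for illustration.

```python
import re
from datetime import datetime

# Assumed (hypothetical) log layout: "YYYY/MM/DD HH:MM:SS.ffffff [pid]: message"
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\.\d+ \[\s*\d+\]: "
    r"Vacuuming child process timed out for db (\S+)"
)

def timeout_intervals(lines, db="locking.tdb"):
    """Return the seconds between consecutive vacuuming timeouts for one db."""
    stamps = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(2) == db:
            stamps.append(datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

# Made-up sample lines, not taken from a real log.
sample = [
    "2014/05/26 13:00:00.000001 [ 1234]: Vacuuming child process timed out for db locking.tdb",
    "2014/05/26 13:01:00.000001 [ 1234]: Some unrelated message",
    "2014/05/26 13:03:00.000001 [ 1234]: Vacuuming child process timed out for db locking.tdb",
    "2014/05/26 13:06:10.000001 [ 1234]: Vacuuming child process timed out for db locking.tdb",
]
intervals = timeout_intervals(sample)
print(intervals)
```

Since vacuuming is triggered far more often than every 3 minutes, intervals like these would suggest that most runs complete and only some time out, but that reading depends on the log format assumption holding.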

>>
>>
>> This usually means that there is meta-data intensive activity happening
>> in Samba.  For example if lots of files are opened and closed from
>> Samba, there will be lots of locking records created and deleted.  These
>> records are removed cluster-wide via vacuuming.  If vacuuming times out,
>> it means that the vacuuming process did not finish processing empty
>> records and it will process them in the next vacuuming cycle.
>>
>>     I read it may be due to too many locks to "balance/sync" between
>>     the nodes (did I read that right?), which takes too much time.
>>     This seems odd to me because we have around 300 users doing basic
>>     office work, nothing particularly intensive. This seems like a
>>     typical workload to me.
>>
>>
>> This issue may not be related to contention at all, but may be caused by
>> meta-data intensive workload.
>>
>>
>>     Our iSCSI network is dedicated and lightly loaded.
>>
>>     My two questions are:
>>     - Could these error messages mean this CTDB setup is LOSING some
>>     locks, so that two users could open the same file read+write (and
>>     then corrupt it)?
>>
>>
>> No. Problems in vacuuming will not cause Samba to corrupt files.
>>
>> Vacuuming is required to remove the deleted records from the cluster. It
>> does not affect the proper working of Samba.  Only once Samba has
>> released the locks do the locking records become empty, and then CTDB
>> has to vacuum them.  If vacuuming fails, it usually does not matter.
>> Vacuuming is triggered every 10 seconds for every database.  So if one
>> run fails, subsequent runs should continue working.  If vacuuming
>> consistently fails every time, then it will cause the database sizes to
>> grow very large and that can become a concern.
>>
>>     - What do you advise me to look at, or to benchmark?
>>
>>
>> In the latest version of CTDB, there have been significant changes to
>> improve vacuuming performance.  So if possible, I would recommend using
>> the latest CTDB.
>>
>> Amitay.
>
>
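
Since the concern raised in the reply is database files growing when vacuuming keeps failing, one way to keep an eye on it is to compare file sizes between two points in time. This is a rough sketch only: the "*.tdb*" file name pattern, the example file name locking.tdb.0, the growth factor, and the helper names tdb_sizes and grown are assumptions for illustration. The demo runs against a temporary directory, not a real CTDB database directory.

```python
import tempfile
from pathlib import Path

def tdb_sizes(directory):
    """Map each file matching *.tdb* in `directory` to its size in bytes."""
    return {
        p.name: p.stat().st_size
        for p in sorted(Path(directory).glob("*.tdb*"))
        if p.is_file()
    }

def grown(before, after, factor=2.0):
    """Names whose size grew by at least `factor` between two snapshots."""
    return [
        name
        for name, size in after.items()
        if name in before and before[name] > 0 and size >= factor * before[name]
    ]

# Demo on a temporary directory with a made-up file name.
with tempfile.TemporaryDirectory() as d:
    f = Path(d) / "locking.tdb.0"
    f.write_bytes(b"\0" * 1000)
    first = tdb_sizes(d)
    f.write_bytes(b"\0" * 5000)
    second = tdb_sizes(d)

growing = grown(first, second)
print(growing)
```

Taking snapshots hours or days apart on the real database directory would show whether the files stay small, as observed so far, or grow steadily.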

-- 
Nicolas Ecarnot