CTDB issues with locking.tdb with some workloads

Wed Jul 28 19:05:56 UTC 2021

On Wed, Jul 28, 2021 at 11:11:36AM -0700, Richard Sharpe via samba-technical wrote:
> Hi folks,
> 
> One of the issues I have seen with CTDB relates to certain media workloads.
> 
> For this workload, the customer has several hundred Windows clients
> that are processing data and writing to separate files. Each client
> typically writes to a different file and rarely writes to shared
> files. They sometimes do read the same files, however.
> 
> Unfortunately, this creates problems with CTDB where we see lots of
> logging messages saying that CTDB is having problems getting chain
> locks, often taking between 5 and 10 seconds to get a chain lock.
> 
> As a result the workload slows down intolerably and some sites switch
> off clustering altogether and run the risks of corruption.
> 
> One site I am aware of switched off clustering and joined each of
> their Samba servers as separate member servers. They also had their
> Windows clients connect using something like DNS round robin.
> 
> In this case they were not protected against multiple clients writing
> the same file but since they felt that was a rare occurrence they felt
> the risk was acceptable.
> 
> The problem seems to be that there is a single locking.tdb file that
> handles all files that need locking coordination. If there is a lot of
> write activity there will be a lot of activity on locking.tdb and it
> will move from Samba node to Samba node.
> 
> Perhaps one way to alleviate this issue would be to separate the
> locking.tdb into one per file. Unfortunately, if the workload involves
> millions of files there will be millions of TDB files.
> 
> Perhaps the workload is such that each client operates in a separate
> directory, in which case we might have separate locking.tdb files per
> directory, which should be several orders of magnitude lower than
> per-file.
> 
> Has anyone thought of these issues before? Is there a solution?

A likely scenario is that all Windows clients open the share root
directory for notifications, and that creates contention on a single
locking.tdb record. The time_audit vfs module can help somewhat in
pinpointing that. If contention on the share root directory is
actually the problem, a workaround would be setting:

  fileid:algorithm = fsname_norootdir
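
As a rough sketch only, and not a tested configuration, the smb.conf
fragment below shows how the two suggestions might be combined. It
assumes the fileid module has to be listed in "vfs objects" for the
fileid:algorithm option to take effect, and that time_audit is stacked
alongside it. The 5000 ms threshold is an illustrative value, not a
recommendation:

  [global]
      # assumption: both modules stacked; adjust to the existing
      # "vfs objects" line if other modules are already loaded
      vfs objects = time_audit fileid

      # log VFS calls that take longer than this many milliseconds
      # (illustrative value)
      time_audit:timeout = 5000

      # the workaround suggested above
      fileid:algorithm = fsname_norootdir

The time_audit messages should then indicate which VFS calls are slow
before and after the fileid change is applied.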

Christof