CTDB issues with locking.tdb with some workloads

Richard Sharpe realrichardsharpe at gmail.com
Wed Jul 28 18:11:36 UTC 2021


Hi folks,

One of the issues I have seen with CTDB relates to certain media workloads.

For this workload, the customer has several hundred Windows clients
that are processing data and writing to separate files. Each client
typically writes to a different file and rarely writes to shared
files. They do sometimes read the same files, however.

Unfortunately, this creates problems with CTDB: we see lots of log
messages saying that CTDB is having problems getting chain locks,
often taking between 5 and 10 seconds to get a chain lock.

As a result, the workload slows down intolerably, and some sites
switch off clustering altogether and run the risk of corruption.

One site I am aware of switched off clustering and joined each of
their Samba servers as separate member servers. They also had their
Windows clients connect using something like DNS round robin.

In this case they were not protected against multiple clients writing
the same file, but since they considered that a rare occurrence they
felt the risk was acceptable.

The problem seems to be that there is a single locking.tdb file that
handles all files that need locking coordination. If there is a lot of
write activity there will be a lot of activity on locking.tdb and it
will move from Samba node to Samba node.

Perhaps one way to alleviate this issue would be to separate the
locking.tdb into one per file. Unfortunately, if the workload involves
millions of files there will be millions of TDB files.

Perhaps the workload is such that each client operates in a separate
directory, in which case we might have separate locking.tdb files per
directory, which should need several orders of magnitude fewer TDB
files than the per-file approach.
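
To make the per-file versus per-directory comparison concrete, here is
a minimal sketch in Python. It is an illustration only and not how
Samba or CTDB names or keys its databases: the function names, the
"locking_<hash>.tdb" naming scheme, and the figures of 300 clients
with 1000 files each are all assumptions made up for the example. As
far as I know Samba identifies files in locking.tdb by file identity
rather than by path, so a real design would need a different key.

```python
import hashlib
import posixpath


def lock_db_name_for(path: str) -> str:
    """Return a hypothetical per-directory lock database name for `path`.

    The naming scheme is an assumption for illustration, not Samba's.
    """
    # All files in the same directory map to the same database name.
    directory = posixpath.dirname(posixpath.normpath(path))
    digest = hashlib.sha1(directory.encode("utf-8")).hexdigest()[:12]
    return f"locking_{digest}.tdb"


def count_lock_dbs(paths):
    """Count the databases needed per file and per directory."""
    per_file = len({posixpath.normpath(p) for p in paths})
    per_directory = len({lock_db_name_for(p) for p in paths})
    return per_file, per_directory


# Assumed workload: 300 clients, each writing 1000 files into its own
# directory. These numbers are invented for the example.
paths = [
    f"/share/client{c:03d}/frame{n:04d}.dpx"
    for c in range(300)
    for n in range(1000)
]

per_file, per_directory = count_lock_dbs(paths)
print(per_file, per_directory)
```

With these assumed numbers the per-file scheme needs 300000 databases
and the per-directory scheme needs 300, a difference of three orders
of magnitude. The benefit disappears if many clients share one
directory, since they would all contend on the same database again.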

Has anyone thought of these issues before? Is there a solution?

-- 
Regards,
Richard Sharpe
("What can dispel sorrow? Only Du Kang." --Cao Cao) (Legend has it that Du Kang was the inventor of wine.)


