[Samba] CTDB Question w/ Winbind

Wed Oct 7 12:22:37 UTC 2020

Hi Bob,

On Tue, 6 Oct 2020 20:56:39 -0400, Robert Buck <robert.buck at som.com>
wrote:

> Hi Martin, you seem to do a lot of work on CTDB. Let me ask a question...

Yes, I have done a lot of work on CTDB.  A bit less lately...

> Is there a way to segment CTDB/Samba to minimize chatter? Specifically,
> what I have in mind... In recent years advances have been made in
> distributed SQL databases (ideas which are applicable here) whereby the
> communication profile between peers are minimized, and synchronization is
> never necessary except in circumstances where a peer has the data resident
> in memory and needs to perform an update (requiring an MVCC lock). Through
> a catalog you can find out who is the chairman for any particular record,
> thus be able to know who manages locks related to it, as well as handles
> contended updates. In this way, communication tends to be segmented, and
> lock management is localized.

If you check out https://wiki.samba.org/index.php/CTDB_database_design
you will see that CTDB uses something like a catalog to locate records
in distributed databases.  It uses a modulo scheme (based on active
nodes) to locate the "location master".  Martin Kleppmann's "Designing
Data-Intensive Applications" (https://dataintensive.net/) (which is
predated by CTDB) says this isn't a great idea, though mostly from a
database recovery perspective, since a lot of data has to move when the
set of active nodes changes... it is fair to say that CTDB's database
recovery isn't hugely optimised.
However, in general use I think the distributed database model is sound
and reasonably efficient.
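
To illustrate the concern about modulo placement, here is a minimal
Python sketch.  It is not CTDB's code: the hash (zlib.crc32), the key
names and the helper functions are all stand-ins chosen for
illustration.  It only shows how many records get a different location
master when one node drops out of a four-node cluster.

```python
import zlib
from typing import Iterable, Sequence


def lmaster(key: bytes, active_nodes: Sequence[int]) -> int:
    """Pick a location master: hash of the key modulo the active node count.

    zlib.crc32 is only a stand-in hash, not the one CTDB uses.
    """
    return active_nodes[zlib.crc32(key) % len(active_nodes)]


def moved_fraction(keys: Iterable[bytes],
                   before: Sequence[int],
                   after: Sequence[int]) -> float:
    """Fraction of keys whose location master differs between two node sets."""
    keys = list(keys)
    moved = sum(1 for k in keys if lmaster(k, before) != lmaster(k, after))
    return moved / len(keys)


# Hypothetical record keys; real keys depend on the database in question.
keys = [f"record-{i}".encode() for i in range(10_000)]
four_nodes = [0, 1, 2, 3]
three_nodes = [0, 1, 2]  # node 3 has left the cluster

frac = moved_fraction(keys, four_nodes, three_nodes)
print(f"{frac:.0%} of records change location master")
```

With a reasonably uniform hash, far more records change location
master than the quarter that lived on the departed node, which is the
rebalancing cost the book warns about.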

I'd be interested in your perspective in the context of the above.

CTDB also has read-only delegation that can be enabled on distributed
databases.  I think this is used on some databases by default. There is
also something called "sticky records" which we haven't used much but it
is a simple approach to minimising record migration that might be
useful.

Volker Lendecke (from the Samba team) has started some work that
localises records in the locking.tdb database but I haven't kept up
with it.

> It seems to us, and we need to measure with wireshark, that CTDB with Samba
> forms a full-mesh network, yes? And because of the architecture and
> communication profile, performance of the system is about 1/100th of what
> it is when turned off. (Please bear in mind we're talking about
> geo-distributed deployments here, not ones localized to a single region,
> where latency is not an issue, so we're speaking of distances upwards of
> 10,000 miles longest leg, and 5000 miles on average.)
> 
> I've some experience in the area of distributed SQL databases, and it seems
> that perhaps some of the architectural patterns to optimize communications
> could apply here?

Yes, CTDB does form a full-mesh network.

However, it uses distributed databases for performance-critical
volatile databases.  Replicated databases are (currently) only used for
persistent databases, and although these perform very badly they aren't
usually a bottleneck.

> All that said, if you know a way to optimize out a 1:100 performance
> penalty of using CTDB, please let us know.

Note the comments about contention in
https://wiki.samba.org/index.php/CTDB_database_design.  It mentions
some log messages to look for so you can start understanding the
contention.

Clustered Samba (with CTDB) does very badly when there is lots of
contention for records.  There are a few known ways of mitigating this.

Looking at one example, a record containing metadata (including share
mode data) for the root of a share can become very contended.  This can
be limited by setting fileid:algorithm to fsname_norootdir (see
https://www.samba.org/samba/docs/current/man-html/vfs_fileid.8.html).
However, before using this option you need to remember that its goal
is to break lock coherency in the root of a share, so it has to be
used very carefully.
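
As a rough sketch, the setting could look like this in smb.conf.  The
share name and path are hypothetical, and the fragment assumes the
fileid VFS module is loaded per share; check the manual page above
against your Samba version before relying on it.

```ini
# Hypothetical share definition, for illustration only
[shared]
    path = /clusterfs/shared
    vfs objects = fileid
    # Deliberately breaks lock coherency for the share root directory
    fileid:algorithm = fsname_norootdir
```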

Another way of destroying cluster performance is to put Windows
executables into clustered shares.  This can induce
near-silicon-melting contention in CTDB.  Try to find ways of avoiding
this.  I don't remember much about solutions for this.  However, the
"msdfs proxy" option may be of some help to push a share for such data
to a single node and simply not cluster it.

All that said, I think geographical distribution is going to be an
obvious source of latency.  Please check out the "lmaster capability"
option in the ctdb.conf(5) manual page.  However, I think Ronnie
Sahlberg originally added this option for situations where there is a
main cluster at one end of a WAN link and a subsidiary cluster at the
other end... I don't think it was aimed at generally solving the
problem of using CTDB in a geographically distributed manner.
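
To put rough numbers on the latency point, here is a back-of-envelope
Python sketch for the distances mentioned.  The assumptions are mine:
signals travel at about 200,000 km/s in optical fibre, the path is a
straight line, and each contended operation needs at least one full
round trip.  Real links will be slower.

```python
MILES_TO_KM = 1.609344
# Assumption: signal speed in optical fibre, roughly two thirds of c.
FIBRE_KM_PER_S = 200_000.0


def min_rtt_ms(distance_miles: float) -> float:
    """Lower bound on the round-trip time over an ideal straight fibre path."""
    one_way_s = distance_miles * MILES_TO_KM / FIBRE_KM_PER_S
    return 2 * one_way_s * 1000.0


def max_serial_round_trips_per_s(distance_miles: float) -> float:
    """Upper bound on strictly sequential round trips per second."""
    return 1000.0 / min_rtt_ms(distance_miles)


for miles in (5_000, 10_000):
    print(f"{miles} miles: RTT >= {min_rtt_ms(miles):.0f} ms, "
          f"<= {max_serial_round_trips_per_s(miles):.1f} serial round trips/s")
```

Under these assumptions a single hot record that has to cross the
longest leg can only change hands a handful of times per second, no
matter how fast the nodes themselves are.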

Despite all I've said above, CTDB currently has no full-time
developers.  We have ideas for a new CTDB architecture, which has been
discussed in SambaXP conference talks by Amitay Isaacs and myself in
recent years.  One of the goals here is to structure CTDB more clearly
to reduce the barrier to entry for new developers.  We don't really
have obvious ideas for database optimisations but we would value any
ideas.

All input welcome... patches too!  :-D

peace & happiness,
martin