Patch to support Scalable CTDB

Partha Sarathi at
Fri Apr 27 15:31:27 UTC 2018

Thanks, Amitay, for your feedback on the patch and on the scalability of CTDB.

How would we form the VNNMAP groups? Could you please give an example of doing so?


On Fri, Apr 27, 2018 at 1:37 AM, Amitay Isaacs <amitay at> wrote:

> Hi Partha,
> On Fri, Apr 27, 2018 at 10:14 AM, Partha Sarathi via samba-technical
> <samba-technical at> wrote:
> > Hi,
> >
> > We have a requirement to support a large cluster, i.e. scaling from a
> > 20-node cluster to a 50-node cluster, and the current CTDB design may
> > not support linear scaling of nodes, because replicating all the
> > lock-related TDBs and traversing every record in those TDBs may slow
> > down performance.
> There are many areas in CTDB that require work to improve scalability
> to a large number of nodes.  Many of the improvements are on the
> roadmap.
> <shameless promo>
> One of the major improvements is to split the monolithic daemon into
> separate daemons.  Martin and I have been doing lots of ground work to
> get to a point where we can start introducing separate daemons.  There
> will be lots of patches appearing on the mailing list soon to that
> effect.  This will eventually get us to leaner database daemon(s).
> </shameless promo>
> > The product requirement is to create a cluster with a large number of
> > nodes, say 50, subgrouped into multiple protocol heads of three to
> > five nodes each.  Each protocol-head group will host a specific set
> > of shares, not all of them.  So we took the approach of running two
> > instances of ctdbd on each node.
> >
> > 1) The primary ctdbd (persistent ctdbd) is responsible for
> > replicating just the persistent TDBs across the whole cluster (in our
> > case 50 nodes), to maintain the AD-registered account details and to
> > support a single global namespace across the large cluster.
> >
> > 2) The secondary instance (locking ctdbd) is responsible for
> > replicating and traversing the lock-related TDBs only within a
> > protocol-head group, thereby reducing the latency of TDB transactions
> > (which are expensive when the number of nodes is large) by
> > communicating only within the limited group of nodes.
> >
> > 3) smbd is changed so that it communicates with these two ctdbd
> > instances through different ctdbd sockets.  The messaging
> > initialization and the connection handling have been taken care of
> > accordingly.
> >
> > To run the above ctdbd instances independently, they are configured
> > separately and listen on different ctdb ports (4379 and 4380
> > respectively).
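
[Editor's note: the dual-instance routing described in points 1)-3) above
can be sketched roughly as follows. The socket paths and the lists of
databases assigned to each instance are illustrative assumptions, not
taken from the actual patch.]

```python
# Sketch of the smbd-side routing described above: operations on
# lock-related TDBs go to the group-local "locking" ctdbd, everything
# else (including persistent TDBs) goes to the cluster-wide ctdbd.
# Socket paths and database lists are hypothetical.

PERSISTENT_CTDB_SOCKET = "/var/run/ctdb/ctdbd.socket"       # port-4379 instance
LOCKING_CTDB_SOCKET = "/var/run/ctdb/ctdbd-locking.socket"  # port-4380 instance

# Illustrative classification of databases by instance.
LOCKING_TDBS = {"locking.tdb", "brlock.tdb", "smbXsrv_open_global.tdb"}

def socket_for_db(db_name: str) -> str:
    """Pick which ctdbd instance handles operations on db_name."""
    if db_name in LOCKING_TDBS:
        # Volatile lock-related data: replicate only within the
        # protocol-head group via the secondary instance.
        return LOCKING_CTDB_SOCKET
    # Persistent data (AD account details etc.): replicate across the
    # whole cluster via the primary instance.
    return PERSISTENT_CTDB_SOCKET
```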
> It's an interesting hack.  But I would not recommend running multiple
> instances of the ctdb daemon.  Among the many reasons: the ctdb daemon
> still runs at real-time priority, and you definitely don't want
> multiple user-space daemons running at real-time priority.
> Additionally, two ctdb instances create unnecessary network overhead
> by doubling the communication, since there are two separate ctdb
> clusters.
> One approach for solving this problem would be VNNMAP groups.
> VNNMAP is a collection of nodes which participate in database
> activity.  Even though it's applicable to both the persistent and the
> volatile databases, it has more effect on the volatile databases.
> Volatile databases are the distributed temporary databases (e.g.
> locking.tdb).  Currently all the nodes are in a single VNNMAP.
> With VNNMAP groups, we can partition the nodes into groups. Each group
> then maintains the volatile databases independently from the other
> group.  Of course the samba configuration (share definitions) must be
> identical for all the nodes in a group.  Also, samba shares across two
> different groups cannot have overlapping file system directories
> (unless they are read-only shares).  This should effectively reproduce
> the same behaviour you have achieved with two ctdb instances, but
> without needing any change in samba.
> Amitay.
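
[Editor's note: the VNNMAP-groups idea above can be sketched as follows.
CTDB picks a record's location master (LMASTER) by hashing the record
key and indexing into the VNN map; with groups, that lookup would be
restricted to the requesting node's group. The group layout and the use
of CRC32 in place of CTDB's real key hash are illustrative assumptions.]

```python
# Rough sketch of VNNMAP groups: nodes are partitioned into groups, and
# the LMASTER for a volatile-database record is chosen only among the
# nodes of the requesting node's group, so record traffic for e.g.
# locking.tdb never leaves the group.  zlib.crc32 stands in for CTDB's
# actual key hash; the 9-node/3-group layout is hypothetical.
import zlib

VNNMAP_GROUPS = [
    [0, 1, 2],  # protocol head A
    [3, 4, 5],  # protocol head B
    [6, 7, 8],  # protocol head C
]

def group_of(pnn: int) -> list:
    """Return the VNNMAP group containing node number pnn."""
    for group in VNNMAP_GROUPS:
        if pnn in group:
            return group
    raise ValueError(f"node {pnn} is not in any VNNMAP group")

def lmaster(key: bytes, pnn: int) -> int:
    """LMASTER for a record key, restricted to pnn's group:
    hash the key, then index into the group's portion of the VNN map."""
    group = group_of(pnn)
    return group[zlib.crc32(key) % len(group)]
```

With a single cluster-wide VNNMAP the modulo would run over all nodes, so
any node could be LMASTER for a lock record; with groups, the lookup stays
inside the small group, which is the same latency reduction the
two-daemon setup achieves, but without a second ctdbd instance or any
smbd changes.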

Thanks & Regards
