[Samba] CTDB Question w/ Winbind

Wed Oct 7 00:56:39 UTC 2020

Hi Martin, you seem to do a lot of work on CTDB. Let me ask a question...

Is there a way to segment CTDB/Samba traffic to minimize chatter? Specifically,
what I have in mind is this: in recent years, advances have been made in
distributed SQL databases (ideas which are applicable here) whereby
communication between peers is minimized, and synchronization is never
necessary except when a peer has the data resident in memory and needs to
perform an update (requiring an MVCC lock). Through a catalog you can find
out who is the chairman for any particular record, and thus know who manages
the locks related to it and who handles contended updates. In this way,
communication tends to be segmented, and lock management is localized.
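
To make the idea concrete, here is a minimal sketch in Python. It is only an
illustration of the pattern, not a description of how CTDB works today:
`Catalog`, `chairman_for`, `peers_to_contact`, `try_lock` and `unlock` are
hypothetical names, and a single in-process object stands in for what would
really be per-peer state.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Catalog:
    """Hypothetical catalog mapping each record key to one 'chairman' peer."""

    peers: List[str]
    # record key -> current lock holder (conceptually kept by the chairman)
    locks: Dict[str, str] = field(default_factory=dict)

    def chairman_for(self, key: str) -> str:
        # Deterministic hash, so every peer computes the same answer locally
        # without asking anyone.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.peers[int.from_bytes(digest[:8], "big") % len(self.peers)]

    def peers_to_contact(self, key: str, requester: str) -> Set[str]:
        # Only the chairman is involved in an update; there is no broadcast.
        chairman = self.chairman_for(key)
        return set() if chairman == requester else {chairman}

    def try_lock(self, key: str, requester: str) -> bool:
        # Stands in for the chairman arbitrating a contended update.
        return self.locks.setdefault(key, requester) == requester

    def unlock(self, key: str, requester: str) -> None:
        # Release the lock only if the requester actually holds it.
        if self.locks.get(key) == requester:
            del self.locks[key]
```

The point of the sketch is that an update to one record touches at most one
remote peer, however many peers are in the cluster.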

It seems to us, though we still need to measure with Wireshark, that CTDB
with Samba forms a full-mesh network, yes? And because of the architecture
and communication profile, performance of the system with CTDB enabled is
about 1/100th of what it is with CTDB turned off. (Please bear in mind we're
talking about geo-distributed deployments here, not ones localized to a
single region where latency is not an issue. We're speaking of distances
upwards of 10,000 miles on the longest leg, and 5,000 miles on average.)
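
For a sense of scale, here is a back-of-the-envelope calculation in Python of
the best-case round-trip time over those distances. It assumes signals travel
at roughly 200,000 km/s in fiber (about two thirds of the speed of light in a
vacuum) and ignores routing, queuing and protocol overhead, so real numbers
will be worse. The function names are just for illustration.

```python
MILES_TO_KM = 1.609344
# Assumption: propagation speed in optical fiber, about 2/3 of c.
FIBER_KM_PER_S = 200_000.0


def min_rtt_ms(distance_miles: float) -> float:
    """Lower bound on round-trip time in milliseconds for a given distance."""
    one_way_s = distance_miles * MILES_TO_KM / FIBER_KM_PER_S
    return 2.0 * one_way_s * 1000.0


def max_serial_round_trips_per_s(distance_miles: float) -> float:
    """Upper bound on strictly sequential round trips per second."""
    return 1000.0 / min_rtt_ms(distance_miles)


if __name__ == "__main__":
    for miles in (5_000, 10_000):
        print(
            f"{miles} miles: RTT >= {min_rtt_ms(miles):.1f} ms, "
            f"<= {max_serial_round_trips_per_s(miles):.1f} serial round trips/s"
        )
```

Under those assumptions the floor is roughly 80 ms per round trip at 5,000
miles and roughly 161 ms at 10,000 miles, so any operation that waits on a
remote peer is limited to a handful of sequential round trips per second.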

I've some experience in the area of distributed SQL databases, and it seems
that perhaps some of the architectural patterns to optimize communications
could apply here?

All that said, if you know a way to optimize out a 1:100 performance
penalty of using CTDB, please let us know.

Really appreciate your feedback and help.

Bob

On Tue, Oct 6, 2020 at 8:24 AM Robert Buck <robert.buck at som.com> wrote:

> Superb. I'll take a look. Thank you
>
> On Tue, Oct 6, 2020 at 1:46 AM Martin Schwenke <martin at meltin.net> wrote:
>
>> Hi Bob,
>>
>> On Mon, 5 Oct 2020 09:31:59 -0400, Robert Buck <robert.buck at som.com>
>> wrote:
>>
>> > It seems as though, when I go from `clustering = no` to `clustering = yes`,
>> > if I do a domain join, it will fail. However, if I do a `systemctl restart
>> > ctdb` (knowing full well it will fail every time), if after this I add a
>> > sleep(15), then do a domain join, then do a `systemctl restart ctdb`, then
>> > the join will have worked, AND CTDB will start properly. So in a nutshell,
>> > in Ansible,
>>
>> > - do all the samba setup without clustering on, even winbind setup; verify
>> >   it works
>> > - do all the ctdb setup and turn clustering on, but we must again
>> >   domain-join, but only after having run restart-ctdb once first, then after
>> >   the join, do another restart-ctdb
>>
>> > Only then does the system come to a stable point.
>> >
>> > This appears to be the only way to have a repeatable deployment process of
>> > CTDB over multiple regions globally.
>> >
>> > Any thoughts or recommendations?
>>
>> I think we need to document this better.  ;-)
>>
>> Although we've tried to explain things well in the wiki there are still
>> gaps... and this is one of them.   Although some of the tutorials around
>> the place are dated they fill in some of these gaps nicely.
>>
>> So, I'll repeat what Ralph said but with a few more words of
>> explanation...  :-)
>>
>> When clustering is enabled a new set of databases, managed by CTDB,
>> replaces those that were being used before.  This means that even if a
>> node was previously joined to a domain it will no longer be joined
>> after you enable clustering.  The credentials have basically
>> disappeared... unless you (immediately?) disable clustering again.
>>
>> In general, before you enable the 49.winbind and 50.samba event
>> scripts, you should start CTDB and join the domain.
>>
>> Then you can enable those scripts and restart CTDB so it will start the
>> services.
>>
>> Since you mention Ansible, I'll point you at autocluster, which I
>> rewrote (last year?) using Vagrant and Ansible.  It is a testing tool
>> to generate virtual clusters for (developer) testing of Clustered
>> Samba.  It has a lot of clues that need to make their way into
>> documentation.  We don't do releases but there is a git repository at:
>>
>>   https://git.samba.org/?p=autocluster.git;a=summary
>>
>> Here's the sequence of tasks that we use to configure a "nas" node:
>>
>>   https://git.samba.org/?p=autocluster.git;a=blob;f=ansible/node/roles/nas/tasks/main.yml;h=0c444bd77c0a883b1c608fcd6398592be8e962de;hb=73b6a2844e827b4c2c2b5d5946cc14c7c61d7d75
>>
>> In particular, this file disables the event scripts:
>>
>>   https://git.samba.org/?p=autocluster.git;a=blob;f=ansible/node/roles/nas/tasks/generic/ctdb.yml;h=0271d2a11cff0e9359e115f20c5e641e3279c3ea;hb=73b6a2844e827b4c2c2b5d5946cc14c7c61d7d75
>>
>> and later the domain is joined:
>>
>>   https://git.samba.org/?p=autocluster.git;a=blob;f=ansible/node/roles/nas/tasks/generic/ctdb-with-samba-nfs.yml;h=b6f9c6d2354e4922535d9048648df4e9e5161689;hb=73b6a2844e827b4c2c2b5d5946cc14c7c61d7d75
>>
>> Note that I'm not an Ansible expert and these Ansible playbooks aren't
>> necessarily idempotent.  At the moment it all works well enough and I
>> hope to get opportunities to clean it up more later.  It is very much
>> aimed at developer testing... but it would be cool if a subset of it
>> could be used to configure "real" Samba clusters.
>>
>> However, given that you mentioned Ansible I figure that it might
>> document certain things for you nice and clearly.  It isn't missing
>> anything obvious because we use it to build several test clusters each
>> night.
>>
>> One day later this week I'll try to take a look at the wiki and add some
>> documentation for joining a domain...
>>
>> peace & happiness,
>> martin
>>
>> --

-- 

BOB BUCK
SENIOR PLATFORM SOFTWARE ENGINEER

SKIDMORE, OWINGS & MERRILL
7 WORLD TRADE CENTER
250 GREENWICH STREET
NEW YORK, NY 10007
T  (212) 298-9624
ROBERT.BUCK at SOM.COM