CTDB scaling?

Thu Nov 20 11:19:54 MST 2014

On Thu, Nov 20, 2014 at 9:33 AM, Richard Sharpe
<realrichardsharpe at gmail.com> wrote:
> On Wed, Nov 19, 2014 at 5:44 PM, ronnie sahlberg
> <ronniesahlberg at gmail.com> wrote:
>> On Wed, Nov 19, 2014 at 3:49 PM, Richard Sharpe
>> <realrichardsharpe at gmail.com> wrote:
>>>
>>> Hi folks,
>>>
>>> In Tridge's 2007 paper:
>>>
>>> https://www.samba.org/~tridge/sambaxp-07/ctdb.pdf
>>>
>>> he claims the following performance scaling:
>>>
>>> NEW (CTDB) approach
>>> 1 node 42 MBytes/sec
>>> 2 nodes 168 MBytes/sec
>>> 3 nodes 211 MBytes/sec
>>> 4 nodes 243 MBytes/sec
>>>
>>> This seems counterintuitive. Two nodes get four times what one node
>>> gets, and four nodes get almost six times what one node does?
>>>
>>> What is the explanation for that?
>>>
>>
>> The superlinear scaling is likely due to the increase in memory
>> available for caching. This, recall, is the uncontended case, where
>> you have little cross-node traffic.
>
> Hmm, it is still not obvious. There seem to be several things going on here.
>
> Is it possible that the same NBENCH load was offered across all four
> configurations?
>
> That would make more sense. Then, in the one node case we were hitting
> the one node limit, and as you say, with two nodes and the load
> divided between them, more memory was available for caching so we see
> a big boost there. After that, we seem to be hitting the IO limit of
> the cluster because more memory does not seem to help that much ... By
> the time we hit six nodes it looks like we would probably be seeing
> only another 20MB/s or less for the additional nodes.

Probably.
I think I recall that Tridge might have done these very early tests on
virtual machines running under QEMU/KVM, which makes the actual numbers
even harder to interpret, if not meaningless.

I think the only real takeaway from these numbers is that "scaling is
good for that kind of workload".
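
For reference, the ratios being discussed can be checked with a quick
Python sketch. Only the four throughput figures come from the slides
quoted above; the function and variable names are my own, and the
"efficiency" column is simply speedup divided by node count.

```python
# Throughput figures (MBytes/sec) quoted from the slides, keyed by node count.
throughput = {1: 42, 2: 168, 3: 211, 4: 243}


def scaling_table(figures):
    """Return (nodes, speedup, efficiency, gain) rows.

    speedup    = throughput relative to the smallest configuration
    efficiency = speedup divided by the relative node count
    gain       = MBytes/sec added over the previous configuration
    """
    base_nodes = min(figures)
    base = figures[base_nodes]
    rows = []
    prev = None
    for nodes in sorted(figures):
        mb = figures[nodes]
        speedup = mb / base
        efficiency = speedup / (nodes / base_nodes)
        gain = None if prev is None else mb - prev
        rows.append((nodes, speedup, efficiency, gain))
        prev = mb
    return rows


for nodes, speedup, eff, gain in scaling_table(throughput):
    gain_txt = "-" if gain is None else f"+{gain}"
    print(f"{nodes} node(s): {speedup:.2f}x speedup, "
          f"{eff:.2f} efficiency, {gain_txt} MBytes/sec")
```

The gain per added node drops from 126 to 43 to 32 MBytes/sec, which is
the flattening Richard describes above.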

Later tests on real hardware showed pretty much linear scaling for the
uncontended case up to ~30 nodes, with each node pretty much saturating
its 10GbE link with CIFS traffic.
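
To put a rough ceiling on what that means in aggregate, here is a
back-of-the-envelope sketch. It assumes the raw 10 Gbit/s line rate
(1250 MBytes/sec per node) and ignores all Ethernet/TCP/SMB overhead,
so it is an upper bound and not a measured result; the function name is
made up for illustration.

```python
# Assumption: 10GbE carries at most 10 Gbit/s = 1250 MBytes/sec per node,
# with no allowance for Ethernet/TCP/SMB framing overhead.
LINK_GBIT = 10


def aggregate_ceiling_mbytes(nodes, link_gbit=LINK_GBIT):
    """Upper bound on cluster throughput if every node saturates its link."""
    return nodes * link_gbit * 1000 / 8


print(aggregate_ceiling_mbytes(30))  # 37500.0
```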

The heavily contended case often did not scale well at all. Later
additions to the protocol, such as sticky records and read-only record
delegations, helped some workloads, though far from all, but I do not
know whether any performance numbers were ever collected for them.

>
> --
> Regards,
> Richard Sharpe
> (何以解憂？唯有杜康。--曹操)