dbench scalability testing

Sat Mar 10 02:33:33 GMT 2001

Hi,

> I am also doing linux/samba/scalability measurement/improvement for IBM
> LTC, and would be interested in sharing ideas, results, etc. My goal is
> this: measure linux 2.4 scalability now, prod, peek, & poke at what's
> going on (mostly focused in the kernel), work/improve on the slow stuff,
> hopefully improve scalability.  

Great!

> We are currently using a Netfinity 8-way, 700MHz, 1MB L2, 8.5 GB RAM,
> 8x100 Mbit Enet.  Netbench is currently the target benchmark, but I want
> to move my work over to smbtorture soon.  We are using eepro100 cards,
> not sure if those support zero copy (I have not checked), and we will
> probably move to 4x1 Gbps intel-1000 cards.  Right now I am in the
> process of getting some baseline numbers for UP, 1P, 2P, 4P, & 8P.  Our
> biggest problem so far has been getting enough clients to drive the
> test.  We only have 16 now, but hopefully soon we will have 64 clients. 
> I have some ideas about modifying smbtorture to support multiple clients
> to one server to help drive the workload.  Let me know if you want to
> collaborate on some of this stuff.

The zero copy patches do not support eepro100s. My understanding is that
Intel hasn't been forthcoming with specs for this card, and I'm not sure
whether it even does scatter-gather and hardware checksumming (you would
have to ask one of the networking guys).

Also, I do not know the state of the Intel gigabit cards. My suggestion is
to use acenic-based cards, since they have been the most tested (for
example, the tux2 benchmarks use 8 acenics with the zero copy patches).

You should be able to synchronise the start of smbtortures on different
client machines and then sum the results. In reality you shouldn't need
many clients running smbtorture to saturate a server. For example, a
single-CPU 333MHz POWER3 client manages to push our servers along at
25MB/s. So I'd be surprised if the 8-way can max out 4 of these, each
sitting back to back on its own gigabit channel.
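
As a rough sketch of the "synchronise the start, then sum the results"
idea: each client waits for an agreed wall-clock start time, runs its
load, and reports a throughput figure that is added up afterwards. The
function names, the MB/s figures, and the assumption that the client
clocks are already in step (e.g. via NTP) are all illustrative and are not
part of smbtorture.

```python
import time


def wait_until(start_epoch, poll=0.01):
    """Sleep until the shared start time (seconds since the epoch).

    Assumes every client clock is already synchronised, e.g. by NTP.
    """
    while True:
        remaining = start_epoch - time.time()
        if remaining <= 0:
            return
        time.sleep(min(remaining, poll))


def total_throughput(per_client_mb_s):
    """Sum per-client throughput figures (MB/s) into one aggregate."""
    return sum(per_client_mb_s)


# Hypothetical figures: four clients each reporting 25 MB/s.
results = [25.0, 25.0, 25.0, 25.0]
print(total_throughput(results))  # 100.0
```

On each client, wait_until() would be called with the same start time
just before launching the load; the per-client numbers are then collected
by hand or by script and summed.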

Actually, I'll round up my patch to use MSG_TRUNC in smbtorture. This
improves client performance somewhat, as we don't waste CPU cycles copying
data between kernel and user space. (When things don't run fast enough, fix
the benchmark :)
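
To illustrate the MSG_TRUNC trick (this is not the smbtorture patch
itself, just a minimal sketch): on Linux 2.4 and later, passing MSG_TRUNC
to recv() on a TCP socket makes the kernel discard the received bytes
instead of copying them into the caller's buffer. The helper name
discard_exact and its chunk parameter are made up for this example, and
the behaviour is assumed to be Linux-specific.

```python
import socket


def discard_exact(sock, nbytes, chunk=65536):
    """Consume nbytes from a connected TCP socket without copying payload.

    Relies on Linux TCP semantics for MSG_TRUNC: the data is dropped in
    the kernel and only the byte count is returned.
    """
    scratch = bytearray(chunk)  # required by the API, never filled
    remaining = nbytes
    while remaining > 0:
        n = sock.recv_into(scratch, min(chunk, remaining), socket.MSG_TRUNC)
        if n == 0:
            raise ConnectionError("peer closed before all data arrived")
        remaining -= n
    return nbytes
```

A load generator that only cares how fast the server can deliver data,
not what the data is, could call something like this in place of a normal
read of the reply payload.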

For basic testing, smbtorture is much nicer to work with than 16+ Windows
machines, but it would be nice to get some verification that it really
approximates Netbench at the high end. Would it be possible to do some
comparison runs?

Anton