[Samba] Samba performance

Juan Pablo jhurcad at yahoo.com
Thu Jun 2 13:24:44 MDT 2011


Hi Stan,

Thanks for your feedback and suggestions!


The disk subsystem is composed of:

- 8 WD2002FAEX 2 TB SATA hard drives (7200 RPM, 64 MB cache, 4.2 ms avg latency)
- 1 Intel RS2BL080 RAID controller with 512 MB of cache, configured as a single
12.7 TB virtual drive (hardware RAID 5, 1 MB stripe size, caches enabled,
read-ahead enabled)

In your experience, should I expect higher performance from this hardware?

I will try the ramdisk test you suggested and post back the results. Thanks
for the suggestion!

I have jumbo frames enabled on the switches, but the Windows drivers for
the Intel network cards don't expose an option to enable jumbo frames. I
also tried raising the MTU on the Linux server, but performance was even
worse (I suspect that was because the Windows NIC driver does not support
MTUs larger than 1500).
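
For reference, raising the MTU on the Linux side was along these lines
(eth0 and the client address are placeholders); jumbo frames only help
when the switch and every NIC on the path agree on the larger MTU, which
is probably why a 1500-byte Windows client made things worse:

  # Raise the MTU on the server NIC (example interface name); setting it
  # back to 1500 reverts the change.
  ip link set dev eth0 mtu 9000

  # Check that jumbo frames actually pass end to end: 8972 = 9000 bytes
  # minus 28 bytes of IP/ICMP headers, and -M do forbids fragmentation.
  ping -M do -s 8972 <client-ip>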

I also modified the Windows registry to manually enable the SMB2 protocol
because it was not being negotiated. Can you think of any other
optimizations that could be done on the Windows terminals?
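
For completeness, the knobs involved, as I understand them, are roughly
the following (the Windows service names are from memory and worth
double-checking, and on the Samba side SMB2 needs Samba 3.6 or later):

  # Samba side, smb.conf [global] (requires Samba >= 3.6):
  max protocol = SMB2

  # Windows 7 client side, from an elevated command prompt (service names
  # from memory; verify against Microsoft's documentation):
  sc config mrxsmb20 start= auto
  sc config lanmanworkstation depend= bowser/mrxsmb10/mrxsmb20/nsi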

Thanks

Juan Pablo



________________________________
From: Stan Hoeppner <stan at hardwarefreak.com>
To: Juan Pablo <jhurcad at yahoo.com>
Cc: samba at lists.samba.org
Sent: Thu, June 2, 2011 8:50:21 AM
Subject: Re: [Samba] Samba performance

On 5/25/2011 10:02 PM, Juan Pablo wrote:

> OS access: 
> Simultaneous read (4 processes):     118 MByte/s average

> Samba local access:
> Simultaneous read (4 processes):     102 MByte/s average

> Samba server from Windows 7:
> Simultaneous read (4 terminals):      70 MByte/s average

The first two results above demonstrate a slow disk subsystem not
suitable for streaming multiple files to multiple concurrent clients at
high data rates.  Your spindles are too slow and/or you don't have
enough of them to satisfy your test methodology.  Four concurrent dd
copies yield 118 MB/s per process, only ~15% disk headroom above
wire-speed GbE.  Your smbd+smbclient local disk bandwidth overhead
appears to be roughly 13 percent.  I don't know what the optimal
percentage here should be, but 13% of overhead on top of a plain dd copy
seems reasonable given the additional data movement through smbd and
smbclient buffers.
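
To be explicit, by "four concurrent dd copies" I mean something along
these lines (paths and block size are examples; drop the page cache
first so you measure the disks rather than RAM):

  # Drop the page cache so the reads hit the disks, not memory:
  echo 3 > /proc/sys/vm/drop_caches

  # Four parallel sequential readers, each streaming a large test file to
  # /dev/null; dd reports a MB/s figure per process when it finishes.
  for n in 1 2 3 4; do
      dd if=/data/testset/big$n.bin of=/dev/null bs=1M &
  done
  wait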

It is clear that you don't have enough head seek performance for 4 or
more client streams of 1000 x 8 MB files.  This doesn't necessarily
explain the 30% drop in over-the-wire performance to the Win7 clients, but
we'll get to that later.  To confirm the disk deficiency issue, I
recommend the following test:

Make a 2 GB tmpfs ramdisk on the server and run your tests against it,
but with 200 instead of 1000 8 MB files (1000 x 8 MB would not fit in
2 GB).  Instructions:
http://prefetch.net/blog/index.php/2006/11/30/creating-a-ramdisk-with-linux/
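
A minimal sketch of the setup (the mount point, file names, and share
name are examples only):

  # Create and mount a 2 GB tmpfs ramdisk:
  mkdir -p /mnt/ramdisk
  mount -t tmpfs -o size=2g tmpfs /mnt/ramdisk

  # Populate it with 200 x 8 MB test files:
  for n in $(seq 1 200); do
      dd if=/dev/zero of=/mnt/ramdisk/test$n.dat bs=1M count=8
  done

  # When the tests are done:
  umount /mnt/ramdisk

Point a Samba share at /mnt/ramdisk (or temporarily re-point an existing
share) and rerun the same read tests against it.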

This will tell you if your server block storage subsystem is part of the
problem, and will give you a maximum throughput per Samba process
baseline.  You should get something like 5GB/s+ local smbclient
throughput from a tmpfs ramdisk on that Xeon platform with its raw
25GB/s memory bandwidth.
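
For the local baseline I mean something along the lines of the following,
assuming a share (called "ramtest" here purely as an example) pointed at
the ramdisk:

  # Read one of the ramdisk test files back through smbd+smbclient on the
  # server itself; smbclient prints the transfer rate when the get
  # completes.  Share name and credentials are examples.
  smbclient //localhost/ramtest -U someuser%secret -c 'get test1.dat /dev/null'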

Run a single Win7 workstation SMB test copy to a freshly booted machine
so most of the memory is free for buffering the inbound files.  This
will mostly eliminate the slow local disk as a bottleneck.

Now run your 4 concurrent Win7 client test and compare to the single
client test results.  This should tell you if you have a bonding problem
or not, either in the server NICs or the switch.
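
If you're using the Linux bonding driver on the server, the quickest
sanity check on that side is the bonding status file (bond0 and the
slave names are examples):

  # Per-slave link state, failure counters, and the bonding mode:
  cat /proc/net/bonding/bond0

  # Link speed and duplex of each slave NIC:
  ethtool eth0
  ethtool eth1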

You didn't mention jumbo frames.  Enable them if you haven't already; it
may help.

Something else to consider is that the kernel shipped with CentOS 5.6,
2.6.18, the "Pirate" kernel, is now 4.5 years old, released in Sept of
2006 (http://kerneltrap.org/node/7144).  There have been just a few
performance enhancements between 2.6.18 and 3.0, specifically to the
network stack. ;)  The CentOS packages are older than dirt as well.  If
you're not wed to CentOS you should look at more recent distros.

-- 
Stan

