[Samba] suggestions for a "fast" fileserver - 1G / 10G - focus on smb.conf/samba

Linda W samba at tlinx.org
Tue Mar 25 15:54:29 MDT 2014

Bringing this paragraph first, as it may make the information below moot.

iSCSI is a replacement-type technology in terms of file serving.

It usually allows you to attach a remote SCSI device to a client such that
the client sees it as a local disk.  If the client sees it as local, why
would one use samba...

If your servers are using iscsi and then re-exporting them to clients
via samba... um...that will be very likely to cause a performance hit.

FTP and HTTP will do better than samba because they will be
streaming large streams of data  -- so a server-in-the-middle could
do a good job of prefetching to keep the outgoing streams
fed.  But Samba is more likely to have lots of little reads
and I don't think samba explicitly uses any prefetch instructions
to the OS. 
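
For anyone unsure what a "prefetch instruction to the OS" looks like in
practice, here is a minimal Python sketch using posix_fadvise().  This is
not Samba code and says nothing about what smbd actually does; the function
name read_with_prefetch_hint and the 1MB chunk size are made up for the
illustration.

```python
import os


def read_with_prefetch_hint(path, chunk_size=1 << 20):
    """Read a file front to back after hinting sequential access.

    Returns the number of bytes read.  The hint is advisory only:
    the kernel is free to ignore it.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            try:
                # offset=0, len=0 means "from the start to end of file".
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
            except OSError:
                # Some filesystems reject the hint; reading still works.
                pass
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total
```

A streaming server (FTP/HTTP style) can give this kind of hint once per
file; a workload made of lots of little reads at scattered offsets gets
much less out of it.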

Other comments below may not apply in your use case...
>>> We have some reliable, fast hardware SATA ISCSI raid boxes with min.
>>> 12 disks, most are 16 disks, configured in raid 10, 5, 6 depending
>>> on the use case and date stored. (e.g. lots of smaller r/w, some are
>>> big files min. some Gig.)
gcarter at aesgi.com wrote:
> Secondly, don't use the binary RPM's with the distro.  They are generic
> architecturally speaking.
> Use the latest GCC stack, and use XEON instruction optimizations to compile and
> build the binary and kernel.
    I think you mean Core2/newer Xeon; there's also a P4/older
Netburst-based Xeon.

> Secondly, I would implement block device caching.
???   How would you NOT implement block device caching?
I.e. how would you turn it off?

The block cache is a standard feature of Linux kernels.  If you are
referring to the new 'filesystem caching' that is part of the general
filesystem local caching manager (FSCACHE in .config) -- that's designed
for use on a client to better cache remote content.  A server
will already cache file system blocks, making effective use of
the memory when it can (right now 'free' shows I have about 52GB
of file system data cached in memory w/the standard, always-on
block cache), so I find your suggestion to implement a block
cache confusing.
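
To check how much the always-on cache is holding without eyeballing 'free',
something like the following Python sketch would do.  It parses text in the
/proc/meminfo format; the helper names parse_meminfo and page_cache_gib are
invented here, and adding Buffers to Cached is only a rough approximation
of what 'free' reports.

```python
def parse_meminfo(text):
    """Return {field_name: value_in_kB} from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" lines.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def page_cache_gib(fields):
    """Approximate cached file data (Buffers + Cached) in GiB."""
    cached_kb = fields.get("Cached", 0) + fields.get("Buffers", 0)
    return cached_kb / (1024 * 1024)
```

On a Linux server, page_cache_gib(parse_meminfo(open("/proc/meminfo").read()))
gives the number to compare against installed RAM.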

> For directed I/O and Network I/O I got a good 20% performance boost all around
> just doing nothing with my existing Samba config and implementing the above.
    Directed I/O??  Do you mean 'direct I/O', which avoids the local block
cache on a client?  What other type of I/O do you have besides Network I/O
w/a samba server?

> Also, make sure you have a good spread on your APIC.
> If you see all of your interrupts going through processor 0, your operating
> system kernel is not using the hardware efficiently.
> Make sure your APIC is working, and you got good interrupt load spread across
> the PCIXpress backplane/bus between your Network card, Memory, and SATA/SCSI
> controllers.
What interrupts are associated with Memory?   Spread doesn't necessarily
mean 'efficient'.  It depends on specific hardware.  Example:
When I forced network interrupts to specific processors (cores), and then
forced samba (smbd) to its own processor -- different from the network
interrupts -- I measured a drop in performance, since it was guaranteed
that the data being processed by smbd would never be in the processor's local
cache.  At 1Gb, it probably wouldn't be noticed, but at 10Gb, it is.

Unfortunately, you really want to route interrupts from the network
by 'flow' or 'stream' for each tcp session.  With 300 users, spreading
things out might very well help, but for a single user, doing so
may be more likely to hurt.
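
Before moving interrupts around at all, it helps to see where they land
now.  A rough Python sketch that sums per-CPU counts from text in the
/proc/interrupts format is below.  The function names are invented for
this example, and the device string you match on (e.g. "eth0") is an
assumption that depends on your NIC and driver.

```python
def per_cpu_counts(text, match):
    """Sum per-CPU interrupt counts for lines whose description
    contains `match`.  `text` is /proc/interrupts-style content."""
    lines = text.splitlines()
    if not lines:
        return []
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpu
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        tokens = rest.split()
        counts = tokens[:ncpu]
        # Skip summary rows (ERR, MIS) that lack one count per CPU.
        if not sep or len(counts) < ncpu:
            continue
        if not all(tok.isdigit() for tok in counts):
            continue
        if match in " ".join(tokens[ncpu:]):
            for cpu, tok in enumerate(counts):
                totals[cpu] += int(tok)
    return totals


def cpu0_share(totals):
    """Fraction of the counted interrupts that were handled on CPU0."""
    grand_total = sum(totals)
    return totals[0] / grand_total if grand_total else 0.0
```

A share near 1.0 only says where the interrupts go; whether that hurts
still depends on the hardware and on how many flows you are serving.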

> Building a professional SAMBA system goes way beyond the smb.conf file,
> unfortunately.
On that, I would agree.  But the advice you are giving isn't generic and
may vary even based on HW manufacturer and model.

As an example -- for a 10Gbit card look in the file
<linux-src>/Documentation/networking/ixgb.txt (it's for a PCI-X
card, the equiv doc for the PCI-E card (which is a more common
bus these days) doesn't have as explicit examples).  So I look
at most of the info, but not stuff related to PCI-X.

Option suggestions for net.ipv4.tcp_sack would hold for both
1Gb & 10Gb cards... similarly, while you might not need
as much tcp_[rw]mem for 1Gb, having that much wouldn't
hurt if you have the memory.

The defaults are too low for BOTH 1Gb and 10Gb cards.

It talks about specific settings to change in /proc.
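
To see what your box is running with before changing anything, a small
Python sketch that reads the current values out of /proc/sys is below.
The helper names are invented for illustration; it only reads settings
and deliberately suggests no values, since those depend on your card and
memory.

```python
def sysctl_path(name):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return "/proc/sys/" + name.replace(".", "/")


def parse_sysctl_value(text):
    """Parse whitespace-separated integers, e.g. tcp_rmem's
    'min default max' triple or tcp_sack's single 0/1 flag."""
    return tuple(int(tok) for tok in text.split())


def read_sysctl(name):
    """Read the current value of an integer sysctl (Linux only)."""
    with open(sysctl_path(name)) as f:
        return parse_sysctl_value(f.read())
```

For example, read_sysctl("net.ipv4.tcp_rmem") returns the three buffer
sizes in bytes, and read_sysctl("net.ipv4.tcp_sack") returns a one-element
tuple.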

Way too much to cover here; go to google
and search on 1Gb network tuning (you'll see
articles dating as far back as 2007).

If you search on 10Gb tuning, you'll see more recent
advice.  Some of the limits might be more suitable
for 10Gb than for 1Gb, but the newer docs will be
more likely to mention newer kernel settings that the
older docs wouldn't have.

Hit your network stack tuning (which can be done mostly
in /proc) BEFORE thinking about rebuilding your kernel.

...  Google is your friend for finding sources for info on
network perf tuning...
