[Samba] suggestions for a "fast" fileserver - 1G / 10G

Christopher Chan christopher.chan at bradbury.edu.hk
Mon Mar 24 18:40:23 MDT 2014

On Monday, March 24, 2014 11:31 PM, Emmanuel Florac wrote:
> On Mon, 24 Mar 2014 18:37:34 +0800
> Christopher Chan <christopher.chan at bradbury.edu.hk> wrote:
>> If you want to talk about performance, please do use ext3 or ext4
>> with a BBU NVRAM block device for the journal and use full journaling
>> mode. Not risky filesystems like XFS.
> Please. I've set up, shipped, installed and supported thousands of file
> servers on XFS, it's time to stop this sort of trolling. I'm a linux
> storage professional not a high-schooler playing with an old PC in the
> attic.

I will take that insult and return it to you. Storage professional?
Well, I guess you have the luxury of backups. However, MTA admins do not
have that luxury. The stupid way fsync used to behave before I/O barriers
were introduced caused no end of trouble, giving fuel to detractors
when emails hit the queue and were accepted, only to be lost because of
power issues or a crash. Which filesystem loses the most emails in such
an environment? XFS. Why? Because its performance is gained from
aggressive caching in memory. This is not trolling, this is presenting
the facts. I am not denying its performance. But in some situations, you
don't want a metadata-only journaling filesystem. So you storage
professionals with the luxury of backups should just keep your trap shut
when others who really need to depend on the filesystem and hardware
combination to not ever lose data point out the deficiencies of
filesystems in this area.
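
To make the queue scenario concrete, here is a minimal sketch of how a program can ask for a queued message to be on stable storage before it acknowledges it. It is not taken from any particular MTA; the function name `durable_enqueue` and its parameters are hypothetical. It assumes a POSIX system such as Linux, and the guarantee only holds as far as the filesystem, the barriers, and the hardware cache actually honour fsync, which is exactly the point under dispute.

```python
import os
import tempfile


def durable_enqueue(queue_dir: str, name: str, payload: bytes) -> str:
    """Write payload to queue_dir/name and fsync both the file and the directory."""
    final_path = os.path.join(queue_dir, name)
    # Temporary file in the same directory, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=queue_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to push the file data to stable storage
        os.rename(tmp_path, final_path)  # atomic within one filesystem
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the new directory entry is durable as well.
    dir_fd = os.open(queue_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return final_path
```

Only after this returns would the sender be told the message was accepted. If the storage stack reports the fsync as complete while the data still sits in a volatile cache, the message can still be lost on power failure.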

> Re-read my message. On the same hardware (dual opteron, 35 TB array),
> right now, running 3.12.7 kernel, default mkfs and mount options (+
> nobarrier), XFS is 30% faster than btrfs for single threaded sequential
> reads, even faster for concurrent reads; and 20% faster than ext4 at
> sequential writes, and even better for concurrent writes. It's just
> faster. If what you care about is speed, it's the FS you want.

Correction: if ALL you care about is speed.

> It may be slightly less safe than ext4 with all the slower options
> enabled (data journaling and the like). However for all practical
> purposes it's safe enough, it doesn't eat your data, even when pulling
> the plug (if you have BBUs...). I know it, because it happened to me
> this very morning on a 40 TB server running since 2008.

Slightly less? Tell that to your boss when you lose thousands of emails
that were in the queue and got 'vaporized' by a power outage or crash.

> There has been a troubled period with XFS, but it was something like 8
> or 9 years ago (around the time SGI hit the ground). Didn't you notice
> RH ships XFS nowadays, and employs many of the maintainers and the
> project leader?

I don't care who hires who and what happened in the past when there were 
moving vfs changes that bit everyone except ext3 or the introduction of 
4k stacks. All those are incidental to the design of XFS. It is metadata 
only journaling and employs aggressive caching to achieve its 
performance. It has its place, that is, where speed is important and 
data integrity not so.

>>> 4° filesystem options: if you have a RAID controller with BBU or
>>> like to live dangerously, use nobarrier option or barrier=0.  Don't
>>> use data=ordered (well, if you really want to use ext4 try not to
>>> tie its short legs).
>> I'd prefer to use md and a bbu-nvram card than rely on a hardware
>> raid controller. If you are doing raid5 or raid6 then make sure you
>> have plenty of cache on that controller along with the backup battery.
> For any filesystem bigger than say 10 or 12 TB, you should use RAID-6,
> because disk bit error rates play against you (unrecoverable bit
> error rate is generally one or two orders of magnitude worse than the
> brochure values).

SHOULD? That's storage professional for you. I was going to say that if 
you employed raid5/6, you don't really care about performance especially 
write performance but since that can be mitigated by a nice fat cache 
that is backed by a battery I just let it go.

Anyone running raid5/6 is an elcheapo, whether limited by budget or by disk
slots in the case. raid1+0 is the way to go on Linux.
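
For reference, the bit-error argument quoted above can be put into rough numbers. This is a back-of-the-envelope sketch only: it assumes independent bit errors, a 10 TB read during a single-parity rebuild, and datasheet rates of 1e-15 or 1e-14 errors per bit, none of which come from this thread. The function name is hypothetical.

```python
import math


def p_ure_during_rebuild(data_read_tb: float, ber: float) -> float:
    """Probability of at least one unrecoverable read error while reading data_read_tb TB."""
    bits = data_read_tb * 1e12 * 8
    # 1 - (1 - ber)**bits, computed in a numerically stable way for tiny ber.
    return -math.expm1(bits * math.log1p(-ber))


# Assumed error rates: two datasheet-style values and one an order of magnitude worse.
for ber in (1e-15, 1e-14, 1e-13):
    print(f"BER {ber:.0e}: {p_ure_during_rebuild(10, ber):.1%}")
```

Under these assumptions the chance of hitting at least one unreadable sector while reading 10 TB comes out at roughly 8%, 55%, and over 99% respectively, which is the reasoning behind wanting a second level of redundancy on large arrays.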

> BTW provided you'll probably need to add some SAS ports to your server
> and given the very hefty price of NVRAM cards, I think the hardware
> RAID is a cheaper option.

What's this? Throwing some jargon out, are you? This former MTA admin 
runs a box with 24 cheap sata disks (with room for 12 more) that are 
connected to SAS expanders that are in turn connected to a plain LSI9211 
card. But not running raid, oh no, not redundant tech. ZFS takes care of 
these cheap sata disks in a raid1+0 like manner, guards against bit-rot 
and throws in logical volumes to boot. Yeah, I have become an elcheapo too.

> But sure, software RAID is actually faster than hardware RAID, at the
> detriment of some CPU power and memory bandwidth. I'd advise against
> software RAID on application servers for this reason; there is no "one
> size fits all", as always.

md raid1+0 costs nothing in CPU power and bus bandwidth. You only need
hardware raid for performance in the case of raid5/6, where it cuts out the
bus traffic contention suffered by software raid, but you will need a big
enough cache on the card to mitigate the slow write performance.
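
The slow raid5/6 write performance mentioned here is usually explained with the textbook small-write penalty: a small random write costs about 2 disk I/Os on raid1+0, 4 on raid5 (read data, read parity, write both), and 6 on raid6. A rough sketch, where the per-disk figure of 100 IOPS is purely an assumption and controller cache and full-stripe writes are ignored:

```python
# Back-end disk I/Os per small random front-end write (textbook read-modify-write figures).
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def random_write_iops(disks: int, iops_per_disk: float, level: str) -> float:
    """Rough upper bound on small random write IOPS with no write-back cache."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


# 24 disks at an assumed 100 IOPS each.
for level in ("raid10", "raid5", "raid6"):
    print(level, random_write_iops(24, 100, level))
```

A battery-backed write cache narrows this gap by absorbing writes and turning them into full-stripe writes where it can, which is why the cache size matters for raid5/6.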

> All of this is fine and dandy but drifts away from the OP concerns. I'd
> insist on the obvious: you can reach very high network performance even
> with Samba 3.x.x, given that the underlying filesystem responds really
> really fast to small requests. ext3/4 with the default options
> definitely won't cut it. Particularly not ext3 which has no real reason
> to be used anymore.
> I'd even add that network optimisations such as jumbo frames come last.
> They practically do nothing unless you're running on 10GigE, anyway.

The filesystem is a factor but not the only concern. Either one shores 
up a filesystem's shortcomings with hardware or one runs risky setups 
like you do. You've chosen your route and I mine. Just don't portray 
yours as the best without pointing out the cons. Like you said: there is 
no 'one size fits all'.
