[Samba] Re: File Systems - Which one to use?

Dragan Krnic dkrnic at lycos.com
Mon Dec 16 10:07:00 GMT 2002

> I personally prefer dump|restore pipeline. It never goes
> much below the theoretical throughput capacity (about 11.5
> MB/s is what I get) and never any swapping. If I dump to 
> "/dev/null" then the throughput is way beyond that,
> although I can't pull the precise figure off the top of my
> head right now (I'll check it up and report in a future
> installment if necessary).

I'm making good on the promise. Here are the test results.
They are not as rigorous as they could be, but rather more
exhaustive than the usual rant:

   # bdf /data2
   /dev/hda3 on /data2 type ext3 (rw,acl,user_xattr)
       used 13,669,332 KB, available 40,352,944 KB, 26% full
   # time dump -0uab 64 -f /dev/null /dev/hda3
   real 11m59.216s, user 0m13.670s, sys  1m1.280s

That's 19,462,020 bytes per second on a 60 GB IBM IC35L060AVER07-0
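For anyone who wants to check the arithmetic: throughput is just
the bdf-reported size divided by the wall-clock time. A one-liner
with the figures above:

```shell
# 13,669,332 KB dumped in 11m59.216s = 719.216 s;
# throughput = bytes / elapsed seconds.
awk 'BEGIN { printf "%d bytes/s\n", 13669332 * 1024 / 719.216 }'
```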

> However, using "tar -b 64" I got the same transfer rates
> from reiser to ext3 - the limiting factor is fast ether not
> fs. I'll see what I get when I redirect to "/dev/null" and
> post it later on.

Another promise fulfilled:

   # cd /local
   # bdf .
   /dev/hdb1 on /local type reiserfs (rw,acl,user_xattr)
       used 12,679,580 KB, available 16,614,016 KB, 44% full
   # time tar -cb 64 -f /dev/null .
   real  3m14.635s, user 0m4.320s, sys 0m16.600s

That's 66,708,916 bytes per second on a 40 GB Maxtor 53073H6.

Let's recheck the ext3 with tar:

   # cd /data2
   # bdf .
   /dev/hda3 on /data2 type ext3 (rw,acl,user_xattr)
       used 13,669,332 KB, available 40,352,944 KB, 26% full
   # time tar -cb 64 -f /dev/null .
   real  9m55.023s, user 0m4.480s, sys  0m15.860s

in other words 23,524,126 bytes per second. So it must be the
fs that is unnecessarily inefficient.

Both IDE disks are rated at 7.2 Krpm, so this comparison
surprises me too. Just to keep things in perspective, here's
another dump of an 18 GB SCSI/LVD disk,
an IBM DNES-318350W, also 7.2 Krpm, on the same system:

   # bdf /
   /dev/sda3 on / type ext3 (rw,data=ordered,acl)
       used  3,697,568 KB, available 12,611,624 KB, 23% full
   # time dump -0uab 64 -f /dev/null /dev/sda3
   real 7m45.027s, user 0m4.650s, sys 0m19.510s

In terms of speed, a paltry 7,211,647 bytes per second. (I must
keep the SCSI disk for the system because the IDE DMA part
of the i845 chipset reliably freezes the system with a kernel
panic every now and then.)

It's further proof that there's something funny about ext3,
because if we do a serial read test, we get:

   # time dd if=/dev/hda3 bs=1024k count=1024 of=/dev/null
   real 0m33.346s, user 0m0.000s, sys 0m4.460s

32,200,019 bytes per second for IBM IDE (ext3)

   # time dd if=/dev/hdb1 bs=1024k count=1024 of=/dev/null
   real 0m45.818s, user 0m0.000s, sys 0m4.170s

23,434,934 bytes per second for Maxtor (reiser)

   # time dd if=/dev/sda3 bs=1024k count=1024 of=/dev/null
   real 0m56.250s, user 0m0.010s, sys 0m4.460s

19,088,744 bytes per second for SCSI/LVD IBM (ext3)

It's getting curiouser and curiouser! The IBM IDE disk with
ext3 is in fact almost 50% faster than the Maxtor IDE disk at
raw reads, and yet the Maxtor outperforms it by a factor of
2.8 using tar. To really evaluate this data I should
turn it around and put reiser on IBM's IDE and ext3 on the
Maxtor's disk. Since I'm too lazy I'll just unfairly
extrapolate the figures: in that case ext3 should deliver
17,120,684 bytes per second and reiser would beat it by a
factor of 5.35 (assuming the IBM's IDE can indeed yield the
projected 91,659,245 bytes per second).
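The extrapolation itself is just cross-scaling the two measured tar
rates by the ratio of the raw dd rates. Spelled out (all figures are
the ones measured above; this is a projection, not a measurement):

```shell
# Cross-scale the measured tar rates by the ratio of the raw dd
# rates - extrapolation, not measurement.
awk 'BEGIN {
    ext3_tar  = 23524126;  reiser_tar = 66708916   # tar, bytes/s
    ibm_dd    = 32200019;  maxtor_dd  = 23434934   # dd, bytes/s
    ext3_on_maxtor = ext3_tar   * maxtor_dd / ibm_dd
    reiser_on_ibm  = reiser_tar * ibm_dd   / maxtor_dd
    printf "ext3 on Maxtor: %.0f bytes/s\n", ext3_on_maxtor
    printf "reiser on IBM:  %.0f bytes/s\n", reiser_on_ibm
    printf "ratio:          %.2f\n", reiser_on_ibm / ext3_on_maxtor
}'
```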

The IBM's SCSI disk is clearly a lousy oldie - I've had much
better experiences with 15 Krpm disks.

I've been worried like hell that there might have been an
error in the measurements. Especially the tar on reiser was
suspect - I mean how can you tar so much faster than you
can dd? Simple! The dd utility needs a face-lift. It performs
miserably for other people who use it to reblock data too.
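By reblocking I mean dd's classic job of sitting in a pipe, turning
whatever block sizes arrive into fixed-size output blocks. A
self-contained sketch, with made-up temp files standing in for a
real tape device:

```shell
#!/bin/sh
# dd as a reblocker: read whatever arrives on stdin and emit fixed
# 32 KB output blocks (obs=32k), as one would in front of a tape
# drive. The temp files here are made up for the demonstration.
d=$(mktemp -d)
head -c 100000 /dev/zero > "$d/in"
cat "$d/in" | dd obs=32k of="$d/out" 2>/dev/null
cmp "$d/in" "$d/out" && echo "stream intact"
rm -rf "$d"
```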

The next suspicion was that tar on reiser might be skipping
something, so I added a pipe to "wc" which showed that tar
output 13,100,711,936 bytes, about 1% more than shown by "bdf",
not enough to spoil the computation (tar adds 512-byte headers,
reiser squeezes file tails and is generally more parsimonious
with disk space than any other system I know).
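For anyone repeating the check, the pipe is simply tar writing the
archive to stdout. A self-contained sketch on a throwaway directory
(mktemp stands in for /local so it runs anywhere):

```shell
#!/bin/sh
# Count what tar actually emits by writing the archive to stdout
# ("-f -") and piping it into wc; with "-b 64" the output is padded
# to 64*512 = 32768-byte records.
d=$(mktemp -d)
echo "payload" > "$d/file"
bytes=$(tar -cb 64 -f - -C "$d" . | wc -c | tr -d ' ')
echo "$bytes bytes"
rm -rf "$d"
```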

What surprised me was that the pipeline took 23m41.319s
(user 2m3.730s, sys 1m24.52s). On more mature commercial *nices
this pipe hardly ever makes any difference. Shouldn't someone
look into it?
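Anyone who wants to look into it can isolate the pipe from the
filesystem by timing the same bytes with and without the extra
stage - a rough sketch, not a calibrated benchmark (cat stands in
for wc):

```shell
# Same 256 MB of zeros, once straight to /dev/null, once through an
# extra pipe stage; the difference is pure pipe overhead.
time dd if=/dev/zero bs=1M count=256 of=/dev/null 2>/dev/null
time sh -c 'dd if=/dev/zero bs=1M count=256 2>/dev/null | cat > /dev/null'
```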

> Now, the lack of a decent "dump/restore" for reiser is
> another sore fact that might stand in the way of faster
> acceptance of this fine fs. I'm all sold on the paradigm
> and believe that the authors will make it even better
> (read faster) in the coming version 4.

Dump didn't exactly pass this test with flying colors. It
was a lot slower (20%) accessing the raw volume than tar,
which jumped through all the fs hoops of a block device.

I'll take a closer look at GPL/dump. The results are too
paradoxical in my view. My own vxdump for Veritas vxfs
(sorry, not yet GPLed and not freely available, but the vxfs
itself is an excellent alternative for the seriously minded,
although with reiserfs 4.0 this might change) easily dumps in
excess of 50 MB/s to an LTO tape from 80 MB/s SCSI/LVD
software-striped, mirrored disks.

I wish more people would pay closer attention to measurable
differences in disks and fs's. The tests took, what, 35 minutes;
annotating the results, another hour - anyone can sacrifice
that much for computer science. I invite everyone to check my
timings in the fine tradition of peer review.
