[clug] EXT4 Reliability

Daniel Pittman daniel at rimspace.net
Tue Sep 29 19:00:03 MDT 2009


Anshul Gupta <email.agupta at gmail.com> writes:
> On 30/09/2009, at 12:24 AM, Daniel Pittman wrote:
>> Anshul Gupta <email.agupta at gmail.com> writes:
>>
>>> Performance wise ext4 filesystem is better because of extents and
>>> persistent preallocation which improves performance slightly. Somebody
>>> might argue it's better for movie files because of contiguous allocation
>>> and will wear the disk less thereby reduce the risk of disk failure.
>>
>> Anybody who argued that would, rightly, be looked at very strangely; the
>> difference that makes in practice is zero, for all practical purposes.
>>
>> Extents do make access to large data less seek-heavy, though, which can be
>> a significant performance advantage.
>
> I guess for the difference to be visible you have to simulate simultaneous
> write of several files using CFQ on ext3 and ext4 file system and see how
> contiguous large files look on ext4 using some disk visualizer.

Ah.  I was unclear in my writing: I meant to say that anyone who argued that
the filesystem "...will wear the disk less" would be looked at oddly.

You are absolutely correct that extents make a huge, and valuable,
difference; I was just terribly unclear about it.  I probably shouldn't be
allowed to write technical posts at half past twelve or something. ;)

[...]

>>> However I prefer ext3 just because it has been around for many years and
>>> it's rock solid. Ext4 is still new. Also ext4 filesystem are not as
>>> reliable due to delayed allocation.
>>
>> That is a poorly supported statement.  I presume you mean that ext4 has a
>> higher risk of data loss in the event of a catastrophic system failure,
>> when faced with incompetently written software.
>
> Not because of the software but because of ext4 code. Under ext3 you will
> have either the old version of the file or new; using ext4 you might have
> none in the above situation unless the app forces fsync() after writes.

Both filesystems suffer the same flaws; ext3 simply forces a global journal
commit, hence an effective fsync, every five seconds; ext4 doesn't.  This
reduces the window, but it doesn't remove the problem.
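
To make the quoted point concrete: "forces fsync() after writes" amounts to
something like the following, sketched in shell with plain sync(1) standing
in for the per-file fsync() an application would call (the filenames are
purely illustrative):

    printf '%s\n' "new contents" > config.tmp  # write the new version aside
    sync                                       # force it to stable storage
    mv config.tmp config                       # then atomically replace

Skip the sync and, with delayed allocation, a crash between the rename and
the eventual writeback can leave you with neither the old contents nor the
new.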

It is true that in practice you are more likely to see issues with ext4, or
XFS, or JFS, than with ext3, but this is an artifact of the global sync.  You
could get much closer to the same reliability by simulating ext3:

    while sleep 5; do sync; done   # pretend we are ext3
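
You could also lean on the kernel's writeback knobs rather than a shell
loop; a rough equivalent (the values are just an example, and like the loop
it only narrows the window, it doesn't close it):

    sysctl -w vm.dirty_writeback_centisecs=500  # wake the flusher every 5s
    sysctl -w vm.dirty_expire_centisecs=500     # treat dirty data as old at 5s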

[...]

>>> That being said, for your important data you should always use some sort of
>>> RAID.
>>
>> This doesn't protect you from several of the failure modes of ext4, and can
>> make things significantly worse by exposing you to additional complexity,
>> and additional sources of failure.
>
> I was merely suggesting Ian to use redundant disks and not rely entirely on
> single copy for important data.

Well, for many failure modes I would argue that any RAID *is* a single copy.
Yes, it can help reduce hardware-level problems, but it can introduce others.

> You are assuming that I am referring to software RAID.

Nope.  Hardware RAID suffers the same, or potentially worse, risks.
Adding a battery-backed cache module to the hardware RAID card can help, but
there are inherent issues with RAID that /can/ make it a worse choice for
reliability than plain disk.

> Even using software RAID additional complexity is worth the protection.

Most times, yes, as long as you don't push it close to the edge.

A few examples, based on common real world issues:

If you have a RAID1 with any number of disks and a power failure strikes
during a write, you can end up with some disks holding the new version of a
block and some holding the old one.

You can mitigate that by electing the most common block as the accepted
version, by serializing writes in a predictable fashion, or by keeping a
journal of your write activity.

The first risks data-level corruption: the most common copy might be the
older one, and elevator reordering of writes may have put "later" blocks on
some disks first, so the majority vote can quietly settle on stale data.

The latter two reduce performance, and cost in terms of storage, so they are
not all that popular.  (If you have a battery-backed cache then you didn't
have a power failure, obviously. :)
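
Linux md (software RAID), for what it is worth, lets you do that detection
and repair by hand through sysfs, although it simply picks one copy rather
than taking a vote; a minimal sketch, assuming an array named md0:

    # count blocks whose mirrored copies disagree after the unclean shutdown
    echo check > /sys/block/md0/md/sync_action
    cat /sys/block/md0/md/mismatch_cnt

    # rewrite mismatched blocks so the copies agree again; md cannot know
    # which copy was actually the current one
    echo repair > /sys/block/md0/md/sync_action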


If you have a RAID5 that is already degraded and then a power failure, you
have had two concurrent failures, taking you past the safety limits of the
device.  That can cause undetected data corruption, or just an array that
fails to assemble without manual intervention.  The same inconsistent-copies
problem applies as well, and you rewrite blocks more often thanks to the
need to keep the parity correct.
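
With md, at least, that manual intervention usually means forcing the array
back together and accepting the risk; a sketch, with purely illustrative
device names:

    # force assembly of an array whose members disagree about recent events,
    # then start it even though it is still degraded
    mdadm --assemble --force --run /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1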

[...]

> FAT is good as multi-os filesystem but not very good with large files and
> certainly the last option when you are choosing Linux native filesystems. I
> have used FAT as backup filesystem for backup in the past and I won't
> recommend it to anyone. Inefficient access, file size limitation,
> fragmentation, file corruption, locking etc are just a few issues.

*nod*  We were more or less on the same page then.  Cool.  I always wonder
what other problems, ones I am not yet familiar with, lurk in the heart of
filesystems.  There always seems to be another one waiting.

Regards,
        Daniel

-- 
✣ Daniel Pittman            ✉ daniel at rimspace.net            ☎ +61 401 155 707
               ♽ made with 100 percent post-consumer electrons
   Looking for work?  Love Perl?  In Melbourne, Australia?  We are hiring.

