[clug] forget RAID?
Alex Satrapa
grail at goldweb.com.au
Thu Feb 22 06:05:45 GMT 2007
On 22/02/2007, at 16:25 , Michael Still wrote:
> Actually, the paper found that temperature was not a large factor in
> disk failure:
>
> "Surprisingly, we found that temperature and activity levels were much
> less correlated with drive failures than previously reported."
The more useful finding of the paper is that the drive failure rate
rises gradually with time in service. Or, as TFA about the paper
summarised: "Drives get old, fast". This is contrary to the belief
held by many people that hard drive failure rates follow a "bathtub
curve": high initial failures, a period of relative calm, then a
sudden rise in old-age death. The paper found that the failure rate
started off small and grew ever faster with advancing age.
The corollary is that if one drive fails in a reasonably large
array, you will most likely see another failure very soon.
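To put rough numbers on that corollary, here is a minimal sketch in Python. It is not taken from the paper: the annual failure rates are made-up illustrative values, the function names are hypothetical, and it assumes drives fail independently with a constant hazard over the window, which is a simplification.

```python
def p_fail_within(afr: float, days: float) -> float:
    """Chance that one drive fails within `days`, assuming a constant
    hazard that matches the given annual failure rate (an assumption)."""
    return 1.0 - (1.0 - afr) ** (days / 365.0)


def p_another_failure(n_remaining: int, afr: float, days: float) -> float:
    """Chance that at least one of `n_remaining` drives fails within
    `days`, assuming failures are independent (also an assumption)."""
    p_one = p_fail_within(afr, days)
    return 1.0 - (1.0 - p_one) ** n_remaining


if __name__ == "__main__":
    # Hypothetical rates: a "young" array versus an "old" one.
    # These are illustrative values, not figures from the paper.
    for label, afr in (("young", 0.02), ("old", 0.08)):
        p = p_another_failure(n_remaining=11, afr=afr, days=30)
        print(f"{label}: P(another failure within 30 days) = {p:.3f}")
```

The point is only the direction of the effect: the older the drives (higher per-drive rate) and the larger the array, the more likely a second failure follows soon after the first. Drives of the same age in the same chassis are unlikely to fail independently, so treat the output as a rough guide at best.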
I'll suggest that it doesn't matter whether your RAID array supports
transparent rebuilds when you swap out a broken disk for a brand new
blank one: if the array lasts three years before the first disk
breaks (or the controller fries itself), you'll be replacing the whole
array with a bigger, better one anyway.
Alex