[clug] forget RAID?

Alex Satrapa grail at goldweb.com.au
Thu Feb 22 06:05:45 GMT 2007


On 22/02/2007, at 16:25 , Michael Still wrote:

> Actually, the paper found that temperature was not a large factor in
> disk failure:
>
> "Surprisingly, we found that temperature and activity levels were much
> less correlated with drive failures than previously reported."

The more useful finding of the paper is that drive failure follows a
gradually rising curve when you plot failure rate against time in
service. Or, as TFA about the paper summarised it: "Drives get old,
fast". This is contrary to the belief held by many people that hard
drives experience failure rates in a "bathtub curve", that is, high
initial failures, a period of relative calm, then a sudden rise in
old-age death. The paper found that the failure rate started off small
and grew faster with advancing age.

The corollary is that if one drive fails in a reasonably large array,
you will most likely be seeing another failure very soon, since the
surviving drives are just as old.
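
To put rough numbers on that corollary, here is a back-of-envelope sketch. It assumes the drives fail independently and that the failure rate stays constant over the window you care about; neither is from the paper. The function name and the AFR figures are made up for illustration only.

```python
import math

def p_another_failure(remaining_drives, afr, window_days):
    """Chance that at least one remaining drive fails within the window.

    Assumes independent drives and a constant hazard over the window.
    That is a simplification: the whole point above is that the rate
    climbs with age, so an old array sits at the high-AFR end.
    """
    # Convert the annualised failure rate (AFR) into a per-day hazard.
    hazard_per_day = -math.log(1.0 - afr) / 365.0
    # Probability that one given drive fails within the window.
    p_single = 1.0 - math.exp(-hazard_per_day * window_days)
    # Complement of "every remaining drive survives the window".
    return 1.0 - (1.0 - p_single) ** remaining_drives

# Hypothetical AFRs for a young and an old array: 11 surviving drives,
# 30 days until the replacement is sourced and the rebuild finishes.
for label, afr in (("young", 0.02), ("old", 0.08)):
    print(label, round(p_another_failure(11, afr, 30), 3))
```

Whatever figures you plug in, the chance of a second failure goes up with both the number of surviving drives and their age, which is the uncomfortable part.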

I'll suggest that it doesn't matter if your RAID array supports
transparent rebuilds when you swap out a broken disk for a brand new
blank one: if the array will last three years before the first disk
breaks (or the controller fries itself), you'll be replacing the whole
array with a bigger, better one anyway.

Alex


