[clug] forget RAID?

Chris Smart chris at kororaa.org
Thu Feb 22 05:58:20 GMT 2007


Michael Still wrote:
> Actually, the paper found that temperature was not a large factor in
> disk failure:
>
> "Surprisingly, we found that temperature and activity levels were much
> less correlated with drive failures than previously reported."
>   
Is that the quote from Bianca Schroeder's paper? "They find that while 
temperature and utilization exhibit much less correlation with failures 
than expected, the value of several SMART counters correlate highly with 
failures."
By "they" she refers to "E. Pinheiro, W. D. Weber, and L. A. Barroso. 
Failure trends in a large disk drive population. In /Proc. of the FAST 
'07 Conference on File and Storage Technologies/, 2007."

Two thoughts:
1) Well, that doesn't mean that temperature is not a large factor. They 
just said that temperature exhibited _less correlation_ with failures 
than _expected_. But what was their expectation? If it was that heat is 
the number one cause of hard drives dying, then this just means they 
found it's not the only cause, but it could still be a large factor ;)

2) I don't know about their test. Did they actually run multiple hard 
drives _outside_ of the spec'd temperature and see how long they took to 
die in comparison to other hard drives? Or did they just find, during 
their other tests, that hard drives died without having gone outside the 
temperature range? In other words, did they deliberately set out to test 
whether drives operating outside the temp spec were more likely to die 
than those operating within the range, or is the above just a loose 
conclusion drawn from the drives that died during their tests?

I know from my own experience, anyway, that a disk that has died on me 
was often running too hot. A Seagate 7200.8 drive is spec'd to run up to 
60 degrees Celsius. Run it at 80+ degrees and see how long it lasts ;)

-c

