[clug] forget RAID?
Chris Smart
chris at kororaa.org
Thu Feb 22 05:58:20 GMT 2007
Michael Still wrote:
> Actually, the paper found that temperature was not a large factor in
> disk failure:
>
> "Surprisingly, we found that temperature and activity levels were much
> less correlated with drive failures than previously reported."
>
Is that this quote from Bianca Schroeder's paper? "They find that while
temperature and utilization exhibit much less correlation with failures
than expected, the value of several SMART counters correlate highly with
failures."
By "they" she refers to "E. Pinheiro, W. D. Weber, and L. A. Barroso.
Failure trends in a large disk drive population. In /Proc. of the FAST
'07 Conference on File and Storage Technologies/, 2007."
Two thoughts:
1) Well, that doesn't mean that temperature is not a large factor; they
just said that temperature exhibited _less correlation_ with failures
than _expected_. But what was their expectation? If it was that heat is
the number one cause of hard drives dying, then this just means they
found it's not the only cause, but it could still be a large factor ;)
2) I don't know about their test. Did they actually run multiple hard
drives _outside_ of the spec'd temperature and see how long they took to
die in comparison to other hard drives? Or did they just find, during
their other tests, that hard drives died without having gone outside
the temperature range? In other words, did they deliberately set out to
test whether drives operating outside the temp spec were more likely to
die than those operating within the range, or is the above just a loose
conclusion drawn from the drives that died during their tests?
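To make point 2 concrete, here is a rough sketch with made-up numbers. The
failure model below is purely hypothetical (a flat baseline inside the spec
and a doubling for every 5 degrees above it); it is not taken from either
paper, and the only figure from this thread is the 60 degree limit. It only
shows that a population which never leaves the spec cannot tell you anything
about what happens outside it.

```python
SPEC_MAX_C = 60.0  # spec'd maximum operating temperature (degrees Celsius)


def annual_failure_prob(temp_c):
    """Hypothetical model, NOT measured data: failure probability is flat
    within the spec and doubles for every 5 degrees above it."""
    baseline = 0.03  # assumed baseline, chosen only for illustration
    if temp_c <= SPEC_MAX_C:
        return baseline
    excess = temp_c - SPEC_MAX_C
    return min(1.0, baseline * 2 ** (excess / 5.0))


def spread(temps):
    """Difference between the highest and lowest failure probability
    seen across the given temperatures."""
    probs = [annual_failure_prob(t) for t in temps]
    return max(probs) - min(probs)


# Drives observed in normal service, all comfortably inside the spec.
observed_fleet = range(25, 50, 5)
# A deliberate test that also runs drives well above the spec.
stress_test = range(25, 85, 5)

print("in-spec spread:     ", spread(observed_fleet))
print("stress-test spread: ", spread(stress_test))
```

Under these assumptions the in-spec fleet shows no temperature effect at all,
while the deliberate out-of-spec test shows a large one, even though both
come from the same underlying model.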
Anyway, I know from my own experience that a disk that has died on me
was often running too hot. A Seagate 7200.8 drive is spec'd to run up to
60 degrees Celsius. Run it at 80+ degrees and see how long it lasts ;)
-c