[clug] Anyone using Seagate SMR drives [6TB & 8TB, 3.5" 'Archive']??
csirac2 at gmail.com
Thu Nov 12 05:34:23 UTC 2015
I followed this thread on linux-btrfs, in which use of the SMR
drives resulted in errors (inconsistent data):
This post roughly outlines some (seemingly doable) changes for btrfs
to support "host-managed" SMR drives directly:
... however, the errors would appear to be due to a Linux kernel regression
(not btrfs-specific).
The use-case for raid might end up being a bit silly for these drives,
but as a destination for long-term archival of incremental btrfs
snapshots it might prove useful.
On 12 November 2015 at 16:04, steve jenkin <sjenkin at canb.auug.org.au> wrote:
>> On 12 Nov 2015, at 2:52 PM, Bob Edwards <bob at cs.anu.edu.au> wrote:
>> I am not sure the review says "Can’t use in RAID arrays”.
> BTW, while Seagate say the Power On duty cycle is 100% (always on, 8760hrs/year), they’re rated for only 300 load/unload [power-on] cycles.
> The “gotcha” is they’re only rated for 180TB of write (IIRC) per year…
> You don’t want to be using them for R/W in an active filesystem.
> OK for Read-only copy of your data with occasional updates.
> Subtle why they’re ’Not for RAID’: there are two reasons.
> 1. They’re “SMR” - Shingled Magnetic Recording.
> They get the 1.33TB-per-platter density by writing very wide tracks, then on the next revolution stepping the heads much less than the track width (about half) and deliberately overwriting half of the already-written track.
> Works well when reading or doing sequential writes & appending.
> You can’t do a seek and _update_ like on a ‘standard’ drive. You’ve got to read, then rewrite, the entire ‘region’. Or append to the end of a region.
> [No idea of how big the regions are, or how to set their size.]
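The read-then-rewrite behaviour described above can be sketched as a toy cost model. The 256 MB zone size is purely an assumption for illustration (as the post says, the real region size is unknown), and `update_cost` is a hypothetical helper, not any real drive interface:

```python
# Toy cost model of a drive-managed SMR zone: there is no in-place
# update, so changing a single block triggers a read-modify-write of
# the whole shingled zone (read it all, rewrite it all sequentially,
# because each track overlaps its neighbour).
BLOCK = 4096                              # bytes per logical block
ZONE_BYTES = 256 * 1024 * 1024            # assumed zone size (unknown in reality)
ZONE_BLOCKS = ZONE_BYTES // BLOCK         # 65536 blocks per zone

def update_cost(zone_blocks: int) -> int:
    """Blocks of physical I/O needed to update ONE logical block."""
    # Conventional drive: 1 block written.
    # Shingled zone: read the whole zone, then rewrite the whole zone.
    return zone_blocks + zone_blocks      # reads + writes

io = update_cost(ZONE_BLOCKS)
print(f"one 4KB update costs {io} block I/Os "
      f"({io * BLOCK // 2**20} MB moved)")
```

Under these assumptions a single 4KB update moves 512 MB of data, which is why the drive hides the cost behind a cache instead.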
> The drives ‘self manage’ and present a normal “update in-place” interface, but that’s a lie.
> 2. One review I read said the drives cache around 20GB of data (seems like a Log-structured write) and then at their leisure, read & rewrite any updated regions.
> My guess is they deliberately make very small SMR regions for that 20GB, so it’s effectively “update-in-place”.
> Alternatively, if it is log-structured data, it can always append the new value of any updated block and flip a ‘dirty’ bit in the DRAM block-map.
> So the ‘managed’ interface gives you a pretend ‘update in-place’ that normal OSes can use - at the expense of both inconsistent and unpredictable performance, and potentially an Early Drive Death. [Not that the vendor would mind you prematurely wearing out your drive. You may have a different opinion.]
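The log-structured guess above can be sketched as an append-only log plus a block map. Everything here (class name, capacity, the stall-when-full behaviour) is illustrative of the *guessed* mechanism, not Seagate's actual firmware:

```python
# Sketch of a guessed drive-managed SMR write cache: writes append to a
# log and update a DRAM block map; reads consult the map first. When the
# log fills, writes stall until regions are rewritten (destaged).
class LogStructuredCache:
    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.log = []          # append-only list of (lba, data) records
        self.block_map = {}    # lba -> index of latest record ("dirty" map)

    def write(self, lba: int, data: bytes) -> bool:
        """Append the new value; False means the log is full (drive stalls)."""
        if len(self.log) >= self.capacity:
            return False       # backlog must be cleared before more writes
        self.block_map[lba] = len(self.log)
        self.log.append((lba, data))
        return True

    def read(self, lba: int):
        """Latest cached value, or None (go read the platter)."""
        idx = self.block_map.get(lba)
        return self.log[idx][1] if idx is not None else None

cache = LogStructuredCache(capacity_blocks=4)
cache.write(100, b"old")
cache.write(100, b"new")   # same LBA: just append again and remap
assert cache.read(100) == b"new"
```

This also shows why peak random-write performance looks good right up until the log fills, then collapses.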
> The Read performance, seeking to unmodified data, will be consistent and ‘fast’.
> The sequential write performance, especially appending to the end of the last written region, will be consistent and fast, but not as fast as read.
> [Review confirms 2MB sequential and random read/write were close to normal ‘Enterprise’ drive performance]
> The random write performance, due to the interaction of the 20GB buffer and SMR region-rewriting, will be _woeful_, because when the drive fills that log/buffer, it will stop writing until it’s cleared the backlog…
> The review I read had the peak performance of random 4KB writes as good, but then it’d drop to 0 KB/sec for extended periods.
> IIRC, throughput was ~9.5MB/sec for random 4KB writes. Can’t recall MB/sec for sequential writes.
> As soon as you put these SMR drives with big buffers in a RAID system that wants to constantly update parity blocks as well as data blocks, it’ll run like treacle. (RAID-5 does 2 reads and 2 writes per block update if not ‘performance-optimised’, which can be more prone to losing data; RAID-6 does 3 reads and 3 writes.)
> The storage review people did a two-drive RAID-1 (mirrored) rebuild with Enterprise drives vs SMRs:
> - ~20 hrs for Hitachi Helium 6TB drives to remirror
> - ~57 hrs for the Seagate SMRs.
> Rebuilding even a 5- or 7-drive RAID-5 array of SMRs, even if taken off-line, will take multiples of that single-drive time, as _every_ drive has to be scanned from start to finish. The good news is that off-line, the spare drive may be written sequentially; but this test showed a 1:1 resync of a broken mirror behaved as if it was all random reads/writes. Worst case for SMR.
> From the review:
>> The HGST He8 HDDs completed their rebuild in 19 hours and 46 minutes.
>> The Seagate Archive HDDs completed their rebuild in 57 hours and 13 minutes.
>> Needless to say in a larger RAID group or with background activity taking place, that rebuild time will only get longer.
>> At this time Seagate recommends single drive deployments, be it consumer or enterprise.
>> For hyper-scale deployments that are SMR aware, specially designed software can be used to replicate data across multiple drives in a fashion that won't have the RAID rebuild penalty in a drive failure scenario.
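A back-of-envelope check on the quoted rebuild times gives the effective sequential rate each rebuild sustained. This assumes 8 TB capacity for both drives (an assumption: the He8 is HGST's 8 TB model, though the post above says 6TB Hitachi) and decimal units:

```python
# Effective rebuild throughput implied by the quoted RAID-1 rebuild times,
# assuming both drives are 8 TB (decimal: 1 TB = 1e12 bytes).
def rebuild_rate_mb_s(capacity_tb: float, hours: int, minutes: int) -> float:
    seconds = hours * 3600 + minutes * 60
    return capacity_tb * 1e12 / seconds / 1e6   # MB/s

he8 = rebuild_rate_mb_s(8, 19, 46)   # HGST He8: 19h46m
smr = rebuild_rate_mb_s(8, 57, 13)   # Seagate Archive: 57h13m
print(f"He8 ~{he8:.0f} MB/s, Archive SMR ~{smr:.0f} MB/s, "
      f"slowdown {he8 / smr:.1f}x")
```

Roughly 112 MB/s vs 39 MB/s under these assumptions - the SMR drive sustained about a third of the conventional drive's rate even on a nominally sequential mirror resync.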
> Steve Jenkin, IT Systems and Design
> 0412 786 915 (+61 412 786 915)
> PO Box 48, Kippax ACT 2615, AUSTRALIA
> mailto:sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin
> linux mailing list
> linux at lists.samba.org