silent data corruption with rsync

Pavel Herrmann morpheus.ibis at gmail.com
Thu Mar 13 15:26:01 MDT 2014


Hi

On Thursday 13 of March 2014 20:40:49 devzero at web.de wrote:
> What do "They" recommend instead?
> 
> If it's all about copying and network bandwidth is not an issue, you can use
> scp or whatever dumb tool just shuffles the bits around "as is".
> rsync is for when you want to keep data in sync and save bandwidth while
> doing so. You CAN use it for copying only, but then you are somewhat taking
> a sledgehammer to crack a nut.
> 
> Anyway, if "They" care about their data, "They" use checksumming for
> storing their data on disk, don't "They"? ;)
> 
> The network is not the only place where data corruption can happen, and
> silent bitrot on disks _does_ happen, especially when your hard disks go
> nuts, your RAID arrays break, or your storage controller's firmware gets
> hiccups. It does not happen often, but it happens, and mostly you won't
> know when and where. In my IT job I had one case where some SAN storage lost
> some cache contents, and the only places where we really knew that data
> loss/corruption had happened were the Oracle and Exchange databases. For all
> the other data, we don't know whether it is in 100% perfect condition.

Probably the most used filesystem for corruption detection/prevention is ZFS,
which lets you specify which checksum it uses. For dedup it also has a
"verify" option: it still uses a hash table to look up dupes, but does a full
compare before considering a block a dupe. (A full compare obviously makes no
sense for rsync, since comparing blocks in full over the network would mean
transferring them anyway.)
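
To make the difference concrete, here is a minimal sketch in Python of
checksum-only dedup versus dedup with a full compare. It is a toy model, not
how ZFS implements it: the DedupStore class and the deliberately weak 8-bit
weak_hash are hypothetical names, made up so that a collision is easy to
trigger.

```python
import hashlib


class DedupStore:
    """Toy block store: dedup by checksum, optionally verified byte-for-byte."""

    def __init__(self, verify=True, hash_fn=None):
        self.verify = verify
        # hash_fn is injectable so a deliberately weak hash can be plugged in.
        self.hash_fn = hash_fn or (lambda b: hashlib.sha256(b).digest())
        self.blocks = {}  # checksum -> list of distinct blocks with that checksum

    def put(self, block):
        """Store a block; return a (checksum, index) reference to it."""
        key = self.hash_fn(block)
        bucket = self.blocks.setdefault(key, [])
        if bucket:
            if not self.verify:
                # Trust the checksum alone: a collision silently aliases blocks.
                return key, 0
            for i, existing in enumerate(bucket):
                if existing == block:  # full compare before calling it a dupe
                    return key, i
        bucket.append(block)
        return key, len(bucket) - 1

    def get(self, ref):
        """Return the block stored under a reference from put()."""
        key, i = ref
        return self.blocks[key][i]


def weak_hash(block):
    # Hypothetical 8-bit checksum, chosen only to make collisions easy to show.
    return bytes([sum(block) % 256])


trusting = DedupStore(verify=False, hash_fn=weak_hash)
checking = DedupStore(verify=True, hash_fn=weak_hash)
a, b = b"ab", b"ba"  # different data, same weak checksum
```

Storing a and then b in trusting hands back a when b is read again, which is
exactly the kind of silent corruption the rumor describes. The checking store
keeps both blocks apart at the cost of one extra compare per hash hit.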

I would assume it would be easy to introduce a more secure hash (like
SHA-256) into rsync as part of a new protocol version, and let the user choose.
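
Until something like that exists, a copy can be checked independently of the
transfer tool. The following is only a sketch under my own assumptions, not
rsync code: file_digest and same_content are hypothetical helper names, and
the hash is picked by name through Python's hashlib, so the user chooses.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.new(algorithm)  # raises ValueError for unknown algorithm names
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(src, dst, algorithm="sha256"):
    """Compare two files by digest; True if the digests match."""
    return file_digest(src, algorithm) == file_digest(dst, algorithm)
```

Running same_content(source_path, copied_path) after the copy reads both
files in full, so it costs I/O rather than bandwidth when each side hashes
its own file.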

However, if direct replication of storage is what you are after, I would
suggest using filesystem snapshots and the snapshot-level replication
functionality of your filesystem/volume manager/SAN instead of rsync.

regards
Pavel Herrmann

> 
> regards
> Roland
> 
> >List:       rsync
> >Subject:    silent data corruption with rsync
> >From:       Sig_Pam <spam () itserv ! de>
> >Date:       2014-03-11 16:02:28
> >Message-ID: zarafa.531f3394.439c.5f8c77014439296d () exchange64 ! corp ! itserv ! de
> >
> >Hi everybody!
> >
> >I'm currently working on a project which has to copy huge amounts of data
> >from one storage to another. For a reason I can no longer verify,
> >there is a rumor that "rsync may silently corrupt data". Personally, I
> >don't believe that.
> >
> >"They" explain it this way: "rsync does an in-stream data deduplication. It
> >creates a checksum for each data block to transfer, and if a block with
> >the same checksum has already been transferred earlier, this old block
> >will be re-used to save bandwidth. But, for whatever reason, two different
> >blocks can produce the same checksum even if the source data is not the
> >same, effectively corrupting the data stream".
> >
> >Did you ever hear something like this? Has this been a bug in any early
> >version of rsync? If so, when was it fixed?
> >
> >Thank you,
> >
> >  sig
