file corruption

Kevin Korb kmk at sanitarium.net
Fri Mar 8 22:25:31 MST 2013


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ouch.  That sounds scary.

The first time I had such a problem I discovered it while trying to
burn a 600+MB avi file to a CDR.  I burned a CDR and the md5sum of the
avi file on the disc didn't match.  So I burned another one and it
didn't match either.  But I still assumed it was the burner as they
tend to have issues so I burned another one.  All 3 had different
non-matching md5sums.  On a whim I loopback mounted the iso file I was
burning and surprisingly (to me) the avi file within the iso image
also had a different non-matching md5sum.  A RAM test revealed bad
RAM.  The corruption extended to rsync backups that hadn't complained.

The other time it happened to me I found the problem while checking
md5sums on my rsync backups.  A bunch of files didn't match and rsync
wasn't complaining.  Again a RAM test revealed bad RAM.
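
For anyone who wants to run the same kind of check, here is a minimal
sketch of comparing md5sums between a source tree and its backup.  The
function name compare_trees and both paths are hypothetical, and it
assumes GNU md5sum (for -c and --quiet).  A clean result only means
the two copies agree as read on this machine; it cannot rule out
corruption that was already in the data when it was backed up.

```shell
# Hypothetical helper: report files whose MD5 differs between two trees.
# Usage: compare_trees /path/to/source /path/to/backup
compare_trees() {
    src=$1
    dst=$2
    # Checksum every regular file under the source, using paths
    # relative to it, then verify that list against the backup.
    # --quiet suppresses the "OK" lines so only failures are printed.
    ( cd "$src" && find . -type f -exec md5sum {} + ) |
        ( cd "$dst" && md5sum -c --quiet - )
}
```

The exit status is that of md5sum -c, so it is non-zero if any file
fails to verify or is missing from the backup.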

Now the only computer I own that doesn't have ECC RAM is my netbook,
and I don't store anything but its OS on there.  If I could get a
netbook with ECC RAM I would.

On 03/09/13 00:12, f-rsync at media.mit.edu wrote:
>> Date: Fri, 08 Mar 2013 22:26:24 -0500 From: Kevin Korb
>> <kmk at sanitarium.net>
> 
>> If it were me, based on my previous experience, I would shut down
>> both systems and run memtest86+ or "Windows Memory Diagnostics"
>> on both systems.  Make sure to enable the extended tests.  Let
>> them run overnight and see if they identify a problem.
> 
> ...but note that "no errors" doesn't mean "RAM good."
> 
> In particular, I had a motherboard once that would corrupt certain
> bit patterns in RAM only when CPU throttling was enabled and the
> CPU had throttled down at the wrong moment.  I discovered this
> doing a transfer to/from cryptographic filesystems, so problems at
> the drive or interface level would have corrupted entire blocks,
> which wasn't happening.  I discovered it after using dd and nc to
> transfer about 2TB from one machine over the network to another at
> the block level, and, being paranoid, had checksummed both ends
> afterwards---and discovered they didn't match.
> 
> Once I narrowed it down to a few problematic files, I ran
> while [ 1 ]; do md5sum some-file; sleep 10; done
> and watched the output.  If I was running something CPU-bound in
> another window, every checksum matched.  If the machine was idle,
> then some -didn't- match.
> Whoops.  [The solution for that particular machine was to disable
> CPU throttling.  Problem solved.  Presumably there was some flaky
> timing when the speed of various buses changed.]
> 
> I'd say, if RAM testing turns up nothing, you should try shipping
> a few terabytes of random bits to the far machine and use an nc
> tunnel to redirect them back to the sending machine and compare
> what you get. That may implicate the network hardware, the remote
> machine, whatever, but it would take rsync itself out of the
> picture.  Or, if you don't want to set up the reflected tunnel,
> then just take some disk that isn't getting written to (e.g.,
> -dismounted- filesystem), checksum it, and then dd | nc it to the
> remote machine and run that through the same checksum (no need to
> write it to disk there).  If they match, then flip the sender &
> receiver and try it again.
> 
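
The repeated-checksum loop quoted above can be wrapped up so that it
summarizes the results instead of needing to be watched.  This is only
a sketch: the name repeat_md5 and its arguments are made up for
illustration.  It hashes the same file a fixed number of times and
prints each distinct checksum with a count.

```shell
# Hypothetical helper: hash FILE COUNT times, INTERVAL seconds apart,
# and print every distinct checksum seen with how often it occurred.
# Usage: repeat_md5 some-file 100 10
repeat_md5() {
    file=$1
    count=$2
    interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # Read via stdin so only the hash varies, not the file name.
        md5sum < "$file" | cut -d' ' -f1
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done | sort | uniq -c
}
```

If the file is not changing, a healthy machine should print exactly
one line.  More than one line means the same bytes hashed differently
on different passes.  As in the description above, it is worth running
this both with the machine idle and with it busy.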

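The dd | nc comparison suggested above might look roughly like the
following.  Treat it as a sketch under assumptions: the function
names, the device /dev/sdX1, the host name receiver.example and port
9000 are all placeholders, and netcat variants differ in their
options (some want "nc -l 9000", others "nc -l -p 9000", and some
need an extra option to exit at end of input), so check the man page
for the one you have.

```shell
# Hypothetical helpers for checksumming both ends of a dd | nc transfer.

# Sender, pass 1: checksum the idle (unmounted) source locally.
checksum_source() {
    dd if="$1" bs=1048576 2>/dev/null | md5sum | cut -d' ' -f1
}

# Sender, pass 2: write the same bytes to stdout, to be piped into nc.
send_source() {
    dd if="$1" bs=1048576 2>/dev/null
}

# Receiver: checksum whatever arrives on stdin without writing it to disk.
checksum_stream() {
    md5sum | cut -d' ' -f1
}

# Intended use (placeholders, not run here):
#   receiver$ nc -l 9000 | checksum_stream
#   sender$   checksum_source /dev/sdX1
#   sender$   send_source /dev/sdX1 | nc receiver.example 9000
```

If the two checksums match, swap the sender and receiver roles and
repeat, as suggested above.
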
- -- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
	Kevin Korb			Phone:    (407) 252-6853
	Systems Administrator		Internet:
	FutureQuest, Inc.		Kevin at FutureQuest.net  (work)
	Orlando, Florida		kmk at sanitarium.net (personal)
	Web page:			http://www.sanitarium.net/
	PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlE6x8sACgkQVKC1jlbQAQeGeQCeMwaiM9qO8FOpH2CGZga2Si/w
888AoO2u7mpX5XNFphwdlUGhhFEdfb27
=fm8K
-----END PGP SIGNATURE-----

