kmk at sanitarium.net
Fri Mar 8 22:25:31 MST 2013
Ouch. That sounds scary.
The first time I had such a problem I discovered it while trying to
burn a 600+MB avi file to a CDR. I burned a CDR and the md5sum of the
avi file on the disc didn't match. So I burned another one and it
didn't match either. But I still assumed it was the burner as they
tend to have issues so I burned another one. All 3 had different
non-matching md5sums. On a whim I loopback mounted the iso file I was
burning and surprisingly (to me) the avi file within the iso image
also had a different non-matching md5sum. A RAM test revealed bad
RAM. The corruption extended to my rsync backups, and rsync hadn't
complained about any of it.
The other time it happened to me I found the problem while checking
md5sums on my rsync backups. A bunch of files didn't match and rsync
wasn't complaining. Again a RAM test revealed bad RAM.
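For what it's worth, the kind of check that caught my second case can be
sketched roughly as below. This is only an illustration under
assumptions: compare_trees is a made-up helper name, it assumes md5sum
from GNU coreutils and file names without newlines, and it is not what
rsync does internally.

```shell
# Hypothetical helper, not part of rsync: compare_trees SRC DST
# Prints one line per file under SRC whose copy under DST is missing or
# has a different md5sum. No output means every file matched.
# Assumes file names contain no newlines.
compare_trees() (
    src=$1
    # Resolve DST to an absolute path before changing directory.
    dst=$(cd "$2" && pwd) || exit 1
    cd "$src" || exit 1
    find . -type f | while IFS= read -r rel; do
        if [ ! -f "$dst/$rel" ]; then
            echo "MISSING $rel"
        elif [ "$(md5sum < "$rel")" != "$(md5sum < "$dst/$rel")" ]; then
            echo "MISMATCH $rel"
        fi
    done
)
```

Usage would be something like compare_trees /data /backup/data (both
paths are placeholders). On a machine with bad RAM the check itself can
give different answers from run to run, so a mismatch is a reason to
test the hardware, not proof of which copy is wrong.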
Now the only computer I own that doesn't have ECC RAM is my netbook,
and I don't store anything but its OS on there. If I could get a
netbook with ECC RAM, I would.
On 03/09/13 00:12, f-rsync at media.mit.edu wrote:
>> Date: Fri, 08 Mar 2013 22:26:24 -0500 From: Kevin Korb
>> <kmk at sanitarium.net>
>> If it were me, based on my previous experience, I would shut down
>> both systems and run memtest86+ or "Windows Memory Diagnostics"
>> on both systems. Make sure to enable the extended tests. Let
>> them run overnight and see if they identify a problem.
> ...but note that "no errors" doesn't mean "RAM good."
> In particular, I had a motherboard once that would corrupt certain
> bit patterns in RAM only when CPU throttling was enabled and the
> CPU had throttled down at the wrong moment. I discovered this
> doing a transfer to/from cryptographic filesystems, so problems at
> the drive or interface level would have corrupted entire blocks,
> which wasn't happening. I discovered it after using dd and nc to
> transfer about 2TB from one machine over the network to another at
> the block level. Being paranoid, I checksummed both ends
> afterwards and discovered they didn't match.
> Once I narrowed it down to a few problematic files, I did while [ 1 ];
> do md5sum some-file; sleep 10; done and watched the output. If I was
> running something CPU-bound in another window, every checksum
> matched. If the machine was idle, then some -didn't- match.
> Whoops. [The solution for that particular machine was to disable
> CPU throttling. Problem solved. Presumably there was some flaky
> timing when the speed of various buses changed.]
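A variant of that loop that tallies the results, instead of relying on
eyeballing them, might look like the following. It is a sketch only:
repeat_md5 is a hypothetical name, the default 10 second delay simply
mirrors the sleep in the loop quoted above, and it assumes md5sum from
GNU coreutils.

```shell
# Hypothetical helper: repeat_md5 FILE COUNT [DELAY]
# Checksums FILE COUNT times, sleeping DELAY seconds (default 10) between
# runs, and prints each distinct md5sum with the number of times it was seen.
# For a file that is not changing, a healthy machine should print one line.
repeat_md5() (
    file=$1
    count=$2
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        md5sum < "$file"
        i=$((i + 1))
        # Skip the sleep after the final run.
        if [ "$i" -lt "$count" ]; then
            sleep "$delay"
        fi
    done | sort | uniq -c
)
```

For example, repeat_md5 some-file 100 would take a little over 16
minutes with the default delay. More than one line of output means the
same bytes hashed differently at different times, which points at
hardware, not at rsync. It is probably worth running it once with the
machine idle and once with it busy, given the behaviour described above.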
> I'd say, if RAM testing turns up nothing, you should try shipping
> a few terabytes of random bits to the far machine and use an nc
> tunnel to redirect them back to the sending machine and compare
> what you get. That may implicate the network hardware, the remote
> machine, whatever, but it would take rsync itself out of the
> picture. Or, if you don't want to set up the reflected tunnel,
> then just take some disk that isn't getting written to (e.g., a
> -dismounted- filesystem), checksum it, and then dd | nc it to the
> remote machine and run that through the same checksum (no need to
> write it to disk there). If they match, then flip the sender &
> receiver and try it again.
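The one-way version of that test could be sketched like this. Everything
named here is an assumption for illustration: stream_md5 and source_md5
are made-up helper names, /dev/sdX1, receiver.example and port 9000 are
placeholders, bs=1M assumes GNU dd, and the nc options vary between
netcat variants.

```shell
# Hypothetical helpers; device path, host name and port are placeholders.

# stream_md5: print only the md5 hex digest of whatever arrives on stdin.
stream_md5() {
    md5sum | cut -d ' ' -f 1
}

# source_md5 PATH: checksum an idle device or file without writing anything.
source_md5() {
    dd if="$1" bs=1M 2>/dev/null | stream_md5
}

# Sender, first pass (local checksum):
#   source_md5 /dev/sdX1
# Receiver (checksum the incoming stream, nothing written to disk):
#   nc -l 9000 | stream_md5        # some nc variants need: nc -l -p 9000
# Sender, second pass (ship the same bytes):
#   dd if=/dev/sdX1 bs=1M | nc receiver.example 9000
```

If the two digests match, swap the sender and receiver roles and repeat.
Depending on the nc variant, the sender may need an extra option (such
as -q 0 or -N) to close the connection at end of input so the receiver's
checksum can finish.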
Kevin Korb                  Phone:    (407) 252-6853
Systems Administrator       Internet:
FutureQuest, Inc.           Kevin at FutureQuest.net  (work)
Orlando, Florida            kmk at sanitarium.net (personal)
Web page: http://www.sanitarium.net/
PGP public key available on web site.