kmk at sanitarium.net
Fri Mar 8 20:26:24 MST 2013
On 03/08/13 22:16, xiaolong mou wrote:
> Thanks. I suspected hardware issue as well. However I did a local
> rsync test at the USB drive side and NAS side with the same files.
> If there was a local hardware problem (RAM or USB hard drive driver
> etc) it should show up in this test, but everything was fine!
I would expect that test to expose a bad RAM problem, but it would not
if the bad RAM is in the other system, and there is no guarantee either way.
> Could it be network driver somehow corrupting data? Although it is
> hard to believe. I have used the NAS attached to the router for
> years, never noticed any network issue. Also the rsync checksum
> after transfer should catch this case, right?
I suppose it is possible. Rsync only checks that the file it
assembles in memory+cache matches what the other system was sending.
It assumes that the write to disk will also match.
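Since rsync does not re-read what landed on the disk, an independent check has to do that itself. Below is a minimal sketch in Python, not anything rsync provides: it hashes every file under a source tree and compares it with the file at the same relative path under a destination tree, much like the "diff -r" test. The names file_digest and compare_trees are hypothetical, and it assumes both trees are reachable as local paths. Note that a re-read shortly after the transfer may be served from the cache rather than the disk, so the check is more meaningful after the target has been remounted or rebooted.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(src_root, dst_root):
    """Return sorted relative paths that differ or are missing in dst_root.

    Hypothetical helper: only regular files found under src_root are checked.
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst) or file_digest(src) != file_digest(dst):
                bad.append(rel)
    return sorted(bad)
```

Calling compare_trees("/path/to/original", "/path/to/copy") (placeholder paths) returns an empty list when every file matches.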
> I did the original test (transferring data from NAS to router back
> to NAS) using rsync over ssh. This time I did see several warning
> messages, saying "xxxx failed verification -- update discarded
> (will try again)". The diff was fine, apparently the retry worked.
This could still be bad RAM.
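It can help to look at exactly how the corrupted bytes differ. The sketch below is a rough Python counterpart to "cmp -l" that also counts how many bits changed in each differing byte. A difference of exactly one bit is often associated with a memory fault, though it does not prove one. The name byte_diffs is hypothetical, offsets are 0-based (cmp -l counts from 1), and if the files differ in length only the common prefix is compared.

```python
def byte_diffs(path_a, path_b, chunk_size=1 << 20):
    """Yield (offset, byte_a, byte_b, flipped_bits) for each differing byte."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        offset = 0
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            # zip stops at the shorter chunk, so trailing extra bytes are ignored.
            for i, (x, y) in enumerate(zip(a, b)):
                if x != y:
                    # XOR leaves a 1 in every bit position that changed.
                    yield offset + i, x, y, bin(x ^ y).count("1")
            offset += max(len(a), len(b))
```

For example, list(byte_diffs("original.bin", "copy.bin")) (placeholder file names) gives one tuple per differing byte.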
> I am fairly puzzled. Hopefully someone can help shed some light to
> track it down.
If it were me, based on my previous experience, I would shut down both
systems and run memtest86+ or "Windows Memory Diagnostics" on both
systems. Make sure to enable the extended tests. Let them run
overnight and see if they identify a problem.
> On Fri, Mar 8, 2013 at 9:09 PM, Kevin Korb <kmk at sanitarium.net> wrote:
> I have seen this behavior before. Twice.
> Both times the cause was bad RAM on the target system. The bad
> RAM was corrupting the files within the disk write cache so that
> rsync believed it was writing the correct data but the disk was not
> getting the correct data. Ever since that happened to me I have
> insisted on ECC RAM and have never had such a problem again.
> It is also possible that a disk or disk controller could be at fault.
> Rsync does do an in memory checksum before it renames the file
> into place but it does not re-read the file back from the disk
> (which in reality would also require a dump of the cache).
> On 03/08/13 20:55, xiaolong mou wrote:
>>>> I am backing up about 500G of data from a linux-based NAS to
>>>> a USB hard drive attached to a router (rt-n16 with tomatousb
>>>> firmware). Both NAS and the router have rsync-3.0.9. The
>>>> router is running rsync in the daemon mode. To test the set
>>>> up, I rsync'd the files to the empty USB hard drive, then
>>>> back to the NAS in a new temp dir. Afterwards, I did a local
>>>> "diff -r" on the NAS. To my surprise, a few files were
>>>> corrupted (only 3 out of 17K files). "cmp -l" shows single
>>>> byte difference between the original and rsync'd files.
>>>> However, I didn't see any error message on the NAS side or in
>>>> the rsyncd logs. How is that possible? Doesn't rsync always
>>>> do a checksum verification after copying the files? I have
>>>> been using rsync for years for local backup. The feeling of
>>>> silent file corruption is scary. Could someone point me to
>>>> the right direction? I really want to get to the bottom of
>>>> this. Much appreciated.
>>>> Regards, Xiaolong
Kevin Korb Phone: (407) 252-6853
Systems Administrator Internet:
FutureQuest, Inc. Kevin at FutureQuest.net (work)
Orlando, Florida kmk at sanitarium.net (personal)
Web page: http://www.sanitarium.net/
PGP public key available on web site.