Data corruption

Tue Aug 30 13:48:21 GMT 2005

Wayne Davison wrote:
> On Mon, Aug 29, 2005 at 02:24:08PM -0400, Linus Hicks wrote:
> 
>>Mainly, it was apparently defaulting to using whole-file mode
> 
> 
> If you're doing a local copy, --whole-file mode is *much* faster.  Using
> --no-whole-file doubles your disk I/O, which is only a good thing if
> your transfer is limited by network I/O.

Okay, so one more thing. Some of our machines still have 100 Mbit NICs, and I 
have seen files with only a few modified data blocks take far less time to 
rsync than to rcp (on the order of a fraction of a second). I can get around 
40 MB/sec reads locally on each machine but only about 11 MB/sec over the 
network. So it would be useful to me if there were some way to detect how much 
of a file is different.
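
As a rough illustration of the trade-off described above, here is a minimal 
sketch of a cost model. It assumes the 40 MB/sec disk and 11 MB/sec network 
figures quoted above, that both machines read their copy of the file in 
parallel, and that checksum CPU cost is negligible. The function names are 
hypothetical and the model is a simplification, not a description of what 
rsync actually does internally.

```python
def estimated_seconds(size_mb, changed_fraction, disk_mb_s=40.0, net_mb_s=11.0):
    """Return (whole_file_seconds, delta_seconds) under a simplified model.

    Assumptions (not measured rsync behaviour):
      - a whole-file copy is limited only by the network;
      - a delta transfer reads the full file from disk once (both ends in
        parallel), then sends only the changed fraction over the network.
    """
    whole_file = size_mb / net_mb_s
    delta = size_mb / disk_mb_s + changed_fraction * size_mb / net_mb_s
    return whole_file, delta


def break_even_fraction(disk_mb_s=40.0, net_mb_s=11.0):
    """Changed fraction above which the delta transfer stops paying off.

    Solves size/net = size/disk + f * size/net for f.
    """
    return max(0.0, 1.0 - net_mb_s / disk_mb_s)


if __name__ == "__main__":
    whole, delta = estimated_seconds(size_mb=1000, changed_fraction=0.01)
    print(f"whole-file: {whole:.1f}s, delta: {delta:.1f}s")
    print(f"break-even changed fraction: {break_even_fraction():.3f}")
```

Under these assumptions the delta transfer wins until roughly three quarters 
of the file has changed; with faster networks or slower disks the break-even 
point drops quickly.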

I tried using --dry-run with --no-whole-file and --inplace, but it immediately 
reported that it would transfer the entire file. I would like to see one or 
both of the following enhancements:

1. Allow me to set a threshold, as a percentage or a number of blocks, that 
once exceeded causes rsync to switch to --whole-file mode. This would be most 
effective if the copy could proceed from its current point in the file without 
having to go back to the beginning.

2. Make --dry-run report accurate statistics on how many blocks are different. I 
could then feed this information into my script which can figure out whether to 
rsync or rcp.
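
To make the two requests concrete, here is a minimal sketch of the logic they 
describe. Nothing here is an existing rsync feature: `plan_transfer` and 
`choose_tool` are hypothetical names, the per-block "changed" flags stand in 
for statistics that rsync does not currently report from --dry-run, and the 
0.7 default threshold is only a placeholder.

```python
def plan_transfer(changed_flags, max_changed_fraction):
    """Sketch of request 1: switch modes once too many blocks differ.

    changed_flags: one boolean per block, True if that block differs.
    Returns ("delta", None) if the threshold is never exceeded, otherwise
    ("whole-file", index) where index is the first block that would be
    sent in whole-file mode, continuing from the current position.
    """
    limit = max_changed_fraction * len(changed_flags)
    changed = 0
    for i, differs in enumerate(changed_flags):
        if differs:
            changed += 1
            if changed > limit:
                return "whole-file", i + 1
    return "delta", None


def choose_tool(changed_blocks, total_blocks, max_changed_fraction=0.7):
    """Sketch of request 2: pick a tool from reported block statistics."""
    if total_blocks == 0:
        return "rcp"  # nothing to match against, so just copy
    if changed_blocks / total_blocks > max_changed_fraction:
        return "rcp"
    return "rsync"


if __name__ == "__main__":
    flags = [True, True, True, False, False, False, False, False, False, False]
    print(plan_transfer(flags, 0.2))
    print(choose_tool(changed_blocks=3, total_blocks=1000))
```

The threshold itself would have to come from measurements of the disk and 
network speeds on the machines involved.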

Any hope on these?

Linus