copying efficiency (was: Data corruption)

Wayne Davison wayned at samba.org
Mon Sep 5 17:18:57 GMT 2005


On Tue, Aug 30, 2005 at 09:48:21AM -0400, Linus Hicks wrote:
> Okay, so one more thing. We have some machines that still have 100mbit
> NICs and I have seen files that have only a few modified datablocks
> take less time to rsync than to rcp (like a fraction of a second).

Yes, that is what rsync is good at -- using CPU cycles and disk I/O to
optimize away network I/O.  You'll note I said that --no-whole-file
slows down a *local* copy -- i.e. one that is not going over a network
to a second machine.  This is because all the CPU resources and all the
disk I/O resources are coming from one machine, and the "network" is a
pipe that has faster throughput than the disks, so it doesn't need to
be optimized.
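
To make that concrete, something like this (the paths are just
placeholders) shows the difference -- a local copy uses whole-file
transfers by default, so the delta algorithm has to be requested
explicitly:

    rsync -a /src/bigfile /dest/                  # local: whole-file by default
    rsync -a --no-whole-file /src/bigfile /dest/  # forces the delta algorithm,
                                                  # usually slower for local copies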

> 1. Allow me to set a threshold by percentage or number of blocks which
> once exceeded causes rsync to switch to --whole-file mode.

This is not possible.  The whole process is pipelined, and by the
time the sender is looking at the checksum data from the generator,
the generator has already finished its work on that file (and probably
several others too).  For a local transfer, --whole-file is almost
always the best choice.  For a remote transfer, using --whole-file is
only a good choice when the network is very fast (where the disk I/O
is becoming the bottleneck instead of the network I/O), or when the
CPUs are overly slow.
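
For instance, something along these lines (hostname and paths are
placeholders) skips the delta algorithm for a remote copy over a fast
link:

    rsync -a -W /src/ fasthost:/dest/   # -W is the short form of --whole-file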

> 2. Make --dry-run report accurate statistics on how many blocks are 
> different.

In order to compute that, rsync would need to do all the hard work of
the transfer (reading both files and throwing lots of CPU at the
checksums and block-search processing), so by that time you might as
well have it complete the work and update the files -- it wouldn't save
you anything to go to all that work and then throw it away to copy the
files via some other means.
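
What you can do is look at the statistics from a real run; something
like this (host and paths are placeholders) prints them when the
transfer finishes:

    rsync -a --stats /src/ host:/dest/

The "Literal data" and "Matched data" lines in that output show how
many bytes were actually sent versus reused from the existing
destination file.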

..wayne..

