rsync 1tb+ each day
jw at pegasys.ws
Wed Feb 5 12:34:03 EST 2003
On Tue, Feb 04, 2003 at 11:29:48AM -0800, Kenny Gorman wrote:
> I am rsyncing 1tb of data each day. I am finding in my testing that
> actually removing the target files each day then rsyncing is faster than
> doing a compare of the source->target files then rsyncing over the delta
> blocks. This is because we have a fast link between the two boxes, and
> our disk is fairly slow. I am finding that the creation of the temp
> file (the 'dot file') is actually the slowest part of the operation.
> This has to be done for each file because the timestamp and at least a
> couple blocks are guaranteed to have changed (oracle files).
As others have mentioned, -W (--whole-file) will help here.
The reason the temp-file is so slow is that it is reading
blocks from the disk and writing them to other blocks on the
same disk. This means every block that is unchanged must be
transferred twice over the interface, whereas changed blocks are
only transferred once. If the files are very large this is
guaranteed to cause a seek storm.
Further, all of this happens after the entire file has been
read once to generate the block checksums. Unless your
tree is smallish, reads from the checksum pass will have been
flushed from cache by the time you do the final transfer.
--whole-file eliminates most of the disk activity. You no
longer do the block checksum pass, and you replace the local
copying (read+write) with a simple write from the network.
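
For anyone scripting the nightly run, here is a minimal sketch of how the
option might be passed. The paths and host name are made up for
illustration, and -a is only an assumption about what else you would want;
the one option this post is actually about is --whole-file.

```python
import shlex


def build_rsync_command(src, dest, whole_file=True):
    """Return an rsync argument list; nothing is executed here."""
    cmd = ["rsync", "-a"]            # -a (archive) is an assumption, not from the post
    if whole_file:
        cmd.append("--whole-file")   # long form of -W: skip the delta algorithm
    cmd += [src, dest]
    return cmd


if __name__ == "__main__":
    # Hypothetical source directory and destination host.
    cmd = build_rsync_command("/u01/oradata/", "standby:/u01/oradata/")
    print(shlex.join(cmd))
```

Building the list rather than a shell string keeps quoting out of the
picture if you later hand it to subprocess.run().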
Most likely your network is faster than the disks. For
files that change but change very little, your disk subsystem
would have to be more than triple the speed of your network
for the rsync algorithm (as opposed to the utility) to be of
benefit. If the files change a lot, then you merely need
double the speed.
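
One way to read the triple/double rule of thumb is as a count of disk
traffic on the receiving side. The sketch below is a rough model under
assumptions of my own: the delta transfer is limited only by disk
throughput (seeks and the network cost of sending changed blocks are
ignored), and a whole-file transfer is limited only by the network. The
function names and the changed_fraction parameter are illustrative, not
anything rsync exposes.

```python
def delta_disk_bytes(size, changed_fraction):
    """Receiver-side disk traffic for a delta transfer in this simplified model."""
    checksum_read = size                              # full read to build block checksums
    reread_unchanged = size * (1 - changed_fraction)  # unchanged blocks copied from the old file
    temp_write = size                                 # the whole temp file gets written
    return checksum_read + reread_unchanged + temp_write


def break_even_ratio(changed_fraction):
    """Disk/network speed ratio the disk must exceed for delta transfer to win."""
    return delta_disk_bytes(1.0, changed_fraction)


def faster_mode(size, changed_fraction, disk_bps, net_bps):
    """Compare estimated times: disk-bound delta vs. network-bound whole-file."""
    delta_time = delta_disk_bytes(size, changed_fraction) / disk_bps
    whole_time = size / net_bps
    return "delta" if delta_time < whole_time else "whole-file"


if __name__ == "__main__":
    print(break_even_ratio(0.0))   # files that barely change
    print(break_even_ratio(1.0))   # files that change almost entirely
    # Hypothetical numbers: 1 TB, 1% changed, 60 MB/s disk, 100 MB/s link.
    print(faster_mode(1e12, 0.01, 60e6, 100e6))
```

In this model the break-even ratio slides from 3 (nothing changed) down to
2 (everything changed), which is where the two figures above come from.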
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt