rsync 1tb+ each day

Craig Barratt craig at atheros.com
Wed Feb 5 23:41:28 EST 2003


> I am rsyncing 1tb of data each day.  I am finding in my testing that 
> actually removing the target files each day then rsyncing is faster than 
> doing a compare of the source->target files then rsyncing over the delta 
> blocks.  This is because we have a fast link between the two boxes, and 
> that are disk is fairly slow. I am finding that the creation of the temp 
> file (the 'dot file') is actually the slowest part of the operation. 
> This has to be done for each file because the timestamp and at least a 
> couple blocks are guaranteed to have changed (oracle files).

How big are the individual files?  If they are bigger than 1-2GB then it
is possible rsync is failing on the first pass and repeating the file.
You should be able to tell from the output of -vv (you will see a
message like "redoing fileName (nnn)").
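For example, something along these lines should show whether any files
are being transferred twice (the flags other than -vv are just
illustrative, and the paths are placeholders):

    rsync -avv /oracle/data/ backuphost:/oracle/data/ 2>&1 | tee rsync.log
    grep -i redoing rsync.log

If "redoing" shows up for your big files, rsync is roughly doubling the
work on those files.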

The reason for this is that the first-pass block checksum (32 bits Adler
+ 16 bits of MD4) is too small for large files.  There was a long thread
about this a few months ago.  The first message was from Terry Reed
around mid Oct 2002 ("Problem with checksum failing on large files").
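As a rough back-of-the-envelope (assuming a 16K block size and treating
the checksums as ideal hashes): a 2GB file has about 2^31 / 2^14 = 2^17
blocks, and the sender tests its rolling checksum at roughly 2^31 byte
offsets against them.  With only 32 + 16 = 48 bits of first-pass
checksum per block, the expected number of false matches is on the
order of 2^31 * 2^17 / 2^48 = 1, so files in the 1-2GB range are about
where the first pass starts to fail and force a second pass.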

In any case, as you already note, if the network is fast and the disk
is slow then copying the files will be faster.  Rsync on the receiving
side reads each file 1-2 times and writes each file once, while copying
just requires a write on the receiving side.
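If you want to keep using rsync for the transfer but skip the delta
algorithm (and the receiving-side reads that go with it), the
--whole-file option does roughly that; the paths here are just
placeholders:

    rsync -av --whole-file /oracle/data/ backuphost:/oracle/data/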

Another comment: rsync doesn't buffer its writes, so each write
is a block (as little as 700 bytes, or up to 16K for big files).
Buffering the writes might help.  There is an optional buffering
patch (patches/craigb-perf.diff) included with rsync 2.5.6 that
improves the write buffering, plus other I/O buffering.  That
might improve the write performance, although so far significant
improvements have only been seen on cygwin.
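If you want to try it, the patch should apply against the 2.5.6 source
tree with something like the following (a standard configure/make build
is assumed, and the -p level may need adjusting depending on how the
diff paths are written):

    cd rsync-2.5.6
    patch -p1 < patches/craigb-perf.diff
    ./configure && make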

Craig

