Rsync performance with very large files

Carlos Carvalho carlos at fisica.ufpr.br
Fri Jan 8 16:07:05 MST 2010


Eric Cron (ericcron at yahoo.com) wrote on 8 January 2010 12:20:
 >We're having a performance issue when attempting to rsync a very large
 >file. Transfer rate is only 1.5MB/sec. My issue looks very similar to
 >this one:
 >
 >http://www.mail-archive.com/rsync@lists.samba.org/msg17812.html
 >
 >In that thread, a 'dynamic_hash.diff' patch was developed to work around
 >this issue. I applied the 'dynamic_hash' patch included in the 2.6.7 src,
 >but it didn't help.

That's what I'd expect.

 >We are trying to evaluate the possibility of using rsync as an
 >alternative to IBM's FlashCopy, which only works within the storage
 >pool controlled by our SAN Volume Controller.
 >
 >Some details about our test environment:
 > 
 >- Sender and Receiver are both POWER6 servers running AIX 5.3 
 >- Fiber attached disk, DS8300  storage 
 >- Gigabit network (Hypervisor Virtual I/O) 
 >- Test file is 232GB 
 >- I've tried rsync version 3.0.7 (vanilla) and 2.6.7 with the
 >  dynamic_hash.diff patch, both compiled with IBM's xlc compiler.
 >  Same behavior with both versions.

Yes. v3 has better hashing but it's rarely the bottleneck.

 >- It takes approx 1.5 hours to 'consider' the file before transfers begin, no big deal... 

Reasonable. It's probably not "considering" anything; it's reading the
file on the destination. At a rate of about 40MB/s it takes roughly
1.5h to read 232GB.
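
As a rough check of that estimate, here is a small Python sketch. The
40MB/s read rate is my guess, not a measurement, and I'm assuming
1GB = 1000MB; the function name is just for illustration.

```python
def read_time_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to read size_gb gigabytes at rate_mb_per_s.

    Assumes decimal units (1 GB = 1000 MB) and a constant read rate.
    """
    seconds = size_gb * 1000 / rate_mb_per_s
    return seconds / 3600


# 232GB test file, assumed sequential read rate of 40MB/s
hours = read_time_hours(232, 40)
print(f"{hours:.2f} h")  # prints "1.61 h"
```

That is close enough to the 1.5 hours you observed to suggest the
"consider" phase is simply the destination reading the file.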

 >- Once the changes are being sent, the rate is only 1.5MB/sec 

If there are few changes, this is likely limited by how fast the origin
can read the file.

rsync is designed to reduce network traffic, and this usually costs more
local I/O. The destination machine first reads the entire file and sends
checksums to the origin. Only then does the origin read the entire file,
sending the differences to the destination as it goes. So the total time
is at least destination-reading + source-reading. In your case the
network is about as fast as local I/O. If the destination can write
roughly as fast as the origin can read, you're better off just copying
the entire file. This will save you about 40%-50% in total time, since
the destination and source operations then run in parallel.
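
To make the 40%-50% figure concrete, here is a minimal Python sketch of
the timing argument. It is a back-of-the-envelope model, not a
description of rsync internals: the function names are made up, and the
1.5h and 1.8h stage times are assumed values, not measurements.

```python
def delta_transfer_hours(dest_read_h: float, source_read_h: float) -> float:
    """Lower bound for the delta transfer: the destination reads the whole
    file first, and only afterwards does the source read it."""
    return dest_read_h + source_read_h


def whole_file_hours(source_read_h: float, net_h: float,
                     dest_write_h: float) -> float:
    """Plain copy: reading, sending and writing overlap, so the slowest
    stage sets the total time."""
    return max(source_read_h, net_h, dest_write_h)


# Assumed: each full pass over the 232GB file takes about 1.5h.
delta = delta_transfer_hours(1.5, 1.5)

# Case 1: network and destination writes keep up with source reads.
saving_equal = 1 - whole_file_hours(1.5, 1.5, 1.5) / delta
# Case 2: destination writes are somewhat slower (assumed 1.8h).
saving_slow_write = 1 - whole_file_hours(1.5, 1.5, 1.8) / delta

print(f"{saving_equal:.0%}")       # prints "50%"
print(f"{saving_slow_write:.0%}")  # prints "40%"
```

The saving shrinks as the slowest stage of the plain copy gets slower,
so the comparison only favours copying while the network and the
destination writes stay close to the source read speed.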

You can speed up rsync with --whole-file, which does exactly that: it
skips the delta-transfer algorithm and sends the whole file as-is.
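
If you drive the copy from a script, the invocation could be built like
this. It's only a sketch: the source path and the destination host and
path are hypothetical placeholders, and the helper name is mine. The
only rsync option used is --whole-file.

```python
import shlex
import subprocess


def build_rsync_command(source: str, destination: str) -> list[str]:
    """Return an rsync argv that disables the delta-transfer algorithm."""
    return ["rsync", "--whole-file", source, destination]


# Hypothetical paths, for illustration only.
cmd = build_rsync_command("/data/bigfile.img", "backuphost:/data/")
print(shlex.join(cmd))

# To actually run it, uncomment the next line:
# subprocess.run(cmd, check=True)
```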

