Extremely poor rsync performance on very large files (near 100GB
and larger)
Evan Harris
eharris at puremagic.com
Mon Jan 8 07:37:45 GMT 2007
I've been playing with rsync and very large files approaching and surpassing
100GB, and have found that rsync performs extremely poorly on these files,
and the performance appears to degrade the larger the file gets.
The problem only appears to happen when the file is being "updated", that
is, when it already exists on the receiving side.
For instance, between two machines with 3GHz processors, 2GB or more of
RAM, and gigabit ethernet, I'm getting about 1.5MB/sec transfer rates from
the following command on an 80GB file:
rsync -avxP -e ssh --inplace test1:/mnt/database/ ./
However, I get around 30MB/sec from the same command using the whole-file
(-W) switch. If the file doesn't exist on the receiving side, I also get
the much more reasonable 30MB/sec rates whether using -W or not.
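To put those two rates in perspective, here is a rough back-of-the-envelope
sketch of how long the 80GB file would take at each one. It assumes
1GB = 1024MB and a rate that stays constant for the whole transfer; the
function name is made up for illustration.

```python
# Rough transfer-time estimate for the two rates reported above.
# Assumptions: 1 GB = 1024 MB, and the rate is constant for the whole file.

def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_gb at a steady rate_mb_per_s."""
    return size_gb * 1024 / rate_mb_per_s / 3600


if __name__ == "__main__":
    for label, rate in [("update of existing file", 1.5), ("whole-file (-W)", 30.0)]:
        print(f"{label}: {transfer_hours(80, rate):.1f} hours")
```

Under those assumptions the update case works out to roughly 15 hours,
against well under an hour for the whole-file copy.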
I haven't tested this without using the --inplace switch (as I don't have
enough free space on the test filesystems to make a full copy of the large
files being rsync'd), but I would presume it behaves similarly.
Another thing of note is that the sending process running on the test1
machine during these poor performance tests appears to be cpu bound
(consuming 99% of cpu time), while during a whole-file rsync of the same
file, the sending rsync process is only utilizing about 10-15% of cpu, and
the ssh process it is spawned through is consuming around 30-35% of cpu. I'm
assuming this is from the encryption overhead of the much faster transfer
rate.
I haven't dug into it very deeply, but maybe it has something to do with the
block size chosen by rsync? Perhaps rsync is picking far too small a
block size and is continually traversing the list of block hashes? I doubt
it is the hashing of the data blocks themselves, as that would load both
sides of the connection fairly equally, and that is not the case. Is there
another reason that might cause the sending side to be so strangely cpu
intensive?
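As a rough illustration of why block size could matter here, the sketch
below counts how many block hashes an 80GB file produces at a few block
sizes, and how many of them would share each slot of a fixed-size lookup
table. The block sizes and the 65536-slot table are assumptions picked for
the arithmetic, not values confirmed from the rsync source, and the names
are hypothetical.

```python
# How many block hashes does an 80GB file produce, and how crowded would a
# fixed-size lookup table get? All constants are illustrative assumptions.

FILE_SIZE = 80 * 1024**3   # the 80GB test file, in bytes
TABLE_SLOTS = 1 << 16      # ASSUMED lookup-table size, not taken from rsync


def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks needed to cover file_size (integer ceiling division)."""
    return (file_size + block_size - 1) // block_size


def hashes_per_slot(file_size: int, block_size: int, slots: int = TABLE_SLOTS) -> float:
    """Average number of block hashes sharing one table slot."""
    return block_count(file_size, block_size) / slots


if __name__ == "__main__":
    for block_size in (2048, 8192, 131072):   # arbitrary sample sizes
        n = block_count(FILE_SIZE, block_size)
        per_slot = hashes_per_slot(FILE_SIZE, block_size)
        print(f"block size {block_size:>6}: {n:>9} blocks, ~{per_slot:.0f} per slot")
```

If something like this is what is happening, every byte offset the sender
examines could mean walking a long list of candidate hashes, which would
load the sender only. Forcing a larger block size with rsync's
--block-size (-B) option would be one way to test the idea.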
This was tested with rsync 2.6.8 on both sides. I haven't yet tested with
2.6.9, but after looking at the NEWS file for the 2.6.9 release, it doesn't
look like anything has changed that is likely to affect this issue.
Thanks.
Evan