FW: rsync performance
jw at pegasys.ws
Sat Sep 13 10:05:09 EST 2003
On Fri, Sep 12, 2003 at 08:35:01AM -0400, Dave Mangelsdorf (CBIZ Tech) wrote:
> Not sure if this is not the proper channel (forum) for this, but I need some
> We have been using rsync in various ways on various platforms.
> Linux-SGI (IRIX)-MacOSX
> In all cases the actual LOCAL file transfer seems to be limited to 10MB/sec
> from disk to disk. Always copy whole files. (no rolling checksums)
> Rsync -avW
> rsync version 2.5.6 protocol version 26
Rsync is not an efficient local copy utility. It can be
used for local copying but local and high-bandwidth network
speed is sacrificed for low-bandwidth performance and for
data integrity. The only sense in which rsync will be
faster than a normal copy is in its selectiveness of what
files to copy, and in many cases that can be had in ways
other than rsync.
Even with a local copy the file checksums are still
calculated. Whole-file on the receiver eliminates one
pass on the baseline file to generate block sums and it
saves disk-disk copy of matched data. Whole-file on the
sender reduces the CPU load of hash lookups for block
> We have tested everything from Small I-mac, to 16 processor SGI server with
> multiple fibre channel interfaces to very large disk arrays. Always tops out
> at 10MB/sec, plus or minus.
Each of these may have different reasons for the
performance limitations. You would need to examine the
system impact to identify the bottleneck.
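
One rough way to start is to compare the wall-clock time of a
transfer with the CPU time it consumes. The following is only a
sketch under stated assumptions, not anything rsync provides:
measure_command is a hypothetical helper, it relies on the Unix
child-time fields of os.times(), and the command lines in the
comment are examples with placeholder paths.

```python
import os
import subprocess
import time


def measure_command(argv):
    """Run argv to completion; report wall time and child CPU time.

    Hypothetical helper for illustration. The children_* fields of
    os.times() are only meaningful on Unix-like systems.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    wall = time.perf_counter() - start
    after = os.times()
    # CPU seconds used by the command and any descendants it waited for.
    cpu = ((after.children_user - before.children_user)
           + (after.children_system - before.children_system))
    return {
        "wall_s": wall,
        "cpu_s": cpu,
        "cpu_fraction": cpu / wall if wall > 0 else 0.0,
    }


# Example use (placeholder paths), run once for each tool:
#   measure_command(["cp", "-R", "src", "dst_cp"])
#   measure_command(["rsync", "-aW", "src/", "dst_rsync/"])
```

A cpu_fraction close to or above 1.0 suggests the transfer is
CPU bound; a low value suggests it is mostly waiting on the
disks. Filesystem caching will skew repeated runs, so treat the
numbers as a hint only.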
You mention the 16 CPU SGI server as though massive SMP will
improve performance. It won't. Nor will fibre channel.
With few exceptions disk interfaces have little impact on
disk subsystem performance. The biggest factor is seek time
followed (in varying order) by i/o protocol constraints,
elevator control, RAID scheduling, fragmentation, bus
contention, embedded logic, rotational latency, cylinder
capacity, and one or two others that have slipped my mind at
the moment. As a matter of fact fibre channel arrays will
often perform worse than other interconnects because their
100MBps half-duplex interconnect suffers from contention when
too many drives are JBODed for software RAID.
Since you are comparing cp to rsync on the same system and
disks, that won't apply.
As for the 16 CPUs... Rsync will only fork three processes
for the transfer. So more than three CPUs will have little
benefit on a quiescent system. I've also noted that large
SMP systems often have slower CPUs than smaller systems.
The three processes are the generator, the sender and the
receiver. The generator is the process that walks the
receiver's tree comparing it with the sender's file list.
In a local copy all three are forks assuming memory is
copy-on-write. The generator and the receiver share the
In an SMP system rsync is likely to suffer from some SMP
pathologies that can make it slower on SMP than on a
comparable UP system. Rsync is fairly well pipelined, so the
three processes will keep each other busy, making it
likely that they will be scheduled on separate CPUs. Their
intercommunication is through pipes so the efficiency of the
underlying pipe implementation will be significant. If the
pipe implementation causes lots of cache invalidations or,
worse, TLB flushes due to page remapping, rsync will suffer.
Remember the copy-on-write? Depending on the precise nature
of the VM system, that is likely to cause a good deal of
cache-line bouncing as well.
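
To make the process layout concrete, here is a toy model of
three forked processes joined by pipes. It is a sketch only and
is not rsync's code: the names run_pipeline and _child are
hypothetical, the "data" is just tagged text, and the flow
(generator feeds the sender, sender feeds the receiver) is a
simplification. It assumes a Unix-like system with os.fork.

```python
import os


def _child(close_fds, work):
    """Fork; the child closes unused pipe ends, runs work(), exits."""
    pid = os.fork()
    if pid == 0:
        try:
            for fd in close_fds:
                os.close(fd)
            work()
        finally:
            os._exit(0)  # never fall back into the parent's code
    return pid


def run_pipeline(file_names):
    """Toy generator -> sender -> receiver chain over pipes."""
    req_r, req_w = os.pipe()    # generator -> sender
    data_r, data_w = os.pipe()  # sender -> receiver
    done_r, done_w = os.pipe()  # receiver -> parent (demo only)

    def generator():
        # Stands in for walking the tree and requesting files.
        with os.fdopen(req_w, "w") as out:
            for name in file_names:
                out.write(name + "\n")

    def sender():
        # Stands in for reading each requested file and sending it.
        with os.fdopen(req_r) as src, os.fdopen(data_w, "w") as out:
            for line in src:
                out.write("data:" + line)

    def receiver():
        # Stands in for writing the received data to disk.
        with os.fdopen(data_r) as src, os.fdopen(done_w, "w") as out:
            for line in src:
                out.write("stored:" + line)

    pids = [
        _child((req_r, data_r, data_w, done_r, done_w), generator),
        _child((req_w, data_r, done_r, done_w), sender),
        _child((req_r, req_w, data_w, done_r), receiver),
    ]
    # The parent keeps only the read end it needs, so EOF propagates.
    for fd in (req_r, req_w, data_r, data_w, done_w):
        os.close(fd)
    with os.fdopen(done_r) as src:
        results = [line.rstrip("\n") for line in src]
    for pid in pids:
        os.waitpid(pid, 0)
    return results
```

Every byte crosses two pipes here, which is why the cost of the
pipe implementation and of moving data between CPUs matters.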
Finally, we come to the issue of rsync's IO methodology.
Rsync was optimised for low-bandwidth networks and
portability at the expense of IO, CPU, and memory. It does not
take advantage of any OS IO performance enhancements.
I'm sorry that you find rsync's local performance
disappointing but that isn't what rsync is really for.
If you do find specific enhancements that can be made that
won't adversely affect portability we'd be glad to hear of
them.
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt