rsync to servers highly sensitive to IO load

Fri Jul 11 14:10:16 GMT 2008

We would like to use rsync to deploy large data files over an (all too
often) faulty WAN connection.  In the past, we've used scp and, when a
transfer got interrupted, it would have to be restarted from scratch.
This is why we've begun experimenting with rsync.

The problem we're having with rsync stems from its use of checksums.
The target systems we're deploying data to are highly sensitive to IO
loads.  When an rsync resumes, the performance of the systems is
considerably degraded for a period of several minutes while the
checksum runs, driving up IO wait and invalidating the system disk
cache.

In order to throttle the load on the target systems, we're currently
using the following options:

    --bwlimit=17 --partial --append

--bwlimit is the primary mechanism for limiting the IO, --partial
keeps partially transferred files so that a failed transfer can be
resumed (our primary reason for using rsync), and --append prevents
the computation of the initial checksum when resuming a failed
transfer.
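
For illustration, a minimal sketch of wrapping these options in a
retry loop, so that an interrupted transfer is resumed automatically.
The helper names (build_rsync_cmd, deploy), the attempt limit and the
delay are hypothetical; only the three rsync options come from the
setup described above.

```python
import subprocess
import time


def build_rsync_cmd(src, dest, bwlimit=17):
    """Return the rsync argument list using the options quoted above."""
    return ["rsync", f"--bwlimit={bwlimit}", "--partial", "--append", src, dest]


def deploy(src, dest, max_attempts=5, delay=30,
           run=subprocess.call, sleep=time.sleep):
    """Re-run rsync until it exits with status 0 or attempts run out.

    Returns the number of attempts used, or None if every attempt failed.
    The run and sleep parameters are injectable (an assumption of this
    sketch) so the loop can be exercised without a real transfer.
    """
    cmd = build_rsync_cmd(src, dest)
    for attempt in range(1, max_attempts + 1):
        if run(cmd) == 0:
            return attempt
        if attempt < max_attempts:
            sleep(delay)  # wait before resuming the partial file
    return None
```

Something like deploy("data.bin", "host:/data/") would then keep
retrying; the paths here are placeholders.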

Given this background information, I have two questions:

1.  When using --append, I understand the final post-transfer checksum
is still computed.  Would it be better to take the hit up front and
compute the checksum on the partial chunk (which is smaller than the
whole file...)?  This would be the best choice if rsync maintains a
running checksum as data is transferred, negating the need to re-read
the entire file for a post-transfer checksum.  From the docs, it's
not entirely clear how this works.

2.  Are there other options I could/should be using to help in this
specific application?
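
To make the "running checksum" idea in question 1 concrete, here is a
small sketch of what I mean.  It is only an illustration of the
concept, not a description of how rsync works internally; the function
names and the choice of MD5 are assumptions made for the example.

```python
import hashlib
import os


def resume_with_running_digest(path, new_chunks, block_size=64 * 1024):
    """Append new_chunks to path and return the digest of the whole file.

    The partial data already on disk is read once up front; after that
    the digest is updated as each chunk is written, so the finished
    file is never re-read.
    """
    digest = hashlib.md5()
    # Up-front cost: read only the partial chunk that already exists.
    if os.path.exists(path):
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(block_size), b""):
                digest.update(block)
    # Running checksum: update the digest while appending new data.
    with open(path, "ab") as f:
        for chunk in new_chunks:
            f.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def digest_by_rereading(path, block_size=64 * 1024):
    """Return the digest of path by reading the entire file back."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Both functions give the same digest, but the first reads only the
partial chunk from disk, while the second reads the whole file.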

Thanks,
Jeff