How to make big MySQL database more diffable/rsyncable? (aka rsyncing big files)

Carlos Carvalho carlos at fisica.ufpr.br
Tue Jul 14 16:17:42 MDT 2009


Ryan Malayter (malayter at gmail.com) wrote on 14 July 2009 17:00:
 >On Mon, Jul 13, 2009 at 4:54 PM, Jamie Lokier<jamie at shareable.org> wrote:
 >>
 >> Remembering hashes doesn't make any difference to speed, if the
 >> bottleneck is the sending side.
 >
 >Except that in the rsync pipeline, reading the destination file to
 >get hashes happens BEFORE the sender reads its file, and the sender
 >calculates hashes and finds matches on the fly.
 >
 >So, when transferring a large file, it goes something like this from
 >the sender's perspective:
 >1) sending file list
 >2) receiving file list
 >3) file xxxx is different! Receiver, please give me some hashes
 >4) <wait 20+ minutes for receiver to compute hashes> got hashes
 >5) begin transfer, calculating my hashes and compressing on the fly as
 >I transfer
 >6) file complete
 >
 >By caching hashes on the receiving side, the transfer can begin almost
 >instantaneously if the file on the receiver is unchanged since the
 >last run of rsync. This is, in fact, almost always true for the way
 >most people use rsync (backups, file distribution, etc.)
 >
 >Most of my rsync scripts "stall" for minutes doing no effective work,
 >because they are waiting for the destination to read and calculate
 >hashes of a large file that was already hashed yesterday.

Hash calculation is very fast; rsync's CPU consumption is negligible.
What limits it is reading the disk. If you run a hash check you'll see
the process stalled on I/O, not on CPU. Maybe your machine has an
unusual I/O-to-CPU ratio?
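
A rough way to check this on a given machine is to hash a large file
block by block and compare wall-clock time with CPU time. The sketch
below is only an illustration, not rsync's code: the function name
hash_file_timed and the 64 KiB block size are assumptions, and
zlib.adler32 and MD5 are stand-ins for rsync's weak and strong block
checksums.

```python
import hashlib
import time
import zlib


def hash_file_timed(path, block_size=64 * 1024):
    """Checksum `path` block by block and time the work.

    Returns (block_count, wall_seconds, cpu_seconds).
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    blocks = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            zlib.adler32(block)          # stand-in for the weak (rolling) checksum
            hashlib.md5(block).digest()  # stand-in for the strong per-block checksum
            blocks += 1
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return blocks, wall, cpu
```

If wall_seconds is much larger than cpu_seconds on a file that is not
already in the page cache, the process spent most of its time waiting
for the disk. If the two are close, the CPU is the limit on that
machine.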

This, and the fact that the maintainer(s?) want to keep rsync stateless,
makes me think that a change to remember hashes is unlikely.
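
To make concrete what "remembering hashes" would involve, here is a
minimal sketch of a receiver-side cache, written as a separate script
and not as anything rsync does. Everything in it is an assumption for
illustration: the function names, the JSON sidecar file, and the choice
of size plus mtime as the "file is unchanged" test. The sidecar file is
exactly the kind of state kept between runs that a stateless design
avoids.

```python
import hashlib
import json
import os


def block_hashes(path, block_size=64 * 1024):
    """Read the whole file and return one MD5 hex digest per block."""
    hashes = []
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            hashes.append(hashlib.md5(block).hexdigest())
    return hashes


def cached_block_hashes(path, cache_path, block_size=64 * 1024):
    """Return (hashes, cache_hit), reusing saved hashes when size and mtime match."""
    st = os.stat(path)
    key = {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "block_size": block_size}
    try:
        with open(cache_path) as f:
            cached = json.load(f)
        if isinstance(cached, dict) and cached.get("key") == key:
            return cached["hashes"], True   # no read of the big file needed
    except (OSError, ValueError):
        pass                                # missing or corrupt cache: recompute
    hashes = block_hashes(path, block_size)
    with open(cache_path, "w") as f:
        json.dump({"key": key, "hashes": hashes}, f)
    return hashes, False
```

Note that a size-and-mtime check can be fooled by a file that is
modified without either value changing, so a cache like this trades
some safety for the time saved.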

