How to make big MySQL database more diffable/rsyncable? (aka rsyncing big files)

Ryan Malayter malayter at gmail.com
Tue Jul 14 16:00:36 MDT 2009


On Mon, Jul 13, 2009 at 4:54 PM, Jamie Lokier<jamie at shareable.org> wrote:
>
> Remembering hashes doesn't make any difference to speed, if the
> bottleneck is the sending side.

Except that in the rsync pipeline, reading the destination file to get
hashes happens BEFORE the sender reads its file. The sender then
calculates its own hashes and finds matches on the fly.
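
To make the "on the fly" part concrete, here is a minimal Python sketch
of a rolling weak checksum sliding over the sender's data and being
compared against per-block signatures from the receiver. The function
names are mine, SHA-256 is only a stand-in for rsync's strong checksum,
and unlike real rsync this version slides one byte at a time even after
a match, so treat it as an illustration rather than rsync's actual code.

```python
import hashlib

MOD = 1 << 16  # both checksum halves are kept modulo 2^16


def weak_checksum(block):
    """Return the (a, b) halves of an Adler-style weak checksum."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte to the right without rescanning it."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b


def block_signatures(data, block_len):
    """Receiver side: map weak checksum -> [(block index, strong hash)]."""
    sigs = {}
    for idx in range(len(data) // block_len):  # full blocks only
        block = data[idx * block_len:(idx + 1) * block_len]
        strong = hashlib.sha256(block).digest()  # stand-in strong hash
        sigs.setdefault(weak_checksum(block), []).append((idx, strong))
    return sigs


def find_matches(data, sigs, block_len):
    """Sender side: list (offset, receiver block index) for each match."""
    matches = []
    if len(data) < block_len:
        return matches
    a, b = weak_checksum(data[:block_len])
    pos = 0
    while True:
        # Cheap weak lookup first; confirm with the strong hash.
        for idx, strong in sigs.get((a, b), ()):
            window = data[pos:pos + block_len]
            if hashlib.sha256(window).digest() == strong:
                matches.append((pos, idx))
                break
        if pos + block_len >= len(data):
            break
        a, b = roll(a, b, data[pos], data[pos + block_len], block_len)
        pos += 1
    return matches
```

The point for this thread is that block_signatures() has to finish on
the receiver before find_matches() can even start on the sender.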

So, when transferring a large file, it goes something like this from
the sender's perspective:
1) sending file list
2) receiving file list
3) file xxxx is different! Receiver, please give me some hashes
4) <wait 20+ minutes for receiver to compute hashes> got hashes
5) begin transfer, calculating my hashes and compressing on the fly as
I transfer
6) file complete

By caching hashes on the receiving side, the transfer can begin almost
instantaneously if the file on the receiver is unchanged since the
last run of rsync. This is, in fact, almost always true for the way
most people use rsync (backups, file distribution, etc.)
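
As a rough illustration of what such a cache could look like, here is a
minimal Python sketch. It is not an existing rsync feature: the
side-car cache file, the JSON layout, and the function names are all
hypothetical, and "unchanged" is assumed to mean "same size and mtime",
which is a heuristic rather than a guarantee.

```python
import hashlib
import json
import os


def compute_block_hashes(path, block_len=4096):
    """Read the whole file and hash it block by block (the slow part)."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_len)
            if not block:
                break
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes


def cached_block_hashes(path, cache_path, block_len=4096):
    """Return (hashes, cache_hit), reusing cached hashes when possible."""
    st = os.stat(path)
    # Assumption: size + mtime + block length identify an unchanged file.
    stamp = {"size": st.st_size, "mtime_ns": st.st_mtime_ns,
             "block_len": block_len}
    try:
        with open(cache_path) as f:
            entry = json.load(f)
        if entry.get("stamp") == stamp:
            return entry["hashes"], True
    except (OSError, ValueError):
        pass  # missing or corrupt cache: recompute below
    hashes = compute_block_hashes(path, block_len)
    with open(cache_path, "w") as f:
        json.dump({"stamp": stamp, "hashes": hashes}, f)
    return hashes, False
```

On a cache hit the receiver only does a stat() and a small read, so
the 20+ minute wait in step 4 above would disappear.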

Most of my rsync scripts "stall" for minutes doing no effective work,
because they are waiting for the destination to read and calculate
hashes of a large file that was already hashed yesterday.

Incidentally, hashes could also be "remembered" on the sending side
and sent to the receiver. You would of course fall back to the
current behavior if the file had changed on both ends, or if somehow a
whole-file checksum failed.
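
One possible reading of that fallback rule, as a small Python sketch.
The policy names and both functions are hypothetical, and SHA-256 is
just a convenient stand-in for whatever whole-file checksum is used.

```python
import hashlib


def choose_hash_source(receiver_cache_valid, sender_cache_valid):
    """Pick where block hashes come from (hypothetical policy)."""
    if receiver_cache_valid:
        return "receiver-cache"  # receiver file unchanged since last run
    if sender_cache_valid:
        return "sender-cache"    # sender still holds usable hashes
    return "recompute"           # neither side can vouch: current behavior


def whole_file_digest(path, chunk=1 << 20):
    """Stream the file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def transfer_ok(rebuilt_path, expected_digest):
    """False means: discard cached hashes and redo it the slow way."""
    return whole_file_digest(rebuilt_path) == expected_digest
```

The whole-file check is the safety net: if stale cached hashes ever
slipped through, transfer_ok() would fail and the next attempt would
use "recompute".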

-- 
RPM

