How to make big MySQL database more diffable/rsyncable? (aka rsyncing big files)
Carlos Carvalho
carlos at fisica.ufpr.br
Tue Jul 14 16:17:42 MDT 2009
Ryan Malayter (malayter at gmail.com) wrote on 14 July 2009 17:00:
>On Mon, Jul 13, 2009 at 4:54 PM, Jamie Lokier<jamie at shareable.org> wrote:
>>
>> Remembering hashes doesn't make any difference to speed, if the
>> bottleneck is the sending side.
>
>Except that in the rsync pipeline, reading the destination file to
>get hashes happens BEFORE the sender reads its file. And the sender
>calculates hashes and finds matches on-the-fly.
>
>So, when transferring a large file, it goes something like this from
>the sender's perspective:
>1) sending file list
>2) receiving file list
>3) file xxxx is different! Receiver, please give me some hashes
>4) <wait 20+ minutes for receiver to compute hashes> got hashes
>5) begin transfer, calculating my hashes and compressing on the fly as
>I transfer
>6) file complete
>
>By caching hashes on the receiving side, the transfer can begin almost
>instantaneously if the file on the receiver is unchanged since the
>last run of rsync. This is, in fact, almost always true for the way
>most people use rsync (backups, file distribution, etc.)
>
>Most of my rsync scripts "stall" for minutes doing no effective work,
>because they are waiting for the destination to read and calculate
>hashes of a large file that was already hashed yesterday.
Hash calculation is very fast; rsync's CPU consumption is negligible.
What limits it is reading the disk. If you run a hash check you'll see
the process stalled in I/O, not CPU. Maybe your machine has an
unusual I/O-to-CPU ratio?
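One rough way to check where the time goes on a given machine is to time the reads and the hashing separately. This is only a sketch: the function name time_read_vs_hash is made up for illustration, and zlib.adler32 plus hashlib.md5 stand in for rsync's rolling and strong checksums; they are not rsync's actual code path.

```python
import hashlib
import time
import zlib


def time_read_vs_hash(path, block_size=64 * 1024):
    """Return (read_seconds, hash_seconds, total_bytes) for one pass over path.

    Assumption: adler32 + md5 per block is only a stand-in for the
    per-block checksums rsync computes, not its real implementation.
    """
    read_s = 0.0
    hash_s = 0.0
    total = 0
    with open(path, "rb") as f:
        while True:
            t0 = time.perf_counter()
            block = f.read(block_size)
            t1 = time.perf_counter()
            read_s += t1 - t0  # time spent waiting on the read
            if not block:
                break
            zlib.adler32(block)          # cheap weak checksum
            hashlib.md5(block).digest()  # strong checksum
            hash_s += time.perf_counter() - t1  # time spent hashing
            total += len(block)
    return read_s, hash_s, total
```

The numbers only mean something if the file is not already in the page cache, so try it on a file larger than RAM or one that has not been read recently. If read_seconds dominates, the disk is the limit; if hash_seconds dominates, the machine is in the unusual case.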
This, and the fact that the maintainer(s?) want to keep rsync stateless,
makes me think that a change to remember hashes is unlikely.
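For reference, the caching proposed in the quoted text amounts to roughly the following. It is a sketch of the idea only, nothing rsync does today: the names block_hashes and cached_block_hashes, the JSON cache file, and the use of size plus mtime as the "unchanged" test are all assumptions made for illustration.

```python
import hashlib
import json
import os


def block_hashes(path, block_size=64 * 1024):
    """Read the whole file and return one md5 hex digest per block."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            hashes.append(hashlib.md5(block).hexdigest())
    return hashes


def cached_block_hashes(path, cache_path, block_size=64 * 1024):
    """Return (hashes, cache_hit).

    Assumption: a file whose size and mtime are unchanged has unchanged
    contents. This is exactly the state a stateless tool does not keep.
    """
    st = os.stat(path)
    key = [st.st_size, st.st_mtime_ns, block_size]
    try:
        with open(cache_path) as f:
            entry = json.load(f)
        if isinstance(entry, dict) and entry.get("key") == key:
            return entry["hashes"], True  # no read of the big file needed
    except (OSError, ValueError):
        pass  # missing or corrupt cache: fall through and rehash
    hashes = block_hashes(path, block_size)
    with open(cache_path, "w") as f:
        json.dump({"key": key, "hashes": hashes}, f)
    return hashes, False
```

On a cache hit the large file is never read, which is where the saving would come from. The cost is a state file that has to be stored, trusted, and invalidated correctly.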