How to make big MySQL database more diffable/rsyncable? (aka rsyncing big files)

Tony Abernethy tony at servacorp.com
Tue Jul 14 16:55:23 MDT 2009


Carlos Carvalho wrote:
> 
> Ryan Malayter (malayter at gmail.com) wrote on 14 July 2009 17:00:
>  >On Mon, Jul 13, 2009 at 4:54 PM, Jamie Lokier <jamie at shareable.org> wrote:
>  >>
>  >> Remembering hashes doesn't make any difference to speed, if the
>  >> bottleneck is the sending side.
>  >
>  >Except that in the rsync pipeline, reading the destination file to
>  >get hashes happens BEFORE the sender reads its file. And the sender
>  >calculates hashes and finds matches on-the-fly.
>  >
>  >So, when transferring a large file, it goes something like this from
>  >the sender's perspective:
>  >1) sending file list
>  >2) receiving file list
>  >3) file xxxx is different! Receiver, please give me some hashes
>  >4) <wait 20+ minutes for receiver to compute hashes> got hashes
>  >5) begin transfer, calculating my hashes and compressing on the fly as
>  >I transfer
>  >6) file complete
>  >
>  >By caching hashes on the receiving side, the transfer can begin almost
>  >instantaneously if the file on the receiver is unchanged since the
>  >last run of rsync. This is, in fact, almost always true for the way
>  >most people use rsync (backups, file distribution, etc.)
>  >
>  >Most of my rsync scripts "stall" for minutes doing no effective work,
>  >because they are waiting for the destination to read and calculate
>  >hashes of a large file that was already hashed yesterday.
> 
> Hash calculation is very fast; rsync's CPU consumption is negligible.
> What limits it is reading the disk. If you run a hash check you'll see
> the process stalled on I/O, not CPU. Maybe your machine has an
> unusual I/O-to-CPU ratio?
> 
> This, and the fact that the maintainer(s?) want to keep rsync stateless,
> makes me think that a change to remember hashes is unlikely.

You also have to consider what happens when several of these runs land
on top of each other. Life is much better when that does not destroy
everything. (There are people, like me, who do things like that ;-)
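
To make the point concrete, here is a minimal Python sketch of what a
receiver-side hash cache might look like. It is not how rsync works:
rsync keeps no such cache, and its real block checksums pair a rolling
checksum with a strong hash. The names cached_block_hashes, BLOCK_SIZE
and the JSON cache file are all hypothetical, and SHA-256 is only a
stand-in hash. An entry is trusted only while size and mtime are
unchanged, and the cache file is replaced atomically so that a second
run on top of the first never reads a half-written file.

```python
import hashlib
import json
import os
import tempfile

BLOCK_SIZE = 64 * 1024  # hypothetical; not rsync's block-size heuristic


def block_hashes(path, block_size=BLOCK_SIZE):
    """Read the whole file once and return one SHA-256 hex digest per block."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            digests.append(hashlib.sha256(block).hexdigest())
    return digests


def load_cache(cache_path):
    """Return the cache dict, or an empty one if the file is missing or corrupt."""
    try:
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, ValueError):
        return {}


def save_cache(cache_path, cache):
    """Write to a temp file in the same directory, then rename over the old cache."""
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    os.replace(tmp_path, cache_path)  # atomic on POSIX within one filesystem


def cached_block_hashes(path, cache_path):
    """Return (hashes, was_cache_hit); rehash only if size or mtime changed."""
    st = os.stat(path)
    key = os.path.abspath(path)
    cache = load_cache(cache_path)
    entry = cache.get(key)
    if entry and entry["size"] == st.st_size and entry["mtime_ns"] == st.st_mtime_ns:
        return entry["hashes"], True
    hashes = block_hashes(path)
    cache[key] = {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "hashes": hashes}
    save_cache(cache_path, cache)
    return hashes, False
```

Even with the atomic rename, two overlapping runs can still overwrite
each other's entries (last writer wins). Nothing gets corrupted, but it
is exactly the kind of bookkeeping a stateless tool never has to do.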

Just guessing, but the target probably has to read a lot of disk before
it finds something different, and the sending process stalls until it
gets something that it can send. The -P option (--partial --progress)
may be informative.
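
One rough way to check whether the stall is disk or CPU on a given
machine is to time the two separately. The Python sketch below is only
an approximation under stated assumptions: read_vs_hash_seconds is a
hypothetical helper, SHA-256 stands in for whatever checksum rsync
actually uses, and a file that is already in the OS page cache will
show misleadingly small read times.

```python
import hashlib
import time


def read_vs_hash_seconds(path, block_size=1024 * 1024):
    """Return (seconds spent in read(), seconds spent hashing, hex digest)."""
    read_s = 0.0
    hash_s = 0.0
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            t0 = time.perf_counter()
            block = f.read(block_size)
            t1 = time.perf_counter()
            read_s += t1 - t0  # time blocked waiting for data from disk
            if not block:
                break
            h.update(block)
            hash_s += time.perf_counter() - t1  # time spent on the checksum
    return read_s, hash_s, h.hexdigest()
```

If read_s dominates on a cold cache, the receiver is waiting on the
disk, and faster hashing would not help.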


