How to make big MySQL database more diffable/rsyncable? (aka rsyncing big files)

Jamie Lokier jamie at shareable.org
Wed Jul 15 11:54:29 MDT 2009


Ryan Malayter wrote:
> So, when transferring a large file, it goes something like this from
> the sender's perspective:
> 1) sending file list
> 2) receiving file list
> 3) file xxxx is different! Receiver, please give me some hashes
> 4) <wait 20+ minutes for receiver to compute hashes> got hashes
> 5) begin transfer, calculating my hashes and compressing on the fly as
> I transfer
> 6) file complete
> 
> By caching hashes on the receiving side, the transfer can begin almost
> instantaneously if the file on the receiver is unchanged since the
> last run of rsync. This is, in fact, almost always true for the way
> most people use rsync (backups, file distribution, etc.)

The receiver still has to send the hashes, which can be slow for a
large file. So it would be even better for the sender to cache the
hashes of the receiver's files, perhaps indexed by the receiver's MD5
of the whole file.
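
A minimal sketch of that idea, not anything rsync implements: the
sender keeps a table from whole-file MD5 to the block hashes it last
received for that content. The names SenderHashCache,
hashes_for_receiver_file and fetch_hashes are hypothetical, the block
size is arbitrary, and zlib.adler32 only stands in for rsync's own
rolling checksum.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # arbitrary for this sketch; rsync chooses its own


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return a (weak, strong) hash pair for each fixed-size block."""
    out = []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        # adler32 is a stand-in for a rolling checksum, MD5 is the strong hash
        out.append((zlib.adler32(block), hashlib.md5(block).hexdigest()))
    return out


class SenderHashCache:
    """Hypothetical sender-side cache: whole-file MD5 -> block hashes."""

    def __init__(self):
        self._by_md5 = {}

    def remember(self, whole_md5, hashes):
        self._by_md5[whole_md5] = hashes

    def lookup(self, whole_md5):
        return self._by_md5.get(whole_md5)


def hashes_for_receiver_file(cache, receiver_md5, fetch_hashes):
    """Reuse cached block hashes when the receiver's whole-file MD5 is
    already known; otherwise call fetch_hashes(), which represents the
    slow transfer of hashes from the receiver, and remember the result."""
    hashes = cache.lookup(receiver_md5)
    if hashes is None:
        hashes = fetch_hashes()
        cache.remember(receiver_md5, hashes)
    return hashes
```

With this the receiver only needs to send one MD5 when its copy is
unchanged, though it still has to compute (or itself cache) that MD5.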

There are two meanings of "stateless":

   1. It compares files on the sender and receiver and does not keep
      a list of what it sent before, so it always works even if files
      on the receiver have been changed without using rsync.

   2. It does not keep auxiliary data such as precomputed hashes to
      optimize the "stateless" update operation.
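
To illustrate the difference, here is a rough sketch (my assumption,
not rsync code) of auxiliary data in sense 2 that does not give up
sense 1: cached block hashes are reused only while the file's size and
modification time match what was recorded, and are recomputed
otherwise. The function names and the in-memory dict are hypothetical,
and a change that preserves both size and mtime would go unnoticed.

```python
import hashlib
import os


def compute_block_md5s(path, block_size=4096):
    """Read the file block by block and return each block's MD5."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            hashes.append(hashlib.md5(block).hexdigest())
    return hashes


def cached_block_md5s(path, cache, block_size=4096):
    """Return block hashes for path, reusing the cached list only if
    the file's size and mtime are unchanged since it was stored."""
    st = os.stat(path)
    stamp = (st.st_size, st.st_mtime_ns, block_size)
    entry = cache.get(path)
    if entry is not None and entry[0] == stamp:
        return entry[1]  # file looks unchanged: skip the slow rehash
    hashes = compute_block_md5s(path, block_size)
    cache[path] = (stamp, hashes)
    return hashes
```

The cache is only an optimization here: deleting it costs time, not
correctness, because the files themselves are still what gets compared.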

Perhaps the rsync maintainers meant 1, and you thought they meant 2?

-- Jamie

