How to make big MySQL database more diffable/rsyncable? (aka rsyncing big files)

Ryan Malayter malayter at gmail.com
Wed Jul 15 16:01:08 MDT 2009


On Wed, Jul 15, 2009 at 12:54 PM, Jamie Lokier<jamie at shareable.org> wrote:
> It still has to send the hashes, which can be slow for a large file.
> So it would be even better to cache on the sending side hashes of
> files on the receiving side, perhaps indexed by the receiving side's
> MD5 of the whole file.

The hashes for a 16 GB file using the default block size come to about
28 bytes per 128 KB block, or roughly 0.02% of the file size, which
works out to around 3.5 MB. This is peanuts in the grand scheme of
things when dealing with large files, so I suppose whichever hash
storage location makes the implementation easier or more robust should
be used.
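
As a quick check of that arithmetic, here is a small Python sketch. The
28-byte and 128 KB figures are simply the ones quoted above, taken as
given rather than verified against the rsync source, and the function
name is made up for illustration.

```python
# Estimate the size of the per-block hash list for one large file.
# All three constants are the figures from the message above (assumptions,
# not values read out of rsync itself).
FILE_SIZE = 16 * 1024**3      # 16 GB file
BLOCK_SIZE = 128 * 1024       # 128 KB blocks
BYTES_PER_BLOCK_HASH = 28     # checksum data sent per block


def hash_list_size(file_size, block_size, bytes_per_hash):
    """Return the total bytes of checksum data for a file of file_size bytes."""
    blocks = -(-file_size // block_size)  # ceiling division: last block may be short
    return blocks * bytes_per_hash


total = hash_list_size(FILE_SIZE, BLOCK_SIZE, BYTES_PER_BLOCK_HASH)
print(f"blocks:   {FILE_SIZE // BLOCK_SIZE}")
print(f"hashes:   {total / 1024**2:.2f} MB")
print(f"overhead: {total / FILE_SIZE:.4%}")
```

With these inputs the hash list is 131072 blocks times 28 bytes, i.e.
3.5 MB, or about 0.02% of the file.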

If hashes were cached on the receiver, no protocol changes would be
necessary, I think. The hash list would just arrive back at the sender
without any delay.
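
To make the idea concrete, here is a rough Python sketch of what a
receiver-side hash cache might look like. This is not how rsync is
implemented: the cache layout, the function name, and the use of
zlib.adler32 as a stand-in for rsync's rolling checksum are all
assumptions for illustration only.

```python
import hashlib
import os
import zlib

BLOCK_SIZE = 128 * 1024  # assumed block size, matching the figure above

# Hypothetical in-memory cache: (path, size, mtime_ns) -> list of block sums.
# A real implementation would presumably persist this on disk.
_cache = {}


def block_sums(path, block_size=BLOCK_SIZE):
    """Return [(weak, strong), ...] per block, reusing cached sums if unchanged."""
    st = os.stat(path)
    key = (os.path.abspath(path), st.st_size, st.st_mtime_ns)
    if key in _cache:
        return _cache[key]  # file looks unchanged: skip re-reading it

    sums = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            weak = zlib.adler32(block)            # stand-in for the rolling checksum
            strong = hashlib.md5(block).digest()  # strong per-block hash
            sums.append((weak, strong))

    _cache[key] = sums
    return sums
```

Keying on size and mtime means a file modified without either changing
would yield stale hashes, so a real cache would probably need a
stronger validity check.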

> There are two meanings of "stateless":
>
>   1. It compares files on the sender and receiver, does not keep a
>      list of what it sent before, so always works even if files on
>      the receiver have been changed without using rsync.
>
>   2. It does not keep auxiliary data such as precomputed hashes to
>      optimize the "stateless" update operation.
>
> Perhaps the rsync maintainers meant 1, and you thought they meant 2?
>

I'm not sure what is truly meant by stateless in this context. "Rsync
is stateless" does seem to be an often-repeated mantra, though:
http://www.google.com/search?q=rsync+stateless+site:lists.samba.org

Unison is often suggested as an alternative, but it really doesn't
handle large files well, and it has no equivalent of rsync's --fuzzy
option. It's also written in OCaml, making it even less likely that
someone can fix those issues now that the creators have moved on.

-- 
RPM

