How to make big MySQL database more diffable/rsyncable? (aka rsyncing big files)
malayter at gmail.com
Wed Jul 15 16:01:08 MDT 2009
On Wed, Jul 15, 2009 at 12:54 PM, Jamie Lokier<jamie at shareable.org> wrote:
> It still has to send the hashes, which can be slow for a large file.
> So it would be even better to cache on the sending side hashes of
> files on the receiving side, perhaps indexed by the receiving side's
> MD5 of the whole file.
The hashes for a 16 GB file using the default block size come to about 28
bytes per 128 KB block, or roughly 0.02% of the file size, which works out
to around 3.5 MB. This is peanuts in the grand scheme of things when
dealing with large files, so I suppose whichever hash storage location
makes the implementation easier or more robust should be used.
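The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it takes the figures quoted here (a 16 GiB file, 128 KiB blocks, ~28 bytes of checksum data per block) as given assumptions rather than reading them from rsync itself.

```python
# Rough estimate of the per-block checksum list sent for one file.
# Assumed figures (from the text above): 128 KiB blocks and ~28 bytes
# of checksum data per block.

KiB = 1024
GiB = 1024 ** 3


def checksum_list_size(file_size: int, block_size: int = 128 * KiB,
                       bytes_per_block: int = 28) -> int:
    """Return the approximate number of bytes of block checksums for a file."""
    # Ceiling division: a partial final block still gets its own entry.
    n_blocks = -(-file_size // block_size)
    return n_blocks * bytes_per_block


size = checksum_list_size(16 * GiB)
print(f"{size} bytes = {size / 1024**2:.1f} MiB "
      f"({100 * size / (16 * GiB):.3f}% of the file)")
# prints: 3670016 bytes = 3.5 MiB (0.021% of the file)
```

A 16 GiB file splits into 131072 blocks of 128 KiB, and 131072 × 28 bytes is 3,670,016 bytes, i.e. 3.5 MiB.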
If the hashes were cached on the receiver, I think no protocol changes
would be necessary: the receiver would simply send its cached hash list
back to the sender, without the delay of rereading and rehashing the file.
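A receiver-side cache along these lines could look like the sketch below. Everything in it is hypothetical and not part of rsync: the function name `block_checksums`, the in-memory `_cache`, and the choice of key. It keys cached block checksums on (path, size, mtime) so an unchanged file never has to be reread, and it uses Adler-32 and MD5 from the standard library as stand-ins for rsync's own weak and strong checksums.

```python
import hashlib
import os
import zlib

# Hypothetical in-memory cache: (path, size, mtime_ns) -> per-block checksums.
# A real implementation would persist this on disk alongside the file.
_cache: dict[tuple[str, int, int], list[tuple[int, bytes]]] = {}


def block_checksums(path: str, block_size: int = 128 * 1024) -> list[tuple[int, bytes]]:
    """Return (weak, strong) checksums for each block of a file, reusing
    cached results while the file's size and mtime are unchanged."""
    st = os.stat(path)
    key = (os.path.abspath(path), st.st_size, st.st_mtime_ns)
    cached = _cache.get(key)
    if cached is not None:
        return cached  # cache hit: the file is not reread at all
    sums = []
    with open(path, "rb") as f:
        while block := f.read(block_size):
            # Adler-32 stands in for rsync's rolling checksum, MD5 for its strong hash.
            sums.append((zlib.adler32(block), hashlib.md5(block).digest()))
    _cache[key] = sums
    return sums
```

The trade-off is the one raised by the "stateless" discussion: a file modified without its size or mtime changing would yield stale hashes, so a cache like this weakens the guarantee that rsync always works even when files were changed behind its back.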
> There are two meanings of "stateless":
> 1. It compares files on the sender and receiver, does not keep a
> list of what it sent before, so always works even if files on
> the receiver have been changed without using rsync.
> 2. It does not keep auxiliary data such as precomputed hashes to
> optimize the "stateless" update operation.
> Perhaps the rsync maintainers meant 1, and you thought they meant 2?
I'm not sure what is truly meant by stateless in this context. "Rsync
is stateless" does seem to be an often-repeated mantra, though:
Unison is often suggested as an alternative, but it really doesn't
handle large files well, and it doesn't have an equivalent of --fuzzy.
It's also written in OCaml, which makes it even less likely that anyone
will fix those issues now that its creators have moved on.