How to make big MySQL database more diffable/rsyncable? (aka rsyncing big files)

Ryan Malayter malayter at gmail.com
Mon Jul 13 13:18:38 MDT 2009


Your log file indicates that rsync is indeed working as designed and
finding lots of data matches:

   Literal data: 123736377 bytes
   Matched data: 17889663500 bytes

This means that rsync only had to transfer 118 MB instead of 16+ GB.
It does this by trading CPU and disk operations for network bytes.
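
To put those two log lines in perspective, here is a quick bit of
arithmetic in Python. The byte counts are copied from your log; the
variable names are just mine.

```python
# Byte counts taken from the rsync log quoted above.
literal_bytes = 123_736_377      # sent over the network as-is
matched_bytes = 17_889_663_500   # found on the destination, not resent

MIB = 1024 ** 2
GIB = 1024 ** 3

total_bytes = literal_bytes + matched_bytes
literal_mib = literal_bytes / MIB
matched_gib = matched_bytes / GIB
fraction_sent = literal_bytes / total_bytes

print(f"literal data : {literal_mib:.1f} MiB")
print(f"matched data : {matched_gib:.2f} GiB")
print(f"fraction sent: {fraction_sent:.2%}")
```

So well under 1% of the file contents actually crossed the wire as
literal data.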

Now, it could be that rsync is still slow in terms of "wall clock"
time. This can happen if the process gets bottlenecked by something
other than the network (CPU, memory, or disk). Rsync does need to read
the existing files on the destination side in their entirety to
compute checksums. This can make the overall process disk-bound.

Rsync also needs to search for hash matches on the sending side, which
is CPU-intensive.
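
To show where the disk reads and the CPU time go, here is a toy sketch
of the general idea in Python. It is not rsync's actual code: the
4-byte block size, the use of SHA-256 as the strong checksum, and all
function names are my own assumptions for illustration. The receiver
checksums every block of its old file, and the sender slides a window
over the new file one byte at a time looking for matches.

```python
import hashlib

BLOCK = 4        # toy block size; real block sizes are much larger
MOD = 1 << 16

def weak_sum(data):
    """Cheap Adler-style checksum of one block, as an (a, b) pair."""
    a = sum(data) % MOD
    b = sum((len(data) - i) * x for i, x in enumerate(data)) % MOD
    return a, b

def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte forward without re-summing it."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b

def strong_sum(data):
    # Stand-in strong checksum (assumption: SHA-256 chosen for the sketch).
    return hashlib.sha256(data).digest()

def signatures(old):
    """Receiver side: read the whole old file and checksum each block."""
    table = {}
    for off in range(0, len(old) - BLOCK + 1, BLOCK):
        blk = old[off:off + BLOCK]
        table.setdefault(weak_sum(blk), []).append((strong_sum(blk), off))
    return table

def delta(new, table):
    """Sender side: search every byte offset of the new file for a match."""
    ops, lit, i, n = [], bytearray(), 0, len(new)
    a, b = weak_sum(new[:BLOCK])
    while i + BLOCK <= n:
        hit = None
        for strong, off in table.get((a, b), ()):
            if strong_sum(new[i:i + BLOCK]) == strong:
                hit = off
                break
        if hit is not None:
            if lit:
                ops.append(("literal", bytes(lit)))
                lit.clear()
            ops.append(("match", hit))
            i += BLOCK
            if i + BLOCK <= n:
                a, b = weak_sum(new[i:i + BLOCK])
        else:
            lit.append(new[i])
            if i + BLOCK < n:
                a, b = roll(a, b, new[i], new[i + BLOCK], BLOCK)
            i += 1
    lit.extend(new[i:])
    if lit:
        ops.append(("literal", bytes(lit)))
    return ops

def patch(old, ops):
    """Receiver side: rebuild the new file from old blocks plus literals."""
    out = bytearray()
    for kind, val in ops:
        out += old[val:val + BLOCK] if kind == "match" else val
    return bytes(out)
```

With old = b"abcdefghijkl" and new = b"abcdXefghijkl", delta() emits
three block matches and a single literal byte for the inserted "X".
Note that signatures() has to touch every byte of the old file, and
delta() does a table lookup at nearly every byte offset of the new one.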

If your network is fast, your disk is slow-ish, and your files are
really big, it can be faster to simply transfer the whole file (with
compression). This is in fact the case for one of my rsync jobs... it
is faster in terms of wall-clock time to send the whole file without
checking for data matches (the -W parameter). But we still use rsync,
since we pay per GB for transfer from the colocation facility.
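
A crude way to see the trade-off is a back-of-the-envelope estimate
like the one below. The model and the throughput figures are my own
assumptions, not measurements: it charges the delta transfer for one
full read of the old file on the destination disk, ignores CPU time
and compression, and ignores the read of the source file, which both
approaches need.

```python
def whole_file_seconds(size_bytes, net_bytes_per_s):
    """Rough time to push the entire file over the network."""
    return size_bytes / net_bytes_per_s

def delta_seconds(size_bytes, literal_bytes, disk_bytes_per_s, net_bytes_per_s):
    """Rough time for a delta transfer: the destination reads its whole
    old copy to checksum it, then only literal data crosses the network."""
    return size_bytes / disk_bytes_per_s + literal_bytes / net_bytes_per_s

size = 18_013_399_877      # literal + matched bytes from the log above
literal = 123_736_377      # literal bytes from the log above

# Hypothetical rates in bytes per second: a slow-ish disk, one fast
# link and one slow link.
disk = 60e6
fast_net = 110e6
slow_net = 1e6

for name, net in (("fast link", fast_net), ("slow link", slow_net)):
    w = whole_file_seconds(size, net)
    d = delta_seconds(size, literal, disk, net)
    print(f"{name}: whole file {w:.0f} s, delta {d:.0f} s")
```

With these made-up numbers the whole-file copy wins on the fast link
and the delta transfer wins by a wide margin on the slow one.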

Rsync really helps on low-bandwidth links or in situations where you
pay per byte for bandwidth, and where wall-clock transfer speed isn't
the most important thing.

It would be a big boost for large files if rsync "remembered" the
hashes on each end, so it didn't have to re-read the files on every
run if the files were unchanged. This is a feature that rsync's
developers have rejected, since rsync is designed to be stateless
between runs. I believe Unison does keep state at both ends, so you
might want to look at that.
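
To make the "remembering" idea concrete, here is a small sketch in
Python of what keeping state between runs could look like. This is
purely hypothetical: it is not something rsync does, and I am not
claiming it is how Unison works. The function name and the choice of
size plus mtime as the "unchanged" test are my own assumptions.

```python
import hashlib
import os

def cached_digest(path, cache):
    """Return (digest, was_cached). Re-read the file only if its size or
    modification time differs from what was recorded on a previous run."""
    st = os.stat(path)
    key = os.path.abspath(path)
    stamp = [st.st_size, st.st_mtime_ns]
    entry = cache.get(key)
    if entry is not None and entry["stamp"] == stamp:
        return entry["digest"], True          # no disk read needed
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    cache[key] = {"stamp": stamp, "digest": h.hexdigest()}
    return h.hexdigest(), False
```

The cache here is a plain dict, which could be saved to disk (as JSON,
say) between runs. The obvious weakness is that a file modified without
its size or mtime changing would be missed, which is one price of
keeping state.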

-- 
RPM

