rsync speedup - how?

Fri Aug 14 08:48:31 MDT 2009

On Thu, Aug 06, 2009 at 08:15:39PM +0200, devzero at web.de wrote:
> i read that rsync would be not very efficient with ultra-large files
> (i`m syncing files with up to 80gb size)

Things to try:

- Be sure you're using rsync 3.x, as it has a better hash algorithm for
  the large number of checksum blocks that need to be scanned on the
  sending side.

- The --inplace option might help, since it can reduce the amount of
  write I/O when the file is being modified (though it does reduce the
  amount of backward matching).  In a really large file where most of
  the data stays the same, this could be a big win.

- Try setting the --block-size option.  This will only help if the
  current block size is so large that rsync fails to find matching
  data.  In a huge file that is mostly unchanged, this may not be an
  issue.  Note that decreasing the block size increases the amount of
  checksum data and the number of blocks in the matching algorithm.

- The best thing you could do would be to mount the virtual drives
  (source read-only, dest read/write) and copy within the file systems.
  That would allow rsync to use its size+mtime fast-check to skip most
  of the files.  It would not, however, result in truly identical disk
  images, so it may not be a solution for you.
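
To put rough numbers on the --block-size trade-off mentioned above, here
is a small back-of-the-envelope sketch in Python.  The function names are
made up for illustration, and the 20 bytes of checksum data per block is
an assumption (a 4-byte weak sum plus a strong sum of up to 16 bytes), not
a figure taken from a particular rsync version.

```python
def checksum_blocks(file_size: int, block_size: int) -> int:
    """Number of blocks (and so checksum entries) for one file."""
    return -(-file_size // block_size)  # ceiling division


def checksum_bytes(file_size: int, block_size: int, per_block: int = 20) -> int:
    """Approximate checksum data sent by the receiver.

    per_block=20 is an assumed upper bound, see the note above.
    """
    return checksum_blocks(file_size, block_size) * per_block


if __name__ == "__main__":
    size = 80 * 2**30  # an 80 GiB disk image, as in the original question
    for block_size in (128 * 1024, 8 * 1024):
        print(block_size,
              checksum_blocks(size, block_size),
              checksum_bytes(size, block_size))
```

Dropping the block size by a factor of 16 multiplies both the checksum
data and the number of entries the sender has to match against by 16, so
it only pays off when the finer blocks actually find more matching data.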

Keep in mind that the checksumming as it currently works requires the
receiving side to read the whole file (sending its checksums), then
(after that is done) the sending side reads the whole file (generating
differences), which allows the receiving side to reconstruct the file
while the sender is sending in the changes.  Sadly, this means that the
transfer serializes this file-reading time (since the sender wants to be
able to find moved blocks from anywhere in the file).
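
As an illustration of why the two reads end up serialized, here is a
heavily simplified Python sketch of the matching flow described above.  It
is not rsync's code: all names are hypothetical, it uses a plain MD5
lookup at every offset instead of rsync's rolling weak checksum, and it
ignores the trailing partial block.  The point is only that make_delta()
needs the complete set of receiver checksums before it starts, because a
block in the new file may match a block from anywhere in the old one.

```python
import hashlib


def block_sums(old: bytes, block_size: int) -> dict:
    """Receiver side: map each full block's digest to its block index."""
    sums = {}
    for start in range(0, len(old) - block_size + 1, block_size):
        digest = hashlib.md5(old[start:start + block_size]).digest()
        sums.setdefault(digest, start // block_size)
    return sums


def make_delta(new: bytes, sums: dict, block_size: int) -> list:
    """Sender side: look for receiver blocks at every offset of the new file."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + block_size]
        idx = None
        if len(window) == block_size:
            idx = sums.get(hashlib.md5(window).digest())
        if idx is None:
            literal.append(new[pos])  # no match here, send this byte literally
            pos += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", idx))  # receiver already has this block
            pos += block_size
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops: list, block_size: int) -> bytes:
    """Receiver side: rebuild the new file from old blocks plus literal data."""
    out = bytearray()
    for kind, arg in ops:
        if kind == "copy":
            out += old[arg * block_size:(arg + 1) * block_size]
        else:
            out += arg
    return bytes(out)
```

With old = b"AAAABBBBCCCCDDDD" and new = b"CCCCxxAAAABBBBDDDD" at a block
size of 4, the sender finds the moved CCCC block and only the two inserted
bytes travel as literal data.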

An interesting new option might be one that tells the sender to
immediately start comparing the received checksums to the source file,
and only check if the data matches (with no movement) or if it needs to
send the changed data (i.e. this would skip scanning for moved data).
For mostly unchanged, large files, that would allow concurrent reading
of the receiving and sending files.  Combined with --inplace, this might
be a pretty large speedup for mostly-unchanged files.
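
A rough Python sketch of what such a no-movement comparison could look
like follows.  This option does not exist in rsync as described here; the
function names and the use of MD5 per block are assumptions made purely
for illustration.  Block i of the source is compared only against
checksum i from the receiver, so the sender can consume the checksums as
a stream while the receiver is still reading its copy.

```python
import hashlib


def block_digests(data: bytes, block_size: int):
    """Receiver side: yield one digest per block, in file order."""
    for start in range(0, len(data), block_size):
        yield hashlib.md5(data[start:start + block_size]).digest()


def positional_delta(new: bytes, dest_digests, block_size: int) -> list:
    """Sender side: compare block i only against the i-th received digest."""
    digests = iter(dest_digests)
    changed = []
    for start in range(0, len(new), block_size):
        block = new[start:start + block_size]
        # next() returns None once the receiver's file has run out of blocks
        if hashlib.md5(block).digest() != next(digests, None):
            changed.append((start, block))
    return changed


def patch_in_place(dest: bytearray, changed: list, new_len: int) -> None:
    """Receiver side: overwrite changed blocks in place, then fix the length."""
    for start, block in changed:
        dest[start:start + len(block)] = block
    del dest[new_len:]
```

Because only the changed blocks are written back at their original
offsets, this pairs naturally with the --inplace behaviour discussed
earlier.  The cost is that inserted or shifted data is no longer found,
so everything after an insertion would be resent.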

..wayne..