Rsyncing really large files

Thu Feb 24 15:52:17 GMT 2005

On Thu, Feb 24, 2005 at 02:01:59PM +0100, Lars Karlslund wrote:
> As I understand it, the way rsync works is:
> - remote server calculates all checksum blocks and transmits these to client
> - local client calculates all checksum blocks
> - local client compares local blocks to remote blocks, checking if a block
> could have moved
> - changes are synced

Not quite.  The sending side does not need to pre-calculate all of its
checksums.  It does, however, hold all of the checksum data it receives
from the other side in memory, so that it can look up any block at
random as it reads through the file and sends data.
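
To make that concrete, here is a rough Python sketch of the lookup.
It is not rsync's code: the tiny block size, the function names, the
simple additive weak checksum and the use of MD5 as the strong
checksum are all assumptions made for illustration, and the weak sum
is recomputed at every offset rather than rolled incrementally.

```python
import hashlib

BLOCK = 4  # tiny block size, for illustration only (assumption)

def weak(data: bytes) -> int:
    """Simple additive stand-in for a rolling checksum (not rsync's exact one)."""
    a = sum(data) & 0xFFFF
    b = sum((len(data) - i) * x for i, x in enumerate(data)) & 0xFFFF
    return (b << 16) | a

def signature(basis: bytes) -> dict:
    """Checksums of the other side's blocks, keyed by weak sum for random lookup."""
    table = {}
    for off in range(0, len(basis), BLOCK):
        blk = basis[off:off + BLOCK]
        entry = (hashlib.md5(blk).digest(), off // BLOCK)
        table.setdefault(weak(blk), []).append(entry)
    return table

def delta(new: bytes, table: dict) -> list:
    """Scan the sender's file; emit ('match', block_index) or ('data', bytes)."""
    out, literal, i = [], bytearray(), 0
    while i < len(new):
        window = new[i:i + BLOCK]
        hit = None
        # Cheap weak-sum lookup first; confirm any candidate with the strong sum.
        for strong, idx in table.get(weak(window), []):
            if hashlib.md5(window).digest() == strong:
                hit = idx
                break
        if hit is None:
            literal.append(new[i])  # no match here: send this byte, slide by one
            i += 1
        else:
            if literal:
                out.append(("data", bytes(literal)))
                literal = bytearray()
            out.append(("match", hit))
            i += len(window)
    if literal:
        out.append(("data", bytes(literal)))
    return out
```

The whole table has to stay in memory because a block of the file
being read can match any block on the other side, not just the one at
the same offset.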

It would certainly be possible to change the algorithm to not cache the
data (and thus only allow the current block to be compared), but I don't
think that idea has general enough interest for me to work on for
inclusion in rsync.  You might want to look into coding it up for
yourself.
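
If you do try it, the non-caching variant might look something like
the sketch below.  This is a hypothetical illustration, not an
existing rsync option: the names block_sums and aligned_delta are made
up, and MD5 is only a stand-in for the strong checksum.  Each incoming
checksum is compared against the block at the same offset and then
forgotten, so memory use stays constant.

```python
import hashlib
from typing import Iterable, Iterator, Tuple

def block_sums(data: bytes, block: int) -> Iterator[bytes]:
    """Yield one strong checksum per fixed-size block (MD5 as a stand-in)."""
    for off in range(0, len(data), block):
        yield hashlib.md5(data[off:off + block]).digest()

def aligned_delta(new: bytes, remote_sums: Iterable[bytes],
                  block: int) -> Iterator[Tuple[str, object]]:
    """Compare block k only against remote checksum k; nothing is cached."""
    sums = iter(remote_sums)
    for idx, off in enumerate(range(0, len(new), block)):
        blk = new[off:off + block]
        # next() returns None once the remote side has run out of blocks.
        if hashlib.md5(blk).digest() == next(sums, None):
            yield ("match", idx)
        else:
            yield ("data", blk)
```

The cost is that data which has moved is no longer found.  Inserting a
single byte at the start of the file shifts every block, and the whole
file is then sent as literal data.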

However, you should be sure to have measured what is causing the
slowdown first to know how much that will help.  If it is not memory
that is swapping on the sender, it may be that the computing of the
checksums is maxing out your CPU, and removing the caching of the
remote checksums won't buy you as much as you think.  You could use some
of the librsync tools (e.g. rdiff) to calculate how long various actions
take on each system (i.e. try running rdiff on each system outputting to
/dev/null to see how long the computing of the checksums takes).
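
If rdiff is not handy, a small script can give a similar rough
measurement.  The sketch below is only an approximation under stated
assumptions: time_checksums is a made-up name, and zlib.adler32 and MD5
stand in for the weak and strong checksums, so the absolute numbers
will not match rsync's.  What matters is the comparison of wall-clock
time with CPU time.

```python
import hashlib
import time
import zlib

def time_checksums(path: str, block_size: int = 64 * 1024) -> dict:
    """Checksum a file block by block and report wall-clock and CPU seconds."""
    blocks = 0
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    with open(path, "rb") as f:
        while True:
            blk = f.read(block_size)
            if not blk:
                break
            zlib.adler32(blk)          # stand-in for the weak checksum
            hashlib.md5(blk).digest()  # stand-in for the strong checksum
            blocks += 1
    return {
        "blocks": blocks,
        "wall_s": time.perf_counter() - wall_start,
        "cpu_s": time.process_time() - cpu_start,
    }
```

If cpu_s is close to wall_s, the checksumming itself is the
bottleneck.  If wall_s is much larger, the process is mostly waiting
on something else, such as disk reads or swapping.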

..wayne..