rsync / checksum small block / xfer small block

Jamie Lokier jamie at shareable.org
Thu Dec 4 15:32:10 GMT 2008


alexus wrote:
> okay, so you're saying that if i have a large db and i make a change,
> rsync will not re-transfer the whole file, it will just transfer a small
> portion of that file? am I correct? does it say something like that in
> the documentation anywhere?

In the very first paragraph of the manual:

       It is famous for its delta-transfer algorithm, which reduces
       the amount of data sent over the network by sending only the
       differences between the source files and the existing files in
       the destination.

In the README which comes with rsync:

       Rsync uses a delta-transfer algorithm which provides a very
       fast method for bringing remote files into sync.  It does this
       by sending just the differences in the files across the link,
       without requiring that both sets of files are present at one of
       the ends of the link beforehand.  At first glance this may seem
       impossible because the calculation of diffs between two files
       normally requires local access to both files.

In the rsync tutorial:

       rsync copies only the diffs of files that have actually changed.
       Only actual changed pieces of files are transferred, rather
       than the whole file. This makes updates faster, especially over
       slower links like modems. FTP would transfer the entire file,
       even if only one byte changed.

In the rsync algorithm technical report:

       The algorithm identifies parts of the source file which are
       identical to some part of the destination file, and only sends
       those parts which cannot be matched in this way. Effectively,
       the algorithm computes a set of differences without having both
       files on the same machine.
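
To make that a bit more concrete, here is a rough Python sketch of the
idea in the technical report: the receiver's file is cut into fixed
blocks, and the sender slides a window over its own file one byte at a
time, looking for any of those blocks at any offset. This is only an
illustration, not rsync's actual code. The names weak_sum, roll,
strong_sum, make_delta and apply_delta are made up for this example, and
SHA-256 is an assumption standing in for rsync's real strong checksum.

```python
import hashlib

M = 1 << 16  # modulus for both halves of the weak checksum


def weak_sum(block):
    """Weak checksum (a, b): a plain sum and a position-weighted sum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Move an n-byte window forward by one byte without re-summing it."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def strong_sum(block):
    # Assumption: SHA-256 stands in for rsync's real strong checksum.
    return hashlib.sha256(block).digest()


def make_delta(src, dst, n):
    """Return ('copy', block_index) / ('data', bytes) ops that rebuild src."""
    # Index every full block of the destination file by its weak checksum.
    table = {}
    for start in range(0, len(dst) - n + 1, n):
        block = dst[start:start + n]
        table.setdefault(weak_sum(block), []).append(
            (strong_sum(block), start // n))

    ops, literal, pos, sums = [], bytearray(), 0, None
    while pos + n <= len(src):
        if sums is None:
            sums = weak_sum(src[pos:pos + n])
        match = None
        for strong, index in table.get(sums, ()):
            # The weak checksum can collide, so confirm with the strong one.
            if strong_sum(src[pos:pos + n]) == strong:
                match = index
                break
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", match))
            pos += n
            sums = None  # window jumped; recompute from scratch
        else:
            literal.append(src[pos])
            if pos + n < len(src):
                sums = roll(sums[0], sums[1], src[pos], src[pos + n], n)
            pos += 1
    literal.extend(src[pos:])  # tail shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(ops, dst, n):
    """Rebuild the source file from the ops and the destination file."""
    out = bytearray()
    for kind, value in ops:
        out += dst[value * n:(value + 1) * n] if kind == "copy" else value
    return bytes(out)


# Deterministic 4096-byte test file, and a copy with one byte inserted.
old = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))
new = b"X" + old
ops = make_delta(new, old, 512)
```

With one byte inserted at the front, ops comes out as a single literal
byte followed by eight block references, so only that one byte of file
data would need to cross the link.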

By the way, the algorithm you describe for "dedupe" - dividing the
file into blocks, checksumming the blocks, and transferring the blocks
which have changed - is described like this in Andrew Tridgell's PhD
thesis on the rsync algorithm:

       This algorithm is very simple and meets some of the aims of our
       remote update algorithm, but it is useless in practice.  The
       problem with it is that A can only find matches that are on
       block boundaries.  If the file on A is the same as B except that
       one byte has been inserted at the start of the file then no
       block matches will be found and the algorithm will transfer the
       whole file.
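
You can see that failure for yourself with a few lines of Python. This
is only a toy version of the fixed-block scheme, not anything taken from
rsync or the thesis. The names block_digests and blocks_to_send are made
up for the example, and SHA-256 is just a convenient choice of checksum.

```python
import hashlib


def block_digests(data, block_size):
    """Checksum each fixed-size block of data, in order."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def blocks_to_send(src, dst, block_size):
    """Indices of src blocks that differ from the dst block at the same index."""
    src_sums = block_digests(src, block_size)
    dst_sums = block_digests(dst, block_size)
    return [i for i, s in enumerate(src_sums)
            if i >= len(dst_sums) or s != dst_sums[i]]


# Deterministic 4096-byte test file (eight 512-byte blocks).
old = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))

# Case 1: one byte overwritten in place, so only its block differs.
changed = old[:100] + bytes([old[100] ^ 0xFF]) + old[101:]
in_place = blocks_to_send(changed, old, 512)

# Case 2: one byte inserted at the start, so every block is shifted.
inserted = b"X" + old
shifted = blocks_to_send(inserted, old, 512)
```

Here in_place is [0], but shifted lists all nine blocks of the new file:
a one-byte insertion makes the fixed-block scheme resend everything.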

Enjoy :-)
-- Jamie

