rsync / checksum small block / xfer small block
Jamie Lokier
jamie at shareable.org
Thu Dec 4 15:32:10 GMT 2008
alexus wrote:
> okay, so you saying if i have large db, and i made a change rsync will
> not re-transfer the whole file, it will just transfer small portion of
> that file? am I correct? does it say something like that in
> documentation anywhere?
In the very first paragraph of the manual:
    It is famous for its delta-transfer algorithm, which reduces
    the amount of data sent over the network by sending only the
    differences between the source files and the existing files in
    the destination.
In the README which comes with rsync:
    Rsync uses a delta-transfer algorithm which provides a very
    fast method for bringing remote files into sync. It does this
    by sending just the differences in the files across the link,
    without requiring that both sets of files are present at one of
    the ends of the link beforehand. At first glance this may seem
    impossible because the calculation of diffs between two files
    normally requires local access to both files.
In the rsync tutorial:
    rsync copies only the diffs of files that have actually changed
    Only actual changed pieces of files are transferred, rather
    than the whole file. This makes updates faster, especially over
    slower links like modems. FTP would transfer the entire file,
    even if only one byte changed.
In the rsync algorithm technical report:
    The algorithm identifies parts of the source file which are
    identical to some part of the destination file, and only sends
    those parts which cannot be matched in this way. Effectively,
    the algorithm computes a set of differences without having both
    files on the same machine.
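To make that last quote a bit more concrete, here is a rough Python sketch of
the idea: the receiver's file is cut into blocks, and the sender looks for
those blocks at every byte offset of its own file, using a cheap rolling
checksum first and a stronger hash to confirm. This is only an illustration,
not rsync's actual code. The function names, the 512-byte block size and the
use of SHA-256 as the strong hash are my own assumptions, and blocks shorter
than the block size at the end of the old file are simply ignored.

```python
import hashlib
import random

M = 1 << 16  # modulus of the weak checksum, as in the technical report


def weak(block):
    """Weak checksum of a block, returned as the pair (a, b)."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def strong(block):
    # Assumption: SHA-256 is a stand-in, not the hash rsync itself uses.
    return hashlib.sha256(block).digest()


def make_delta(old, new, n):
    """Describe `new` as copies of n-byte blocks of `old` plus literal bytes."""
    table = {}
    for start in range(0, len(old) - n + 1, n):
        block = old[start:start + n]
        table.setdefault(weak(block), {})[strong(block)] = start
    ops, literal, pos, sums = [], bytearray(), 0, None
    while pos + n <= len(new):
        if sums is None:
            sums = weak(new[pos:pos + n])
        candidates = table.get(sums)
        start = candidates.get(strong(new[pos:pos + n])) if candidates else None
        if start is not None:
            # Block found in the old file: flush pending literals, emit a copy.
            if literal:
                ops.append(("literal", bytes(literal)))
                literal.clear()
            ops.append(("copy", start))
            pos += n
            sums = None
        else:
            # No match at this offset: keep one byte as literal data and roll on.
            literal.append(new[pos])
            if pos + n < len(new):
                sums = roll(sums[0], sums[1], new[pos], new[pos + n], n)
            pos += 1
    literal.extend(new[pos:])
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def apply_delta(old, ops, n):
    """Rebuild the new file from the old file and the delta."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg:arg + n] if kind == "copy" else arg
    return bytes(out)


# Made-up test data: 4096 pseudo-random bytes, then one byte inserted in front.
rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(4096))
new = b"X" + old
ops = make_delta(old, new, 512)
sent = sum(len(arg) for kind, arg in ops if kind == "literal")
print(sent, "literal bytes needed for a", len(new), "byte file")
```

Because matches are searched for at every offset, an insertion only costs
the inserted bytes plus the references to the blocks that follow.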
By the way, the algorithm you describe for "dedupe" (dividing the
file into blocks, checksumming the blocks, and transferring the blocks
which have changed) is described like this in Andrew Tridgell's PhD
thesis on the rsync algorithm:
    This algorithm is very simple and meets some of the aims of our
    remote update algorithm, but it is useless in practice. The
    problem with it is that A can only find matches that are on
    block boundaries. If the file on A is the same as B except that
    one byte has been inserted at the start of the file then no
    block matches will be found and the algorithm will transfer the
    whole file.
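The failure described in that passage is easy to reproduce. The following is
a small Python sketch of the fixed-block scheme, written only for
illustration: the helper names, the 512-byte block size, the SHA-256 hash and
the pseudo-random test data are all my own assumptions. It counts how many
blocks of the new file are already known to the other side, once after
overwriting a byte in place and once after inserting a byte at the start.

```python
import hashlib
import random

BLOCK_SIZE = 512  # arbitrary size chosen for the illustration


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block, cutting only at block boundaries."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def count_reusable_blocks(old, new, block_size=BLOCK_SIZE):
    """Count blocks of `new` whose hash matches some block of `old`."""
    known = set(block_hashes(old, block_size))
    return sum(1 for h in block_hashes(new, block_size) if h in known)


# Made-up test data: eight blocks of pseudo-random bytes.
rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(8 * BLOCK_SIZE))

# Case 1: the first byte is changed in place, so only the first block differs.
overwritten = bytes([old[0] ^ 0xFF]) + old[1:]

# Case 2: one byte is inserted at the start, shifting every block boundary.
inserted = b"X" + old

print("overwrite:", count_reusable_blocks(old, overwritten), "blocks reusable")
print("insert:   ", count_reusable_blocks(old, inserted), "blocks reusable")
```

With the in-place change all blocks but the first still match. With the
insertion every boundary is off by one byte, so nothing matches and the
whole file would have to be sent again.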
Enjoy :-)
-- Jamie