rsync dir in _both_ directions?

David Bolen db3l at fitlinxx.com
Thu Feb 7 08:36:20 EST 2002


Jack McKinney [jackmc-rsync at lorentz.com] writes:

>     If I add 512 bytes at the beginning of the file, then I would expect
> it.  If I only add 14 bytes, then I don't think rsync will detect this,
> as it would require it to compute checksums starting at EVERY byte instead
> of 512 byte checksums at offsets 0, 512, 1024, 1536, etc.

Yep, and that's precisely what rsync does.  It actually uses two types
of checksums.  One is a fast rolling checksum that can be efficiently
computed for a block starting at _every_ byte in the file.  The nature
of the checksum is that you can compute its value for the block starting
at byte X+1 from its value for the block starting at byte X with a small,
constant amount of work: drop the byte that leaves the front of the block
and add the new byte that enters at the end.  The penalty you pay for
the speed is that it's a "weaker" checksum - you can get false matches
(i.e., two different blocks with the same checksum).  So there's a
second, much stronger but much slower checksum that is used to validate
a match once the first checksum thinks it has found one.
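
To make the rolling property concrete, here is a minimal Python sketch of
a two-part weak checksum in the spirit of the one described in the rsync
tech paper.  The names (weak_checksum, roll, M) and the choice of 2^16 as
the modulus are assumptions of this sketch, not taken from the rsync
source; the point is only that sliding the block by one byte costs a
fixed amount of work regardless of block size.

```python
M = 1 << 16  # modulus assumed for this sketch


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the (a, b) pair from scratch over one block."""
    n = len(block)
    a = sum(block) % M
    # Each byte is weighted by its distance from the end of the block.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple[int, int]:
    """Slide the block one byte forward: out_byte leaves, in_byte enters."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b
```

Starting from weak_checksum(data[:512]), repeated calls to roll() give the
checksum of the 512-byte block at offsets 1, 2, 3, ... without re-reading
the whole block each time.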

When you transfer a file, the receiver computes both checksums for each
block of the (old) copy it already has and sends them over.  The sender
then walks its current file, taking block-size chunks _at every byte_
and computing the weak/fast checksum.  If the weak checksum matches one
from the receiver's list, it then computes the stronger checksum, and if
that matches too, it knows it need not send that block of data - it just
tells the receiver which of its existing blocks to reuse.

This will match common blocks located anywhere within the file at any
offset (including re-using a source block multiple times to reproduce
the target).
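
The matching described above can be sketched roughly as follows.  This is
an illustration under stated assumptions, not rsync's actual code: the
function names, the ("copy", index) / ("data", bytes) instruction format,
and the use of MD5 as the strong checksum are all choices made for this
sketch.  "basis" is the file whose fixed blocks get checksummed, and
"target" is the file that is scanned at every byte offset.

```python
import hashlib

M = 1 << 16  # modulus assumed for the weak checksum in this sketch


def weak(block):
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def strong(block):
    # MD5 is a stand-in for the strong checksum here.
    return hashlib.md5(block).digest()


def signatures(basis, block_len):
    """Map weak checksum -> [(strong checksum, block index)] for full blocks."""
    table = {}
    for idx in range(len(basis) // block_len):
        block = basis[idx * block_len:(idx + 1) * block_len]
        table.setdefault(weak(block), []).append((strong(block), idx))
    return table


def delta(target, table, block_len):
    """Describe target as copies of basis blocks plus literal data."""
    ops, literal, pos, ab = [], bytearray(), 0, None
    while pos + block_len <= len(target):
        window = target[pos:pos + block_len]
        if ab is None:
            ab = weak(window)
        match = None
        if ab in table:                      # cheap test at every offset
            digest = strong(window)          # expensive test only on a hit
            match = next((i for s, i in table[ab] if s == digest), None)
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", match))
            pos += block_len
            ab = None                        # recompute at the new position
        else:
            literal.append(target[pos])
            if pos + block_len < len(target):
                a, b = ab
                a = (a - target[pos] + target[pos + block_len]) % M
                b = (b - block_len * target[pos] + a) % M
                ab = (a, b)
            pos += 1
    literal.extend(target[pos:])             # tail shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(basis, ops, block_len):
    """Rebuild the target from the basis and the instruction list."""
    out = bytearray()
    for kind, arg in ops:
        out += basis[arg * block_len:(arg + 1) * block_len] if kind == "copy" else arg
    return bytes(out)
```

With a 512-byte block size and a target that is just the basis with 14
bytes prepended, delta() in this sketch emits those 14 bytes as literal
data and then a run of copy instructions, which is the case asked about
above.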

You might want to read the tech paper on rsync and its protocol, since
it goes into this in much more detail.  If all rsync did was match on
fixed block boundaries, it would be _way_ less useful than it really
is.

>    It is an easy experiment.   (...)
> (...) I suspect that your xfer time will be comparable to
> the first one, not to the second.

Since it's an easy experiment - why "suspect" - did you try this?  It
should take virtually no time for the second (sans the initial
checksum computation and transmission, which to be fair for large
files and small block sizes can be quite significant).
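
For a rough feel of that overhead, here is a back-of-the-envelope
estimate.  The 20 bytes per block (a 4-byte weak plus a 16-byte strong
checksum) is an assumption for illustration only; the real per-block
size depends on the rsync version and options, and signature_bytes is a
made-up helper name.

```python
def signature_bytes(file_size: int, block_len: int, per_block: int = 20) -> int:
    """Estimate the checksum data sent for one file (assumed per-block size)."""
    blocks = -(-file_size // block_len)  # ceiling division
    return blocks * per_block


# 1 GiB file, 512-byte blocks: about 2 million blocks to checksum.
estimate = signature_bytes(1 << 30, 512)
print(f"{estimate / (1 << 20):.0f} MiB of checksums")
```

Under those assumptions the checksum list alone is tens of megabytes for
a 1 GiB file, and it shrinks in proportion as the block size grows.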

-- David

/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/
