Optimizations and other questions for rsync

Craig Barratt craig at atheros.com
Sun Oct 20 22:32:00 EST 2002


> 2. is there a way for rsync to cache previous calculations on checksum...

Rsync doesn't do this, but it is possible.

There is a checksumSeed (typically unix time()) supplied by the server
when rsync starts.  It is different for every run (at least for runs
started more than 1 second apart).  But since it is appended to the
end of each block, it is possible to cache the block checksums (really
the 128-bit MD4 state, prior to MD4_tail) without the checksumSeed,
and simply complete the calculation by adding the checksumSeed and
calling MD4_tail.
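
As a rough illustration of the block case, here is a minimal Python sketch,
not rsync code.  Assumptions: hashlib's MD5 stands in for MD4 (MD4 is not
available in every hashlib build), the seed is packed as a 4-byte
little-endian integer, and the function names are made up for the example.
The saved hash object plays the role of the MD4 state prior to MD4_tail.

```python
import hashlib
import struct


def seed_bytes(seed: int) -> bytes:
    # Assumed encoding: 4-byte little-endian integer.
    return struct.pack("<I", seed & 0xFFFFFFFF)


def block_digest(block: bytes, seed: int) -> bytes:
    # Uncached path: hash the block, then the seed appended at the end.
    h = hashlib.md5()  # stand-in for MD4
    h.update(block)
    h.update(seed_bytes(seed))
    return h.digest()


def cache_block_state(block: bytes):
    # Hash only the block data and keep the unfinished hash object.
    # Nothing here depends on the seed, so it can be reused across runs.
    h = hashlib.md5()
    h.update(block)
    return h


def finish_with_seed(cached, seed: int) -> bytes:
    # Copy the cached state so it stays reusable, then add this run's
    # seed and finalize without touching the block data again.
    h = cached.copy()
    h.update(seed_bytes(seed))
    return h.digest()


if __name__ == "__main__":
    block = b"x" * 700
    cached = cache_block_state(block)
    for seed in (1035171120, 1035171121):
        assert finish_with_seed(cached, seed) == block_digest(block, seed)
```

A real on-disk cache would have to serialize the intermediate state
(including any buffered partial input), which hashlib does not expose;
the in-memory copy only shows that the seed can be applied last.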

However, the entire-file MD4 checksum has the checksumSeed added at
the start (that's a good place to put it to reduce the chance of
MD4 collisions over consecutive runs, but unfortunate for caching).
So you cannot cache the file MD4 checksum: there is no easy way to
compute MD4(checksumSeed, file) even if you know MD4(file).

When checksumSeed == 0, the seed is not included in the MD4 calculations.
So if we added a command-line switch --checksumseed=0 that overrides the
default, then all the block and file checksums would be cacheable.
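
To make the seed placement concrete, here is a small Python sketch, not
rsync code.  Assumptions: hashlib's MD5 stands in for MD4, the seed is
packed as a 4-byte little-endian integer, and the function names are
made up for the example.  It shows the seed going first for the file
checksum, last for the block checksum, and being skipped when it is 0.

```python
import hashlib
import struct


def seed_bytes(seed: int) -> bytes:
    # Assumed encoding: 4-byte little-endian integer.
    return struct.pack("<I", seed & 0xFFFFFFFF)


def file_digest(data: bytes, seed: int) -> bytes:
    # Whole-file checksum: the seed is hashed first, so every later
    # hash state depends on it and nothing can be reused across runs.
    h = hashlib.md5()  # stand-in for MD4
    if seed != 0:
        h.update(seed_bytes(seed))
    h.update(data)
    return h.digest()


def block_digest(block: bytes, seed: int) -> bytes:
    # Block checksum: the seed is hashed last.
    h = hashlib.md5()
    h.update(block)
    if seed != 0:
        h.update(seed_bytes(seed))
    return h.digest()


if __name__ == "__main__":
    data = b"some file contents" * 100

    # A non-zero seed changes the file checksum on every run.
    assert file_digest(data, 1035171120) != file_digest(data, 1035171121)

    # With seed 0 both checksums reduce to the plain hash of the data,
    # so a value computed in one run is still valid in the next.
    assert file_digest(data, 0) == hashlib.md5(data).digest()
    assert block_digest(data[:700], 0) == hashlib.md5(data[:700]).digest()
```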

Craig


