Problem with checksum failing on large files

Craig Barratt craig at atheros.com
Mon Oct 14 20:23:00 EST 2002


> > Would you mind trying the following?  Build a new rsync (on both
> > sides, of course) with the initial csum_length set to, say 4,
> > instead of 2?  You will need to change it in two places in
> > checksum.c; an untested patch is below.  Note that this test
> > version is not compatible with standard rsync, so be sure to
> > remove the executables once you try them.
> > 
> > Craig
> 
> 
> I changed csum_length=2 to csum_length=4 in checksum.c & this time rsync
> worked on the first pass for a 2.7 GB file.  

Cool!

> I'm assuming that this change forced rsync to use a longer checksum length
> on the first pass.  What checksum was actually used?

Yes.  For each block in the first pass it now uses adler32 plus the
first 4 bytes of the MD4 checksum (64 bits total), instead of adler32
plus the first 2 bytes of MD4 (48 bits total).  With just those two
extra bytes, the chance of a first-pass failure on a random 2.3GB file
with a 700 byte block size drops from more than 99% to 0.04%.
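
For what it's worth, those numbers fall out of a simple Poisson-style
estimate: treat each 48- or 64-bit block signature as uniformly random,
and assume roughly every byte offset of the new file gets compared
against every old block.  The little program below is only a
back-of-the-envelope sketch under those assumptions, not rsync code,
but it reproduces the figures:

/* firstpass.c - rough estimate of first-pass failure probability.
 * Not part of rsync; assumes uniformly random block signatures and
 * about file_size * nblocks (offset, block) comparisons in total.
 * Compile with: cc firstpass.c -lm
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double file_size  = 2.3e9;       /* bytes in the file          */
    double block_size = 700.0;       /* bytes per block            */
    double nblocks    = file_size / block_size;
    int    bits[]     = { 48, 64 };  /* adler32 + 2 or 4 MD4 bytes */
    int    i;

    for (i = 0; i < 2; i++) {
        /* expected number of false matches, then P(at least one) */
        double lambda = file_size * nblocks / ldexp(1.0, bits[i]);
        printf("%2d bits: first-pass failure chance ~ %.4g%%\n",
               bits[i], 100.0 * (1.0 - exp(-lambda)));
    }
    return 0;
}

Under those assumptions the expected number of false matches is about
27 at 48 bits (so at least one is nearly certain), and about 4e-4 at
64 bits.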

The longer checksum also helps with the problem discussed earlier: the
chance of two different blocks of the old file having the same checksum
drops from a couple of percent to vanishingly small.
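
The same style of estimate covers that case too; here the relevant
count is the number of pairs of blocks in the old file (a birthday
bound).  Again, just a sketch under the same uniform-signature
assumption:

/* dupblocks.c - rough estimate of two old-file blocks sharing a sum.
 * Not part of rsync; same assumptions as the sketch above.
 * Compile with: cc dupblocks.c -lm
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double nblocks = 2.3e9 / 700.0;   /* blocks in the old file */
    int    bits[]  = { 48, 64 };
    int    i;

    for (i = 0; i < 2; i++) {
        /* about nblocks^2/2 pairs, each colliding with chance 2^-bits */
        double lambda = 0.5 * nblocks * nblocks / ldexp(1.0, bits[i]);
        printf("%2d bits: duplicate-block chance ~ %.4g%%\n",
               bits[i], 100.0 * (1.0 - exp(-lambda)));
    }
    return 0;
}

That works out to roughly 2% at 48 bits and on the order of 1e-7 at
64 bits, which is where the "couple of percent" and "vanishingly small"
figures come from.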

I agree with the earlier comments: checksum size is the key variable.
Block size is secondary.

Craig


