Problem with checksum failing on large files

Donovan Baarda abo at
Sun Oct 13 03:08:00 EST 2002

On Sat, Oct 12, 2002 at 07:29:36PM -0700, jw schultz wrote:
> On Sat, Oct 12, 2002 at 11:13:50AM -0700, Derek Simkowiak wrote:
> > > My theory is that this is expected behavior given the check sum size.
> > 
> >      Craig,
> > 	Excellent analysis!
> > 
> > 	Assuming your hypothesis is correct, I like the adaptive checksum
> > idea.  But how much extra processor overhead is there with a larger
> > checksum bit size?  Is it worth the extra code and testing to use an
> > adaptive algorithm?
> > 
> > 	I'd be more inclined to say "This ain't the 90's anymore", realize
> > that overall filesizes have increased (MP3, MS-Office, CD-R .iso, and DV)
> > and that people are moving from dialup to DSL/Cable, and then make either
> > the default (a) initial checksum size, or (b) block size, a bit larger.
> I lean toward making the block-size adaptive.  Perhaps
> something on the order of 700 < (filesize / 15000) < 8KB
> Maybe both checksum size and block-size should be adaptive
> so that both track the file size so large files have larger
> but fewer checksums (compared to current defaults) and small
> files retain their current advantages.  

The ideal checksum size depends entirely on the number of blocks. If you
scale your block size so that the number of blocks stays low enough, you
can stick with the fixed checksum size.
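To make that concrete, here is a back-of-the-envelope sketch. The clamping constants (700 bytes, 8KB, filesize/15000) come from the suggestion quoted above; the collision estimate is just the standard birthday-paradox approximation for an assumed b-bit block checksum, not anything rsync implements verbatim:

```python
def block_size(file_size, min_block=700, max_block=8192, divisor=15000):
    # Clamp file_size/15000 into [700, 8192], per the suggestion above.
    return max(min_block, min(max_block, file_size // divisor))

def expected_collisions(file_size, bsize, sum_bits=48):
    # Birthday-paradox estimate: with n blocks and a b-bit checksum,
    # the expected number of falsely-matching block pairs is
    # roughly n^2 / 2^(b+1).
    n = file_size // bsize
    return n * n / 2 ** (sum_bits + 1)
```

For a 4GB file this gives an 8KB block size, hence about 2^19 blocks, and (with an assumed 48-bit checksum) an expected false-match count of roughly 2^-11, i.e. comfortably small.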
> Any change like this will require a protocol bump so should
> include the MD4 sum corrections as well.

Does it really need a protocol bump? The protocol already supports per-file
block sizes, so provided the default block size grows at a suitable rate
with the file size, that is all that is needed.

It would be nice to also force a larger checksum size, but that would
probably require a protocol change... (I'm more familiar with librsync than
rsync, so I could be wrong.)
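For what it's worth, the checksum size a given block count requires can be estimated by inverting the birthday bound above. This is only an illustration of the trade-off, under an assumed acceptable collision probability; the function name and parameters are hypothetical:

```python
import math

def needed_sum_bits(n_blocks, max_collision_prob=1e-6):
    # Invert the birthday estimate n^2 / 2^(b+1) <= p:
    #   b >= 2*log2(n) - log2(p) - 1
    return math.ceil(2 * math.log2(n_blocks)
                     - math.log2(max_collision_prob) - 1)
```

For example, 2^19 blocks with a tolerated collision probability of 2^-11 needs about 48 checksum bits, which is why keeping the block count bounded lets a fixed checksum size keep working.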

ABO: finger abo at for more info, including pgp key

More information about the rsync mailing list