Weak CheckSum Question

Stephan Buys s.buys at icon.co.za
Fri Mar 15 08:15:22 EST 2002


Hi all,

I am writing an xdelta-like application as a personal experiment and am busy 
implementing the rsync protocol; so far so good. I am using C++ templates and 
writing the algorithms so that they operate on any stream, array, etc. through 
iterators.
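
A minimal sketch of an iterator-based weak checksum of this kind, assuming 
the two 16-bit running sums described in the rsync technical report. The 
names WeakSum, weak_sum and roll are hypothetical and are not taken from 
rsync or from the application described here.

```cpp
#include <cstdint>
#include <iterator>

// Hypothetical helper: state of the rolling weak checksum over one window.
struct WeakSum {
    std::uint32_t a = 0;    // sum of the bytes in the window, mod 2^16
    std::uint32_t b = 0;    // sum of the running values of a, mod 2^16
    std::uint32_t len = 0;  // number of bytes in the window

    // Combine both halves into a single 32-bit key.
    std::uint32_t value() const { return a | (b << 16); }

    // Slide the window by one byte: drop `out` at the front, append `in`.
    void roll(std::uint8_t out, std::uint8_t in) {
        a = (a - out + in) & 0xffff;
        b = (b - len * out + a) & 0xffff;
    }
};

// Compute the checksum of [first, last) from scratch through any input
// iterator whose value type converts to a byte.
template <typename InputIt>
WeakSum weak_sum(InputIt first, InputIt last) {
    WeakSum s;
    for (; first != last; ++first) {
        s.a = (s.a + static_cast<std::uint8_t>(*first)) & 0xffff;
        s.b = (s.b + s.a) & 0xffff;
        ++s.len;
    }
    return s;
}
```

Calling weak_sum once on the first block and then roll for every following 
byte gives the same value as recomputing each window from scratch, which is 
what makes scanning every offset of a large file affordable.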

All seems well except that I am getting a lot of false hits with the weak 
checksum. When generating checksums with a block size of 1024 bytes on the 
RedHat 7.1 ISO I generate about 760 000 checksums, which go into a 
hash_multimap.

When running the rolling checksums on the RedHat 7.2 ISO (against the 
checksums in the hash_multimap) I am getting almost 95% false hits (i.e. the 
weak checksums match, but when comparing the offsets the comparison fails).
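
The lookup described above can be sketched roughly as follows. This is only 
an illustration under stated assumptions: std::unordered_multimap stands in 
for hash_multimap, a direct byte comparison stands in for whatever stronger 
check is used to confirm a hit, and BlockIndex, HitStats and find_block are 
hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Hypothetical index: weak checksum -> block number in the old file.
using BlockIndex = std::unordered_multimap<std::uint32_t, std::size_t>;

// Counters for measuring how often the weak checksum misleads us.
struct HitStats {
    std::size_t weak_hits = 0;   // candidates whose weak checksum matched
    std::size_t false_hits = 0;  // candidates rejected by the byte compare
};

// Returns the number of the matching block in old_data, or -1 if no
// candidate with this weak checksum really matches the window.
inline long find_block(const BlockIndex& index, std::uint32_t weak,
                       const std::uint8_t* window,
                       const std::vector<std::uint8_t>& old_data,
                       std::size_t block_size, HitStats& stats) {
    auto range = index.equal_range(weak);
    for (auto it = range.first; it != range.second; ++it) {
        ++stats.weak_hits;
        const std::size_t offset = it->second * block_size;
        // Stand-in for the strong check: compare the bytes directly.
        if (offset + block_size <= old_data.size() &&
            std::memcmp(window, old_data.data() + offset, block_size) == 0) {
            return static_cast<long>(it->second);
        }
        ++stats.false_hits;
    }
    return -1;
}
```

Dividing false_hits by weak_hits after a full scan gives the false-hit ratio 
discussed here.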

Is this expected? Is there a stronger rolling checksum I could implement? It 
takes about 80 seconds to generate all the checksums on a 650MB file, so I 
could certainly afford a stronger algorithm that causes fewer false hits.

Any ideas would be greatly appreciated.

Regards,
Stephan Buys

PS: Please reply directly to me, as I am not on the mailing list.




