MD4 checksum_seed

Donovan Baarda abo at
Mon Mar 15 01:49:12 GMT 2004

On Wed, 2004-03-10 at 16:43, Eran Tromer wrote:
> Note that, above, block hash collisions are very easy to find if you
> know checksum_seed. The rolling Fletcher checksum1 is trivially
> defeated. To defeat the k-bit truncated MD4 checksum2, just keep
> generate random blocks having the same checksum1 until you find two with
> the same checksum2; by the birthday paradox it will take about 2^(k/2)
> attempts, where usually k=16 or k=24 with J.W. Schultz's code.

I haven't really thought much about carefully crafted data to bust
things. I didn't think the Fletcher checksum was that bad. Although I
can see that it would be fairly trivial to craft data for collisions,
wouldn't it be harder to craft data for a Fletcher collision _and_
attempt to produce a checksum2 collision? The 2^(k/2) attempts assumes
random attempts... how does this compare to "crafted to bust fletcher"

> Another note is that Donovan Baarda's formula for the probability of
> retransmission (and thus J.W. Schultz's code) assumes that the hashes
> are random. This is a reasonable assumption for the truncated MD4
> checksum2 when checksum_seed is random, but it's a pretty rotten

Does checksum_seed really need to be random for the checksum2 to be
random? I know md4 is considered vulnerable compared to md5, but does it
have a poor distribution for a fixed seed? If it does, would it make
sense to switch to md5 rather than randomise the seed?

> assumption for the Fletcher checksum1. For the purpose of evaluating the
> probability of retransmission, rsync uses s2length bytes of good hash
> plus 4 bytes of wishful thinking, and Baarda's analysis doesn't really
> apply.

You can still use the same formula, just don't count checksum1 in the
"checksum size" part of it. Or depending on how much you think it's
worth you could count it as x<32 bits worth of checksum, not the full 32

Donovan Baarda <abo at>

More information about the rsync mailing list