MD4 checksum_seed

Wed Mar 10 05:43:01 GMT 2004

Hi,

The following lines in compat.c are rather imprudent:

    if (read_batch || write_batch)
        checksum_seed = 32761;
    else
         checksum_seed = time(NULL);
    write_int(f_out,checksum_seed);

Setting checksum_seed to a constant in batch mode means block collisions
are reproducible and predictable. Thus, some files will be permanently
"unlucky" in batch mode and will always need retransmission. Also, it
means a malicious adversary can perform a performance degradation attack
by injecting a pair of blocks with the same checksum1 and checksum2 (so
that, for example, if you use rsync to backup your website database,
someone could slow down your backups by posting carefully crafted
comments). The latter issue also exists in non-batch mode, since time()
is often predictable. The right thing to do is to always use a really
random value (e.g., use /dev/urandom if available). In batch mode, you
can store the random value somewhere in the batch fileset.

Note that, above, block hash collisions are very easy to find if you
know checksum_seed. The rolling Fletcher checksum1 is trivially
defeated. To defeat the k-bit truncated MD4 checksum2, just keep
generate random blocks having the same checksum1 until you find two with
the same checksum2; by the birthday paradox it will take about 2^(k/2)
attempts, where usually k=16 or k=24 with J.W. Schultz's code.

Another note is that Donovan Baarda's formula for the probability of
retransmission (and thus J.W. Schultz's code) assumes that the hashes
are random. This is a reasonable assumption for the truncated MD4
checksum2 when checksum_seed is random, but it's a pretty rotten
assumption for the Fletcher checksum1. For the purpose of evaluating the
probability of retransmission, rsync uses s2length bytes of good hash
plus 4 bytes of wishful thinking, and Baarda's analysis doesn't really
apply.

  Eran