MD4 checksum_seed

Donovan Baarda abo at minkirri.apana.org.au
Tue Mar 16 06:17:43 GMT 2004


On Tue, 2004-03-16 at 10:44, Eran Tromer wrote:
> Hi,
> 
> On 2004/03/15 03:49, Donovan Baarda wrote:
[...]
> Just to be sure, I wrote a quick&dirty collision search code for the
> rsync and librsync hashes. I used a generic collision-finding algorithm
> (namely Gabriel Nivasch's multistack variant of the generalized Rho
> method) to search for a truncated MD4 collision among the blocks whose
> Fletcher sum is zero. It took 13ms to find a collision for the default
> parameters of rsync version<2.6.0, with either checksum_seed=0
> (disabled) or checksum_seed=32761 (the value fixed for batch mode). With
> a 4 byte strong hash, which is the most you're likely to see with
> Schultz's code, the above took 1.4 seconds. For librsync with default
> parameters (8-byte strong hash), finding a collision took a couple of
> hours. Here are examples of colliding pairs for each of the above cases:
>   http://dl.tromer.org/misc/rsync-collisions.tgz
[...]

With an 8-byte hash that means you tested approximately (2^64)/2
crafted Fletcher-busting blocks to find a collision, yeah? Did that
really take only a couple of hours? Man, computers are heaps faster now!

By my calculations, even processing them at one block every
microsecond, that would have taken nearly 300,000 years. Are you sure?
You must have got _very_ lucky, or there are some serious flaws in
MD4. Perhaps there is a serious problem with how we are truncating the
MD4?

I just checked your librsync sample blocks... you are right. The first 8
bytes of md4sum are the same, the rollsums both evaluate to 0, and yet
the data is different.

How the hell did you do that?

I glanced at Gabriel Nivasch's stuff, but I can't see how it could have
cut the search down so much...

> > The 2^(k/2) attempts assumes random attempts... how does this compare
> > to "crafted to bust fletcher" attempts?
[...]

Just noticed you use 2^(k/2), not (2^k)/2... this must be part of the
reason :-)

I thought that, without using some sort of known vulnerability, the
formula would be:

  avg number of random attempts = number of possible sum values / 2

Which for a 64-bit sum would be (2^64)/2.
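
Working the numbers: 2^(64/2) = 2^32 is only about 4.3e9 attempts,
which at one block per microsecond is a bit over an hour -- which
matches your "couple of hours" nicely. (2^64)/2 is more like the cost
of matching one _given_ block; when you are free to choose both blocks
it's the birthday problem, and the expected work drops to roughly the
square root.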

I guess this assumes that the sum function, when iterated on its own
output the way Gabriel Nivasch suggests, will walk through the entire
sum space before repeating. Does the reduced search space assume some
sort of distribution of "repeat loops" in the sum algorithm? Wouldn't
this be highly dependent on the checksum algorithm used?
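
As I understand it, it doesn't need to walk the whole space: iterating
any roughly-random function settles into a loop after about 2^(k/2)
steps on average (the birthday bound again), and the point where the
tail joins the loop gives you two different inputs with the same
output. Here's my toy picture of that rho idea -- using Floyd's
two-pointer cycle finding instead of Nivasch's multistack variant, a
made-up 32-bit mixer standing in for "truncated MD4", and ignoring the
step of mapping outputs back into the Fletcher-zero subspace:

#include <stdio.h>
#include <stdint.h>

/* Stand-in for "truncated MD4 of a crafted block" -- just a mixer. */
static uint32_t toy_hash(uint32_t x)
{
    x ^= x >> 16;  x *= 0x45d9f3bU;
    x ^= x >> 16;  x *= 0x45d9f3bU;
    x ^= x >> 16;
    return x;
}

int main(void)
{
    uint32_t x0 = 1;
    uint32_t slow = toy_hash(x0);
    uint32_t fast = toy_hash(slow);

    /* Phase 1: Floyd's cycle detection finds a point on the loop
     * after around 2^16 steps (sqrt of the 32-bit output space). */
    while (slow != fast) {
        slow = toy_hash(slow);
        fast = toy_hash(toy_hash(fast));
    }

    /* Phase 2: walk from the start and from the meeting point in
     * lockstep; the step where the two walks converge exposes two
     * distinct inputs with the same hash. */
    uint32_t a = x0, b = slow;
    while (toy_hash(a) != toy_hash(b)) {
        a = toy_hash(a);
        b = toy_hash(b);
    }
    if (a != b)
        printf("collision: %08x and %08x -> %08x\n",
               (unsigned)a, (unsigned)b, (unsigned)toy_hash(a));
    else
        printf("x0 was on the loop itself; retry another x0\n");
    return 0;
}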

> > Does checksum_seed really need to be random for the checksum2 to be
> > random? I know md4 is considered vulnerable compared to md5, but does it
> > have a poor distribution for a fixed seed? If it does, would it make
> > sense to switch to md5 rather than randomise the seed?
> 
> checksum_seed must be random for the checksum2 to be random -- simply,
> because there is no other source of randomness in the hash calculation,

I was actually thinking "pseudo-random", as in evenly distributed and
with no exploitable pattern... i.e. still requiring an exhaustive
search with no known shortcuts.
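
For reference, the way the seed gets mixed in is (as I read the rsync
2.x source -- names from memory, and mdfour() stands for rsync's MD4
implementation, not shown) just appending it to the block before
hashing:

#include <string.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_BLOCK 4096          /* assumed bound, for the sketch only */

extern uint32_t checksum_seed;  /* set during protocol setup */
extern void mdfour(uint8_t digest[16], const uint8_t *in, size_t n);

/* Sketch only; assumes len <= MAX_BLOCK. */
void get_checksum2_sketch(const uint8_t *buf, size_t len, uint8_t sum[16])
{
    uint8_t tmp[MAX_BLOCK + 4];
    memcpy(tmp, buf, len);
    if (checksum_seed) {        /* seed == 0 means "disabled" */
        tmp[len++] = checksum_seed & 0xff;  /* little-endian, as SIVAL() */
        tmp[len++] = (checksum_seed >> 8) & 0xff;
        tmp[len++] = (checksum_seed >> 16) & 0xff;
        tmp[len++] = (checksum_seed >> 24) & 0xff;
    }
    mdfour(sum, tmp, len);      /* caller truncates to the sum2 length */
}

So those 4 seed bytes are the only source of randomness in checksum2;
fix them and the whole thing is deterministic.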

> and if everything is deterministic then you can find collisions in
> advance and put them in the data. This doesn't have anything to do with
> the cryptographic strength of MD4. But if the attacks of [1] can be
> adapted to our scenario then things will become even *worse* than the
> above, and this is a good reason to switch to MD5.
[...]

Hmmm. yeah. The deterministic nature of a fixed seed is a bit of a
problem.

> > You can still use the same formula, just don't count checksum1 in the
> > "checksum size" part of it. Or depending on how much you think it's
> > worth you could count it as x<32 bits worth of checksum, not the full 32
> > bits.
> 
> In an adversarial setting the Fletcher checksum is worthless. So to obtain
> the intended 2^-10 failure probability, rsync should increase its strong
> checksum by 4 bytes.
> 
> That's for rsync in normal mode, where checksum_seed is randomized. For
> fixed checksum_seed (i.e., rsync in batch mode and librsync) the failure
> probability in an adversarial setting is 1, as demonstrated above.

scary.
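
To put rough numbers on that: if you treat the b bits of checksum as
random, then with n blocks to match the chance of a false match is
roughly n^2 / 2^(b+1). rsync was counting the 32 Fletcher bits towards
b, so if an adversary can zero those out for free, you need 32 more
bits of strong sum (4 bytes) to get the intended ~2^-10 bound back.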

> A couple of related notes:
> 
> librsync really ought to include a whole-file hash, like rsync. Maybe
> also an option of retransmitting the file in case of failure (where
> possible). Either way, failure must be reliably detected. Besides
> increasing reliability, in many cases it would allow use of smaller
> block checksums. You can't just entrust integrity checking to a
> subsequent invocation of md5sum or such; re-reading the files is
> expensive and not always possible.

Yeah. I've been thinking the same for some time. Because librsync
separates the signature/delta/patch operations, retransmission is an
application-level decision, but librsync should certainly warn you when
it fails.

> The use of CHAR_OFFSET is a pure waste of CPU cycles. It merely amounts
> to adding constants to s1 and s2 (blocklen*CHAR_OFFSET and
> blocklen*(blocklen+1)/2*CHAR_OFFSET, respectively). These depend only on
> the block length, which you know anyway. Indeed rsync mercifully sets
> CHAR_OFFSET=0 which is optimized away, but it still obfuscates the code
> and the relevant comment (rsync.h lines 38-39) is wrong.
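
That identity is easy to sanity-check; a throwaway test (using
librsync's CHAR_OFFSET of 31, if I remember right):

#include <stdio.h>
#include <stdint.h>

#define CHAR_OFFSET 31  /* librsync's value, from memory */

int main(void)
{
    uint8_t buf[300];
    uint32_t len = sizeof buf;
    uint16_t s1 = 0, s2 = 0;    /* plain Fletcher */
    uint16_t t1 = 0, t2 = 0;    /* with CHAR_OFFSET */

    for (uint32_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(i * 7 + 3);      /* arbitrary test data */

    for (uint32_t i = 0; i < len; i++) {
        s1 += buf[i];               s2 += s1;
        t1 += buf[i] + CHAR_OFFSET; t2 += t1;
    }
    /* uint16_t arithmetic is mod 2^16, same as masking with 0xffff */
    printf("s1 shift: %u, expected %u\n",
           (unsigned)(uint16_t)(t1 - s1),
           (unsigned)(uint16_t)(len * CHAR_OFFSET));
    printf("s2 shift: %u, expected %u\n",
           (unsigned)(uint16_t)(t2 - s2),
           (unsigned)(uint16_t)(len * (len + 1) / 2 * CHAR_OFFSET));
    return 0;
}

Both printed pairs match, confirming the sums differ only by constants
that depend on the block length alone.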

I have seen several variants of the rollsum:

rsync: uses a basic Fletcher sum.
adler32: mods the s1 and s2 values by the largest 16-bit prime.
librsync: adds CHAR_OFFSET as described.
xdelta: uses a pre-calculated "random" mapping from chars to ints.

I think adler32 achieves nothing but reducing the checksum domain (the
mod by 65521 just throws away the top few values of s1 and s2).

librsync's offset probably does nothing, but I have a feeling it might
help a bit for block sizes less than 256 bytes, because it at least
helps use more of the checksum domain for the "s1" part. Without it you
don't end up using the high bits of s1 at all for small blocks: a
64-byte block has s1 <= 64*255 = 16320, so the top two bits of s1 are
always zero, and adding 31 per byte only raises that to 18304.

xdelta's random mapping probably does nothing either, but might also
help use more of the checksum domain for small blocks because it maps
the chars to ints.

It might be possible to craft a mapping like xdelta's but using a
special sequence calculated for best effect, much like the polynomials
in crc32 and crc64.

My gut feeling is that, until someone can demonstrate and/or prove
otherwise, the simple Fletcher checksum is the best one to use. A
minimal rolling version is sketched below.
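
For concreteness, here's a minimal sketch of that simple rolling
Fletcher (essentially what the new rollsum.c does, as I recall, minus
the CHAR_OFFSET handling):

#include <stdint.h>
#include <stddef.h>

/* s1 is the byte sum, s2 the sum of the running s1 values; uint16_t
 * keeps both mod 2^16 automatically. */
typedef struct {
    uint16_t s1, s2;
    size_t   len;       /* current window length */
} Rollsum;

static void rollsum_init(Rollsum *r)
{
    r->s1 = r->s2 = 0;
    r->len = 0;
}

static void rollsum_in(Rollsum *r, uint8_t c)       /* grow window */
{
    r->s1 += c;
    r->s2 += r->s1;
    r->len++;
}

static void rollsum_rotate(Rollsum *r, uint8_t out, uint8_t in)
{
    /* Slide a fixed-size window one byte: drop `out`, append `in`.
     * This O(1) update is the whole point of a rolling sum. */
    r->s1 += in - out;
    r->s2 += r->s1 - (uint16_t)(r->len * out);
}

static uint32_t rollsum_digest(const Rollsum *r)
{
    return ((uint32_t)r->s2 << 16) | r->s1;
}

Feed the first block through rollsum_in(), then call rollsum_rotate()
once per byte as the window slides; rollsum_digest() packs s2 into the
high half the same way the existing code does.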

> And speaking of the latter, gcc -O3 vs. -O2 gets a speedup by a factor
> of 8 to 10 in rsync's get_checksum1() and librsync's rs_calc_weak_sum().
> Tested with gcc 3.3.2 on i686 Xeon and Katmai. You may want to make -O3
> the default.

Hmmm. interesting. I didn't think it would make that much difference.
Have you looked at the librsync rollsum.c implementation in CVS? It
replaced rs_calc_weak_sum and should be a fair bit faster.

-- 
Donovan Baarda <abo at minkirri.apana.org.au>
http://minkirri.apana.org.au/~abo/


