MD4 checksum_seed

Tue Mar 16 18:47:21 GMT 2004

Donovan Baarda wrote:

> On Tue, 2004-03-16 at 10:44, Eran Tromer wrote:

> with an 8-byte hash that means you tested approximately  2^64/2 crafted
> fletcher busting blocks to find a collision, yeah?
[snip]
> I thought that without using some sort of known vulnerability the
> formula would be;
>   avg number of random attempts = number of possible sum values / 2

Uhm, no -- just 2^(64/2)=2^32 random blocks, by the Birthday paradox.
Roughly speaking, after looking at N blocks you have N*(N-1)/2 unordered
pairs of observed blocks, and each pair gives a collision with
probability 1^-64, so you need N=~2^32.

> I glanced at Gabriel Nivasch's stuff but I can't see how it could
> have shortcut it so much...

Nivasch's method is provides a (small) constant factor improvement over
other generic collision finding algorithms, such as Floyd's two-finger
method. Nothing special there.

> I guess this assumes that the sum function when applied iteratively to
> itself the way Gabriel Nivasch suggests will walk through the entire
> sumspace before repeating. Does the reduced search space assume some
> sort of distribution of "repeat loops" in the sum algorithm?

Exactly. If you iterate a random function from k bits to k bits, you'll
enter a loop in expected time sqrt(pi/2)*2^(k/2). Once you detected a
loop, you can back up and recover the collision. But note that this
loopy behavior is utilized for reducing intermediate storage; if you
could keep 2^32 hash values in memory and check every sample against all
old ones, you wouldn't need to iterate anything or rely on loops.

> Wouldn't this be highly dependent on the checksum algorithm used?

Indeed, but in a "random" way. More exactly, for a random function the
above analysis applies, so if it things didn't behave like that for MD4
then we would have found a a very interesting and unexpected property of
MD4...

> I was actually thinking "pseudo-random", as in evenly distributed, and
> with no exploitable pattern.... ie still requiring a a exhaustive search
> with no known shortcuts.

I haven't used any shortcuts or exploitable patterns; merely the
determinism, which allows me to find collision a priori and inject them.
Note that each block is hashed independently, so you can't even rely on
randomness of preceding blocks (which wouldn't be a good idea anyway).

>>In adversarial setting the Fletcher checksum is worthless. So to obtain
>>the intended 2^-10 failure probability, rsync should increase its strong
>>checksum by 4 bytes.
>>
>>That's for rsync in normal mode, where checksum_seed is randomized. For
>>fixed checksum_seed (i.e., rsync in batch mode and librsync) the failure
>>probability in adversarial setting is 1, as demonstrated above.
>
> scary.

But easily fixed. The latter, at least.

> Yeah. I've been thinking the same for some time. Because librsync
> separates the signature/delta/patch operations, retransmission is an
> application level decision, but librsync should certainly warn you when
> it fails.

Aye. It would be nice to add (optional) retransmission to rdiff.

> I have seen several variants for the rollsum;
>
> rsync: uses a basic fletcher.

Except that strictly speaking, Fletcher uses one's-complement arithmetics.

> adler32: mod's the s1 and s2 values by the largest 16 bit prime.
> librsync: adds CHAR_OFFSET as described.
> xdelta: uses a pre-calculated "random" mapping from char to ints.
>
> I think adler32 achieves nothing but reducing the checksum domain.

Quoth RFC 1950:
"That 65521 is prime is important to avoid a possible large class of
two-byte errors that leave the check unchanged."
Not much of a difference in adversarial setting, though.

> librsync's offset probably does nothing, but I have a feeling it might
> help a bit for blocksizes less than 256 bytes, because it at least helps
> use more of the checksum domain for the "s1" part. Without it you don't
> end up using the high bits of s1 at all for small blocks.

It doesn't matter. You just shift the small range of possible values
-- instead of using a small range of small numbers you'll use a small
range of large numbers.

Whether two block (having the same length) have the same Fletcher sum is
unaffected by the choice of CHAR_OFFSET. This is all that matters.

> xdelta's random mapping probably does nothing either, but might also
> help use more of the checksum domain for small blocks because it maps
> the chars to ints.

Indeed. Also, xdelta is more resilient to local collisions that consist
of linear changes to the data. For example, in Fletcher you can get a
collision by changing any sequence of 4 characters as follows:
  +1 -1 -1 +1
assuming there is no wrap-around in any of the chars. With random
mappings of char to short you won't have such a short vector of
differences that always yields a collision. More generally, Fletcher is
almost a linear function:
  Fletcher(block1) + Fletcher(block2) = Fletcher(block1+block2)
whenever the character additions don't overflow, which is a bad thing.
xdelta is a bit better in this sense -- it still exhibits a lot of
"linearity", but it's a bit harder to exploit (especially on accident).

You can further improve on the xdelta scheme by choosing the char-to-int
lookup table randomly at each invocation instead of fixing them a
priori, and including the RNG seed along with the signature. Having the
mapping not known in advance pretty much kills linearity in the above
sense. But it's still trivial to find collisions: for example,
  (x y y x) and (y x x y)
will have the same xdelta hash for any x and y and any choice of lookup
table.

> It might be possible to craft a mapping like xdelta's but using a
> special sequence calculated for best effect, much like the polynomials
> in crc32 and crc64.

Possibly, for "natural" or random data (as opposed to adversarial). But
in such contexts, random choice is often close to optimal with high
probability.

> My gut feeling is until someone can demonstrate and/or prove otherwise,
> the simple fletcher checksum would be the best to use.

Of the variants discussed, the ones that make sense are rsync's
Fletcher, Adleman32 and randomized xdelta. Fletcher+offset and plain
xdelta waste speed or strength (respectively) for no good reason.
I would use randomized xdelta, unless there are grave efficiency issues.
Are lookups into L1 cache that expensive?

>>And speaking of the latter, gcc -O3 vs. -O2 gets a speedup by a factor
>>of 8 to 10 in rsync's get_checksum1() and librsync's rs_calc_weak_sum().
> Hmmm. interesting. I didn't think it would make that much difference.
> Have you looked at the librsync rollsum.c implementation in CVS? It
> replaced rs_calc_weak_sum and should be a fair bit faster.

Uhm. Usually loop unrolling is best left for the compiler, and it's
counterproductive with some modern processors. Is rollsum.c faster than
the old implementation with -O3?

  Eran