[librsync-devel] librsync and rsync vulnerability to maliciously crafted data. was Re: MD4 checksum_seed

Thu Apr 8 11:16:24 GMT 2004

G'day again,

From: "Eran Tromer" <rsync2eran at tromer.org>
[...]
> > if the
> > checksum_seed is based on something like the whole file md4sum, it
> > becomes repeatable, but unpredictable. You can't manipulate individual
> > blocks without it affecting every other blocksum, but the signature for
> > the same file is always the same. Nice :-)
>
> Nice indeed, but the cost is enormous: you'll have to read the file
> twice. When syncing a mostly-unchanged file that's larger than the disk
> cache, that means doubling the runtime (and disk load) on the receiver's
> side. Also, it means 'rdiff signature' and equivalents won't work on

But the vast majority of the load is in the delta calculation on the
sender's side.

> arbitrary streams. Currently the receiver can tee the output of 'rdiff
> patch' into both the output file and an instance of 'rdiff signature',
> in order to save the IO of re-reading the file upon the next sync; this
> won't work anymore. (How about a built-in option for that, BTW?).

True... that is a nice feature... you are slowly turning me off the
whole-file checksum as seed :-)

> > More thoughts on using the wholefile sum as the seed; at first I thought
> > this would still be vulnerable to case 3). Using a any single block as
> > the original file and trying to find any block that matches means you
> > still have "birthday algorithm" numbers of blocks to check (ie 2^(n/2)).
> > However, each block "comparison" requires the recalculation of the
> > "target" blocksum using the "original" blocksum as the seed, resulting
> > in non-birthday algorithm number of checksum calculations (ie, 2^n).
>
> I'm afraid it's still vulnerable to case 3 (a pair of "target" and
> "original" files with matching blocks). For simplicity consider
> single-block files. In this case what you've done is simply to replace
> the hash function
>   f(x) = truncate(MD4(x,fixed_seed))
> with the hash function
>   f'(x) = truncate(MD4(x,MD4(x,fixed_seed)))

Not quite... it's f(x,y) = truncate(MD4(x,MD4(y,fixed_seed))), where x and y
are the two blocks to be compared. This means you have to re-calculate the
hash for every compare, not just once for every block.

> which happens to be the same as hashing two copies of x in sequence.
> But the birthday attack doesn't care what's the hash function; it only
> cares about its input and output sizes (which we didn't change) and that
> the function is "sufficiently random".

It also assumes that the hash is done once per sample input, not once per
compare. Sure, you still only need to try 2^(n/2) blocks, but you need to
calculate 2^n hashes, and that's what really hurts.

----------------------------------------------------------------
Donovan Baarda                http://minkirri.apana.org.au/~abo/
----------------------------------------------------------------