[librsync-devel] librsync and rsync vulnerability to maliciously crafted data. was Re: MD4 checksum_seed

Thu Apr 8 12:37:47 GMT 2004

Ahoy,

On 2004/04/08 14:16, Donovan Baarda wrote:
>>Nice indeed, but the cost is enormous: you'll have to read the file
>>twice. When syncing a mostly-unchanged file that's larger than the disk
>>cache, that means doubling the runtime (and disk load) on the receiver's
>>side. Also, it means 'rdiff signature' and equivalents won't work on
> 
> But the vast majority of the load is in the delta calculation on the
> sender's side.

My experience is that when you sync mostly unchanged large files on
modern PCs, both sides are IO-bound. The delta calculation just rolls
along at top speed due to the "try the next block first" heuristic.
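
To make the heuristic concrete, here is a rough Python sketch, not rsync's
or librsync's actual code. The names (BLOCK, signature, delta, patch) are
made up for illustration, SHA-256 stands in for the strong sum, and the weak
sum is recomputed per window with zlib.adler32 instead of being rolled, so
only the matching order is shown, not the real performance.

```python
import hashlib
import zlib

BLOCK = 4  # hypothetical, tiny block size chosen for readability


def signature(old: bytes):
    """Per-block (weak, strong) sums of the receiver's file, plus a weak-sum index."""
    sums = []
    for off in range(0, len(old), BLOCK):
        blk = old[off:off + BLOCK]
        # SHA-256 is a stand-in for the strong sum (assumption, not MD4).
        sums.append((zlib.adler32(blk), hashlib.sha256(blk).digest()))
    index = {}
    for i, (weak, _) in enumerate(sums):
        index.setdefault(weak, []).append(i)
    return sums, index


def delta(new: bytes, sums, index):
    """Return a list of ('copy', block_no) and ('literal', bytes) operations."""

    def is_block(i, window):
        weak, strong = sums[i]
        return (zlib.adler32(window) == weak
                and hashlib.sha256(window).digest() == strong)

    ops = []
    literal = bytearray()
    pos = 0
    hint = None  # block expected next, set right after a match
    while pos < len(new):
        window = new[pos:pos + BLOCK]
        match = None
        # The heuristic: after matching block i, test block i+1 directly,
        # before consulting the weak-sum index at all.
        if hint is not None and hint < len(sums) and is_block(hint, window):
            match = hint
        else:
            for i in index.get(zlib.adler32(window), []):
                if is_block(i, window):
                    match = i
                    break
        if match is None:
            literal.append(new[pos])
            pos += 1
            hint = None
        else:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            pos += len(window)
            hint = match + 1
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def patch(old: bytes, ops) -> bytes:
    """Rebuild the new file from the old file and the delta operations."""
    return b"".join(
        old[arg * BLOCK:(arg + 1) * BLOCK] if op == "copy" else arg
        for op, arg in ops
    )
```

On a mostly unchanged file almost every window is settled by the hint check,
so the sender does one sequential pass with no table lookups, which is why
the disk tends to be the bottleneck.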

>>I'm afraid it's still vulnerable to case 3 (a pair of "target" and
>>"original" files with matching blocks). For simplicity consider
>>single-block files. In this case what you've done is simply to replace
>>the hash function
>>  f(x) = truncate(MD4(x,fixed_seed))
>>with the hash function
>>  f'(x) = truncate(MD4(x,MD4(x,fixed_seed)))
> 
> Not quite... it's f(x,y) = truncate(MD4(x,MD4(y,fixed_seed))), where x and y
> are the two blocks to be compared. This means you have to re-calculate the
> hash for every compare, not just once for every block.

Indeed, you're right.
More fundamentally, every time you compute f(x,y) you win iff
f(x,y)==f(y,y); otherwise you don't learn anything interesting. So
you'll have to compute f about 2^n times, where n is the number of bits
kept by truncate(). Yes, this looks secure when the hash function is
perfectly random. The only reservation is that using the same
user-affected seed to hash many user-determined blocks is uncomfortably
reminiscent of MD4's known weaknesses.
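
A small Python sketch of that argument, under loud assumptions: SHA-256
stands in for MD4 (MD4 is often missing from hashlib), the seed is simply
prepended to the data, and FIXED_SEED, N_BITS and all function names are
hypothetical. N_BITS is kept tiny so the brute force finishes quickly; it is
not a realistic width.

```python
import hashlib
import os

FIXED_SEED = b"\x00\x00\x00\x00"  # hypothetical fixed seed
N_BITS = 16  # hypothetical truncated width, small so the search is fast


def H(data: bytes, seed: bytes) -> bytes:
    # Stand-in for MD4(data, seed); prepending the seed is an assumption.
    return hashlib.sha256(seed + data).digest()


def truncate(digest: bytes, n_bits: int = N_BITS) -> int:
    # Keep only the top n_bits bits of the digest.
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - n_bits)


def f(x: bytes, y: bytes) -> int:
    # f(x, y) = truncate(H(x, H(y, fixed_seed)))
    return truncate(H(x, H(y, FIXED_SEED)))


def blocks_match(x: bytes, y: bytes) -> bool:
    # The attacker "wins" iff f(x, y) == f(y, y) for some x != y.
    return f(x, y) == f(y, y)


def attempts_until_collision(block_len: int = 32, max_tries: int = 1 << 22):
    """Try random (x, y) pairs; return how many tries a false match took."""
    for tries in range(1, max_tries + 1):
        x = os.urandom(block_len)
        y = os.urandom(block_len)
        if x != y and blocks_match(x, y):
            return tries
    return None  # gave up without finding a false match
```

Each failed try tells the attacker nothing about the next one, since the
inner seed changes with y, so for a random-looking hash the expected number
of tries is on the order of 2**N_BITS.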

Still, are there reasons beyond the aesthetic to want deterministic
signature generation? The costs in IO and flexibility seem very high.

  Eran