[librsync-devel] librsync and rsync vulnerability to maliciously crafted data. was Re: MD4 checksum_seed

Eran Tromer rsync2eran at tromer.org
Thu Apr 8 10:48:15 GMT 2004

On 2004/04/08 08:50, Donovan Baarda wrote:
>>In some cases you might prefer to actually store an signed signature
>>using something like GPG.

I think librsync should act as a black box that guarantees file
integrity (which, apparently, requires a whole file checksum). If
someone wants to add authentication or encryption or whatever on top of
that, all the merrier, but let that be done in addition to rsync's own
integrity check.

That means both librsync and (say) GPG would compute a whole file hash.
The communication cost of this duplication is negligible (a typical hash
is much smaller than a typical digital signature). The computation is a
bit more annoying -- it will roughly double the CPU consumption of the
receiver. That can be solved that by revealing the whole file checksum
(when available) via the API, so that librsync users can directly sign
that checksum instead of computing their own.

> if the
> checksum_seed is based on something like the whole file md4sum, it
> becomes repeatable, but unpredictable. You can't manipulate individual
> blocks without it affecting every other blocksum, but the signature for
> the same file is always the same. Nice :-)

Nice indeed, but the cost is enormous: you'll have to read the file
twice. When syncing a mostly-unchanged file that's larger than the disk
cache, that means doubling the runtime (and disk load) on the receiver's
side. Also, it means 'rdiff signature' and equivalents won't work on
arbitrary streams. Currently the receiver can tee the output of 'rdiff
patch' into both the output file and an instance of 'rdiff signature',
in order to save the IO of re-reading the file upon the next sync; this
won't work anymore. (How about a built-in option for that, BTW?).

> More thoughts on using the wholefile sum as the seed; at first I thought
> this would still be vulnerable to case 3). Using a any single block as
> the original file and trying to find any block that matches means you
> still have "birthday algorithm" numbers of blocks to check (ie 2^(n/2)).
> However, each block "comparison" requires the recalculation of the
> "target" blocksum using the "original" blocksum as the seed, resulting
> in non-birthday algorithm number of checksum calculations (ie, 2^n).

I'm afraid it's still vulnerable to case 3 (a pair of "target" and
"original" files with matching blocks). For simplicity consider
single-block files. In this case what you've done is simply to replace
the hash function
  f(x) = truncate(MD4(x,fixed_seed))
with the hash function
  f'(x) = truncate(MD4(x,MD4(x,fixed_seed)))
which happens to be the same as hashing two copies of x in sequence.
But the birthday attack doesn't care what's the hash function; it only
cares about its input and output sizes (which we didn't change) and that
the function is "sufficiently random".


More information about the rsync mailing list