librsync and rsync vulnerability to maliciously crafted data. was Re: MD4 checksum_seed

Donovan Baarda abo at
Mon Apr 5 05:21:32 GMT 2004

G'day again,

Just revisiting an old thread after some more thought. Eran and I were
discussing the vulerability of librsync and rsync to deliberate attempts
to craft blocks with matching signatures but different content. It turns
out it's disturbingly easy. Here's a bit of context;

From: "Donovan Baarda" <abo at>
> From: "Eran Tromer" <rsync2eran at>
> > Donovan Baarda wrote:
> [...]
> > > I thought that without using some sort of known vulnerability the
> > > formula would be;
> > >   avg number of random attempts = number of possible sum values / 2
> >
> > Uhm, no -- just 2^(64/2)=2^32 random blocks, by the Birthday paradox.
> > Roughly speaking, after looking at N blocks you have N*(N-1)/2 unordered
> > pairs of observed blocks, and each pair gives a collision with
> > probability 1^-64, so you need N=~2^32.
> [...]
> Doh! Of course... it was exactly the same "paradox" that showed how high the
> probability of a collision was in rsync for large files. You are not
> searching for a collision with a particular block, just between any two
> blocks.

I'm not entirely sure about rsync, but apparently such an attack will
make it slow down, but it will not corrupt files. Librsync on the other
hand will (currently) silently corrupt files. Before librsync users
panic; how much to you really care that a maliciously crafted file is
corrupted? In rsync this can be used as a performance degradation
attack. However, librsync by default uses a psedo-random checksum_seed,
that makes such an attack nearly impossible (however... the original
discussion started around using a fixed checksum_seed in "batch mode").

I've done a bit more thinking on this, and there are four ways crafted
blocks can be use;

1) two crafted blocks in the "original" file

2) two crafted blocks in the "target" file

3) a crafted pair of "target" and "original" files with matching

4) a block in the "target" crafted to match a block in the "original"

For case 1), it is possible to detect "collisions" at signature
generation time. Using a random checksum_seed makes it nearly
impossible. Combining the two, selecting a new random_seed when a
collision is detected, would make this pretty much impossible. Note that
a random_seed can be used for batch mode and librsync, but the seed
would need to be included in the signature.

For case 2), the crafted blocks would not match the original, and hence
would be "miss data" that is directly copied. Currently librsync and
rsync don't do any sort of "target self referencing matches", so this
cannot affect librsync or rsync. Note that there has been discussion on
adding this to librsync, and it is supported by the vdelta format.
When/if such a thing was added, similar conditions to 1) apply to the
"target matching" algorithm. Any such addition should probably use
xdelta style matching anyway, which compares the data, not a
"strong_sum", making it immune to crafted data.

For case 3), crafting a pair of files is made nearly impossible by using
the protections for case 1). It may be possible to craft a file that has
a reduced set of possible signatures for even a random_seed, but this
would require vulnerabilities in the strong_sum and/or random seed
generator. Barring such a vulnerability, a random_seed "randomises" the
signature for the "original", reducing this to case 4)

Case 4) is the nasty one. The protection against case 1) ensures that
the signature is randomised for a given "original", but once you have
the signature, you know the random_seed, and all elements of randomness
to the rest of the algorithm are gone. However, this is not a true
"birthday paradox" situation; you are not trying to find any two blocks
that match, but a block that matches any block in the "original". This
is no where near as easy. The number of attempts required is;

n ~= 2^(b'-m)


  n is the number of attempts required to find a collision
  m is from there are 2^m blocks in the signature
  b' is the number of useful bits in the signature (excluding rollsum)

Interestingly, the dynamic blocksum heuristic now in rsync is

  B = sqrt(s)
  b = 10 + 2*log2(s) - log2(B)


  B is the block size
  s is the file size
  32 bits of 'b' is the rollsum

which means, assuming 32 bits of rollsum is useless and hence should 
not be included in b';

  m = log2(s)/2
  b' = -22 + 2*log2(s) - log2(s)/2 = -22 + (3/2)*log2(s)

so the number of attempts is

  n = 2^(-22 + log2(s))

so the larger the file, the harder it is to compromise, because 'b'
grows faster than 'm'. Note that rsync has a lower limit on b' of 16,
which corresponds to a filesize of 32M. This equates to n = 2^3
attempts, which is rather low... in the realm of on the fly calculation

For librsync which uses a fixed b'=64, and default B=2048, this becomes;

  n = 2^(64 - log2(s/(2^11))) = 2^(75 - log2(s))

Which is 2^43 for a 4G file... not easy at all.

For rsync, a malicious "sender" could easily craft blocks that match the
signature but have different content. Big deal. A malicious "sender"
doesn't have to go through these hoops to interfere with rsync; it can
just "send" bad content. However, for a backup program using librsync
that also uses whole file checksums, this could be used to make the
backup "fail", by crafting a file that when updated produces a different
checksum... but due to librsync's large signature this is not easy to


case 2) has no impact

case 4) is of minimal impact in rsync, and sufficiently hard in

librsync needs a whole file checksum. Without it, it silently fails for
case 1), 3), and 4).

librsync could benefit from a random checksum_seed. It would need to be
included in the signature. Without it librsync is vulnerable to cases 1)
and 3).

librsync could benefit from the rsync dynamic blocksum algorithms.
Blocksum and block sizes that depend on the filesize could be used to
make librsync signatures more efficient, and more robust against case 4)
for very large files.

rsync shouldn't need a fixed seed for batch modes... just store the seed
in the signature. using a fixed seed makes it vulnerable to 1) and 3).

Donovan Baarda      

More information about the rsync mailing list