Non-determinism

Martin Pool mbp at samba.org
Wed Apr 17 07:13:03 EST 2002


You gave me a scare with that subject line :-/

I'm glad you like it.  Unix as a literature or culture is amazing.

The analysis is done to a reasonable extent in tridge's thesis.
MD4 (and MD5) is no longer considered cryptographically strong, 
but we're not contending against an intelligent adversary here,
only random chance.

You might like to look at Schneier's /Applied Cryptography 2nd ed/
for details on MD4.  It produces a 128-bit hash; I am fairly
sure that, the way it's used in rsync, this means there is a
2^-128 chance of an undetected failure.
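Here is a small sketch of where the 2^-128 figure comes from. It assumes MD4 behaves like an ideal hash, so that for two different inputs each of the 2^128 digest values is equally likely. Python's hashlib may not offer MD4 (that depends on the OpenSSL build), so MD5, which also produces a 128-bit digest, stands in here only to show the digest length. The function name is hypothetical, and Fraction keeps the tiny probability exact.

```python
import hashlib
from fractions import Fraction

# MD5 stands in for MD4 here: both produce 16-byte (128-bit) digests.
digest = hashlib.md5(b"some block of file data").digest()
digest_bits = len(digest) * 8

def ideal_collision_probability(bits: int) -> Fraction:
    """Chance that one pair of different inputs gets the same digest,
    assuming an ideal hash whose outputs are uniformly distributed."""
    return Fraction(1, 2 ** bits)

p = ideal_collision_probability(digest_bits)
print(digest_bits)   # 128
print(float(p))      # roughly 2.94e-39
```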

Sure, it's only probabilistic.  Most aspects of computer systems
are:

 - your memory chip or processor might be hit by a sufficiently
   energetic cosmic-ray particle to cause corruption

 - the ECC in your memory might not detect the error (ECC is
   based on checksums too, and weaker ones than MD4)

 - your TCP stack might not detect a data channel error (another
   checksum)

 - all the disks in your RAID set might die simultaneously

 - a comet might strike Earth extinguishing all life
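To make the "another checksum" point concrete, here is a sketch of the 16-bit ones'-complement Internet checksum described in RFC 1071, the kind TCP uses. It is an illustration only, not code from any real TCP stack. Because it just adds up 16-bit words, it cannot notice two words being swapped, and it has only 2^16 possible values, so it is far weaker than a 128-bit hash.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    # Fold carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Worked example from RFC 1071: the folded sum is 0xddf2.
print(hex(internet_checksum(bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7]))))  # 0x220d

# Swapping two 16-bit words changes the data but not the checksum.
print(internet_checksum(b"ABCD") == internet_checksum(b"CDAB"))  # True
```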

Schneier has a neat table of various probabilities in chapter 1.
A failure of MD4 by random data corruption (2^-128) is astronomically
less likely than "winning the top prize in a US state lottery
and being killed by lightning on the same day" (2^-55).  Etcetera.
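The gap between those two figures from Schneier's table can be checked with exact arithmetic. This only restates the two quoted probabilities and assumes nothing beyond them.

```python
from fractions import Fraction

md4_random_failure = Fraction(1, 2 ** 128)
lottery_and_lightning = Fraction(1, 2 ** 55)

# How many times less likely the MD4 failure is.
ratio = lottery_and_lightning / md4_random_failure
print(ratio == 2 ** 73)        # True
print(f"{float(ratio):.2e}")   # 9.44e+21
```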

Leaving aside random failures, disks will certainly grind themselves
into dust before getting anywhere near 2^128 operations.  (The universe
is about 2^59 seconds old.)
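A rough check of that scale, assuming the commonly cited age of the universe of about 13.8 billion years and a purely hypothetical disk that could perform a billion operations per second. Both numbers are assumptions for illustration, not measurements.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600            # Julian year
universe_age_s = 13.8e9 * SECONDS_PER_YEAR        # assumed ~13.8 billion years
print(round(math.log2(universe_age_s), 1))        # 58.6

OPS_PER_SECOND = 1e9  # hypothetical, very optimistic disk
seconds_for_2_128_ops = 2 ** 128 / OPS_PER_SECOND
lifetimes = seconds_for_2_128_ops / universe_age_s  # in ages of the universe
print(f"{lifetimes:.1e}")                         # 7.8e+11
```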

It's possible that something about the way rsync uses MD4 makes
the protection much less strong.  I suspect one would be more likely
to find such a problem by analysis than by testing.
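A sketch of the kind of analysis meant here, using two standard facts rather than anything specific to rsync. If only b bits of an ideal digest were compared, each comparison of unrelated random data could falsely match with probability 2^-b, and over n comparisons the union bound caps the chance of any false match at n * 2^-b. The truncation widths and comparison counts below are hypothetical parameters, not a description of how rsync actually uses MD4. The last line shows why testing is hopeless: the expected number of independent trials before an event of probability p occurs is 1/p.

```python
from fractions import Fraction

def false_match_bound(bits_compared: int, comparisons: int) -> Fraction:
    """Union-bound upper limit on any false match, assuming an ideal hash
    truncated to `bits_compared` bits and `comparisons` checks."""
    return min(Fraction(1), comparisons * Fraction(1, 2 ** bits_compared))

def expected_trials_to_observe(p: Fraction) -> Fraction:
    """Mean number of independent trials until an event of probability p occurs."""
    return 1 / p

# Full 128-bit digest, one comparison: the 2^-128 figure.
print(false_match_bound(128, 1) == Fraction(1, 2 ** 128))  # True

# Hypothetical: only 16 bits compared, over a million comparisons.
print(false_match_bound(16, 10 ** 6))  # 1 (the bound gives no protection at all)

# Testing cannot reach 2^-128: the expected trial count is 2^128.
print(expected_trials_to_observe(Fraction(1, 2 ** 128)) == 2 ** 128)  # True
```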

So I don't want to discourage you from checking that the probability
is actually as low as it is claimed to be, or from finding an embarrassing
error in my maths :-), but I don't think you need worry about it
merely because it is probabilistic.

rsync's problems mostly lie in software engineering (bugs, portability,
backward compatibility, documentation, ...), not computer science (probability,
algorithms, etc.).  Sometimes I think the other way would be more fun.

--
Martin