determinism (is too! :-P )

Martin Pool mbp at samba.org
Thu Apr 18 06:51:01 EST 2002


On 18 Apr 2002, btober <btober at seaworthysys.com> wrote:
> So, when the algorithm thinks that a block of code in the source file A

Please read the thesis chapter again.  You're neglecting the fact,
previously explained to you, that rsync also computes an MD4 checksum
across the whole file as a consistency check.
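
For intuition only (this is a sketch of the idea in Python, not
rsync's implementation, and the function names here are mine), the
whole-file check sits on top of the block matching roughly like this:

    import hashlib

    def reconstruct_and_verify(block_instructions, sender_digest_hex,
                               apply_blocks):
        """Rebuild the file from the block instructions, then insist
        that a digest of the whole result matches the one the sender
        computed.  rsync uses MD4 for this step; md5 stands in below
        only because it is always available in hashlib."""
        data = apply_blocks(block_instructions)
        if hashlib.md5(data).hexdigest() != sender_digest_hex:
            raise IOError("whole-file checksum mismatch; retransfer")
        return data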

> then how can the algorithm be considered
> deterministic?

Because it fulfils the definition of "deterministic" in FOLDOC and SICP,
amongst others.  That word doesn't mean what you seem to want it to 
mean.  

As was just pointed out, it is true that the rsync algorithm is not 
"provably correct".  The best you can prove is that the chance of a
particular file being transferred incorrectly is 2^-128.  If you can 
prove that the chance is actually less, then I'll be very interested
to hear it.  (If you can prove that MD4 is really weak, then it will
be a genuine breakthrough in CS.)

2^-128 is for all intents and purposes 0.  I'm not going to labour
this point any more, other than perhaps to challenge you to flip
128 heads in a row with a fair coin.
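
If you want a feel for just how small that is, two lines of Python
(the numbers are exact; the snippet is only for illustration) will
show it:

    # The chance of 128 fair-coin heads in a row is (1/2)**128, the
    # same 2^-128 bound as an undetected MD4 mismatch on one transfer.
    print(2**128)     # 340282366920938463463374607431768211456 outcomes
    print(2.0**-128)  # about 2.9e-39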

> And don't ask me to produce an example, because I do understand
> "astronomical". But if you had a heart pacemaker whose operation
> depended on appropriate updates to a control data file, would you trust
> rsync to send that file update to the pacemaker?

I've never worked on a life support system.  I don't know exactly
how they make such determinations.  I would expect that they don't
use just a single integrity test but several independent ones.
For example, you could follow up the rsync transfer by checking 
the MD5 and SHA-1 digests of the files, or with spot-checks by 
a human.
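
As a rough sketch of what I mean by independent checks (plain Python
for illustration; the function name and the acceptance rule are mine,
not anything rsync itself does), each side could compute two
unrelated digests of its copy after the transfer:

    import hashlib

    def independent_digests(path):
        """MD5 and SHA-1 of one file, computed in a single pass."""
        md5, sha1 = hashlib.md5(), hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                md5.update(chunk)
                sha1.update(chunk)
        return md5.hexdigest(), sha1.hexdigest()

    # Hypothetical rule: the update is accepted only if both digest
    # pairs match exactly on the sending and receiving ends.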

Regardless: your original question was about the rsync algorithm;
the rsync program is much more likely to fail than the algorithm
because it depends on other things such as hardware, network 
cards, C libraries, and the implementation of rsync in C.  It's
pretty likely that all of them have bugs to some extent.  The 
algorithm being probabilistic is the least of your worries.

I know people who've worked on highly-secure or extremely-high-
availability systems.  Having triply redundant CPUs and memory,
as some people do, does not get you anywhere *near* a 2^-128
chance of failure.  (If nothing else, the chance of the Sun going
nova on any particular day is much greater than 2^-128.)

Consider, for example, the various Intel x86 microcode bugs,
of which several have been discovered in the time MD4 has 
been available for scrutiny.  All of them were in instructions
that were meant to be provably correct and (even using your 
bizarre definition) deterministic.  All of them make the 
machine fail in practice.

With those caveats:

  Yes.

If I had a pacemaker, and I heard that they used MD4 to verify 
correct download of the file, that would not worry me at all.
(If I heard, for example, that the pacemaker ran WinCE,
then I would immediately go and spend my life's savings. :-)

If what we know about the algorithms is correct, then even if every
planet in the visible universe were full of people with such
pacemakers, we would expect less than one of them to have an error.
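
The back-of-the-envelope arithmetic, with deliberately generous
round numbers (the planet and population figures below are made up
for illustration, not measurements), looks like this:

    planets = 10**24    # a generous count of planets in the visible universe
    people  = 10**10    # ten billion pacemaker wearers per planet
    p_error = 2**-128   # chance one MD4-checked transfer goes wrong unnoticed

    expected_errors = planets * people * p_error
    print(expected_errors)   # roughly 3e-5, far fewer than one bad pacemaker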

I kind of think you're flogging a dead horse here.  Are you 
looking at using rsync and genuinely concerned about the algorithm?

-- 
Martin



