Non-determinism
Berend Tober
btober at computer.org
Wed Apr 17 06:17:02 EST 2002
Is anyone else concerned about the fact that rsync doesn't guarantee
to produce identical file copies on the target machine?
I don't mean to sound critical, because I think that rsync is
a great example of how software should be written. (I often make the
observation, as I learn more about Linux, and inevitably find myself
comparing open source applications to Microsoft products, that the
people who wrote Unix way back when at AT&T Bell Labs REALLY knew
what they were doing. I also have the same attitude toward the
developer and maintainer of rsync.)
But the "Technical Report" at
http://rsync.samba.org/tech_report/tech_report.html states that:
"If the two strong checksums match, we assume that we have found a
block of A which matches a block of B. In fact the blocks could be
different, but the probability of this is microscopic, and in
practice this is a reasonable assumption."
Is that good enough? The statement, I believe, refers to some
analytical estimate of the chance that the checksums might match
even though the blocks being compared differ, but has anyone
done empirical work to verify that we can pretty much count on getting
reliable file copies on the target?
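
For a rough sense of scale, the analytical estimate can be sketched as a union bound. This is only a back-of-the-envelope sketch under stated assumptions, not rsync's actual analysis: it assumes a 128-bit strong checksum whose outputs behave as if uniformly random, and it counts every (block, offset) pair as a comparison, which overcounts because the strong checksum is only compared after the weak one matches. The function name, the file size, and the block size are hypothetical values chosen for illustration.

```python
from fractions import Fraction


def false_match_bound(num_blocks: int, num_offsets: int,
                      checksum_bits: int = 128) -> Fraction:
    """Upper bound on the chance that ANY compared pair of differing blocks
    shares a strong checksum, assuming uniformly random checksum values."""
    comparisons = num_blocks * num_offsets
    # Union bound: sum of per-comparison collision chances, capped at 1.
    return min(Fraction(1), Fraction(comparisons, 2 ** checksum_bits))


# Hypothetical example: a 1 GiB file split into 700-byte blocks, with
# every byte offset of the other file treated as a candidate match.
file_size = 2 ** 30
block_size = 700
num_blocks = -(-file_size // block_size)  # ceiling division
bound = false_match_bound(num_blocks, file_size)
print(f"false-match bound: {float(bound):.3e}")
```

Under these assumptions the bound stays far below one in a billion billion for a single transfer, but it is only as good as the uniform-output assumption behind it.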
And how does this small probability of file corruption compare to,
say, using a full file transfer or copy? In the latter case, you
might be tempted to think there is zero probability of file
corruption, but if you think of any data transfer as sending a
digital signal through a noisy communication channel, there must be
some way to quantify the reliability of cp versus rsync. I'm not
sure that I have all the skills to do this analysis, but I'd be
interested in seeing it done.
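
One possible starting point for that comparison is to model a plain copy as n independent bits, each with some small chance p of being flipped without detection, so that the chance of at least one bad bit is 1 - (1 - p)^n. The sketch below is an assumption-laden illustration only: the independence model is a simplification, the function name is made up, and the error rates in the loop are hypothetical placeholders, not measurements of any real disk or network.

```python
import math


def corruption_probability(num_bits: int,
                           undetected_bit_error_rate: float) -> float:
    """P(at least one undetected flipped bit) = 1 - (1 - p)^n.

    Uses log1p/expm1 so that very small rates do not round to zero.
    """
    return -math.expm1(num_bits * math.log1p(-undetected_bit_error_rate))


file_bits = 8 * 2 ** 30  # a 1 GiB file, in bits

# Hypothetical undetected error rates per bit (placeholders only).
for rate in (1e-15, 1e-18, 1e-21):
    p = corruption_probability(file_bits, rate)
    print(f"rate {rate:.0e}: P(corrupt copy) ~ {p:.3e}")
```

Plugging in a measured undetected error rate for a given channel would give a figure that can be set beside the checksum false-match estimate for rsync.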
Regards,
Berend Tober