MD4 bug in rsync for lengths = 64 * n
Dave Dykstra
dwd at bell-labs.com
Thu Aug 29 11:19:00 EST 2002
On Sun, Aug 04, 2002 at 01:19:43PM -0700, Craig Barratt wrote:
> I am the author of BackupPC (http://backuppc.sourceforge.net) and
> I am working on adding rsync support to BackupPC.
>
> I am implementing the server-side in perl, and the client will
> run vanilla rsync. (BTW, is the protocol documented? I've
> answered all my questions by looking at the source, but it would
> be great to check against any docs.)
On the contrary, the protocol used by the rsync program is highly optimized
and tied to the implementation in that program. Any time any little change
has to be made, the protocol version number has to be increased and the
code has to have more checks put in it in order to still interoperate with
older versions of the protocol. It is not intended to be extensible or to
communicate with other implementations. I don't recommend trying.
> I started with librsync 0.9.3 and the Intermezzo perl interface to
> librsync written by Shirish Phatak.
Using librsync is a much better way to go if you need to integrate the
rsync rolling-checksum algorithm into a program, but if you intend to
talk to the rsync program on the client side, I strongly recommend you
instead invoke the rsync server program on the server side.
> However, as I'm sure is well-known,
> the Adler crc32 and MD4 computed by librsync don't match those in
> rsync 2.5.5.
I do not recall hearing anybody mention that before.
> After swapping the crc32 endianness, changing RS_CHAR_OFFSET from 31 to
> 0, and adding rsync's checksum_seed to librsync's MD4 they now agree,
> except in one case: when the total data length (including the 4 byte
> checksum_seed) is a multiple of 64, the MD4 checksums in librsync and
> rsync don't agree. After reviewing the code in librsync, rsync and the
> original RSA implementation, I believe the bug is in rsync. It doesn't
> call mdfour_tail() when the last fragment is empty. Unfortunately
> this happens in the particularly common case of 700 + 4 = 64 * 11.
> The same bug occurs in both the block MD4s and the entire-file MD4.
>
> The bug is benign in the sense that it is on both sides so rsync works
> correctly. But it is possible (I am certainly not a crypto expert) that
> missing the trailing block (that includes the bit length) substantially
> weakens the computed MD4.
>
> The fix is easy: a couple of ">" checks should be ">=". I can send
> diffs if you want. But of course this can't be rolled in unless it
> is coupled with a bump in the protocol version.
Another bump in the protocol version is no problem. Please submit a patch.
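
For readers following the description quoted above, here is a rough sketch of why lengths of 64 * n are the affected case. It is a simplified Python model of MD4 block counting, not rsync's code: the names md4_padding and blocks_hashed are hypothetical, and the tail_when_empty flag only models the behaviour as Craig describes it (the padding step being skipped when no bytes are left over).

```python
BLOCK = 64  # MD4 consumes its input in 64-byte blocks


def md4_padding(length: int) -> bytes:
    """Padding that RFC 1320 appends to a message of `length` bytes."""
    # 0x80, then zero bytes until the total is 56 mod 64, then the
    # message length in bits as a 64-bit little-endian integer.
    zero_bytes = (55 - length) % BLOCK
    bit_count = (length * 8) & 0xFFFFFFFFFFFFFFFF
    return b"\x80" + b"\x00" * zero_bytes + bit_count.to_bytes(8, "little")


def blocks_hashed(length: int, tail_when_empty: bool) -> int:
    """Count the 64-byte blocks fed to the compression function.

    tail_when_empty=False is an assumed model of the reported bug:
    the final padding step is skipped when the last fragment is empty.
    """
    full, remainder = divmod(length, BLOCK)
    if remainder == 0 and not tail_when_empty:
        return full  # no padding block, so the bit length is never hashed
    return (length + len(md4_padding(length))) // BLOCK


if __name__ == "__main__":
    for data_len in (699, 700, 701):
        total = data_len + 4  # data plus the 4-byte checksum_seed
        print(data_len, blocks_hashed(total, False), blocks_hashed(total, True))
```

In this model only the 700-byte case (700 + 4 = 704 = 64 * 11) differs between the two variants: one block fewer is hashed when the tail is skipped, which is the block carrying the padding and bit length.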
> I saw some earlier
> email about fixing MD4 to handle files >= 512MB (I presume this
> relates to the 64-bit bit count in the final block). Perhaps this
> change can be made at the same time?
Could you please post a reference to that email? It isn't familiar to me
and I didn't find it through google. There have been other problems we've
been seeing with the end of large files and zlib compression, though.
I wonder if it can somehow be related.
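
On the 512MB figure quoted above: the arithmetic behind Craig's presumption can be checked with a small sketch. This only illustrates how a bit count wraps in a counter of a given width; the function name bit_count_field is hypothetical, and whether rsync's MD4 code actually stores the count in 32 bits is an assumption here, not something confirmed in this thread.

```python
def bit_count_field(length_bytes: int, counter_bits: int) -> int:
    """Bit length as held by a counter of the given width (wraps on overflow)."""
    return (length_bytes * 8) % (1 << counter_bits)


HALF_GIB = 512 * 1024 * 1024  # 2**29 bytes, i.e. 2**32 bits

if __name__ == "__main__":
    # One byte short of 512MB still fits in a 32-bit bit counter.
    print(bit_count_field(HALF_GIB - 1, 32))
    # At exactly 512MB the 32-bit counter wraps around.
    print(bit_count_field(HALF_GIB, 32))
    # A 64-bit counter, as the MD4 final block provides, does not wrap.
    print(bit_count_field(HALF_GIB, 64))
```

So 512MB is the first length whose bit count no longer fits in 32 bits, which would be consistent with a limit at exactly that size.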
- Dave Dykstra