MD4 checksum fix

Donovan Baarda abo at minkirri.apana.org.au
Tue Apr 1 12:41:39 EST 2003


On Tue, 2003-04-01 at 09:35, jw schultz wrote:
> On Tue, Jan 28, 2003 at 11:22:14PM -0800, Craig Barratt wrote:
> > And I have several things I would like to work on and submit:
> > 
> >  - Fix the MD4 block and file checksums to comply with the rfc
> >    (currently MD4 is wrong for blocks of size 64*n, or files
> >    longer than 512MB).
[...]
> > But before I work on these I would like to make sure there is interest
> > in including them.
> 
> I've not heard from you on the adaptive checksum length patch.
> I shall be committing it shortly subject to objections or
> further discussion.

It looked OK to me from an "implementing what we discussed" point of
view... I noticed you used the "test square for each bit" algorithm
instead of the Newton-Raphson implementation for the square-root. It's
certainly more complex and I don't know if it is actually faster, but it
does avoid divide ops.

I don't know enough about the rest of the code to comment on whether it
will break the protocol though.

> I would like to see the MD4 checksums fixed as well.  We are
> very close to the upper limit on protocol versions for
> deployed versions of rsync.  Therefore, i would like to
> minimize protocol increments for a while.  In any other
> circumstance i wouldn't suggest doing so but i think it
> would be a good idea to integrate these two fixes in one
> protocol bump.
> 
> If you, or someone else, has the fix for the MD4 sums handy
> i would be glad to coordinate in implementing both sets of
> patches at about the same time.  If you want to send it to
> me, that would be fine.  I can also hold off for a short
> while for the MD4 sum patch to get some testing.

librsync includes a fixed and optimized md4sum implementation in CVS on
sourceforge. It should be compatible with the rsync implementation, as
that's where it originally came from and the API has not (yet) changed,
despite the implementation being fairly overhauled.

However, I also have another even more refined (20% less code, and
faster) md4sum implementation based on this, but the API has changed to
conform with the RSA implementation for inclusion in libmd. I hope to
change librsync to use this API after releasing 0.9.6. This
implementation is available here;

http://minkirri.apana.org.au/~abo/projects/libmd/

If you were going to start changing the md4sum code, I would encourage
migrating to the RSA API as it _seems_ to be more widely used (certainly
in BSD land) and would allow linking against the libmd library where it
is available.

Unfortunately, backwards compatibility probably means you will need to
include a broken implementation as well as the fixed implementation. The
best way I can think to do this is to provide an additional broken
"MD4Final" (or "rs_mdfour_result" using the old librsync API) that
implements the old behaviour. All the brokenness was in the finalizing
of the md4sum so this single alternative "MDFinalBroken" routine should
be all you need.

If people ask really nice and are not in a hurry, I could probably whip
such a function up to supplement the libmd API implementation. Someone
else would have to do the smarts of protocol version juggling and
supporting the changed API.

I'm not particularly interested in supporting the old rsync/librsync
mdfour API.

-- 
----------------------------------------------------------------
Donovan Baarda                http://minkirri.apana.org.au/~abo/
----------------------------------------------------------------



More information about the rsync mailing list