MD4 checksum fix

jw schultz jw at pegasys.ws
Tue Apr 1 13:34:35 EST 2003


On Tue, Apr 01, 2003 at 12:41:39PM +1000, Donovan Baarda wrote:
> On Tue, 2003-04-01 at 09:35, jw schultz wrote:
> > On Tue, Jan 28, 2003 at 11:22:14PM -0800, Craig Barratt wrote:
> > > And I have several things I would like to work on and submit:
> > > 
> > >  - Fix the MD4 block and file checksums to comply with the rfc
> > >    (currently MD4 is wrong for blocks of size 64*n, or files
> > >    longer than 512MB).
> [...]
> > > But before I work on these I would like to make sure there is interest
> > > in including them.
> > 
> > I've not heard from you on the adaptive checksum length patch.
> > I shall be committing it shortly subject to objections or
> > further discussion.
> 
> It looked OK to me from an "implementing what we discussed" point of
> view... I noticed you used the "test square for each bit" algorithm
> instead of the Newton-Raphson implementation for the square-root. It's
> certainly more complex and I don't know if it is actually faster, but it
> does avoid divide ops.

Comparative complexity is a matter of perspective.  I think
it is a bit more deterministic.  It has been a few years
since i reviewed clock-cycle times for instructions but i
recall divide using almost an order of magnitude (decimal)
more clock cycles than a multiply.  Which one of these
routines is actually faster i can't say.  It would probably
take a test harness to know.  Floating point sqrt would most
likely be fastest.  My guess is that it really won't make
that much difference in run-time.  Especially since this is
done on the receiver and it is the sender that is CPU bound.

> 
> I don't know enough about the rest of the code to comment on whether it
> will break the protocol though.

I'm currently running it here and it seems to be surviving
and match is working with the odd checksum sizes and all.
I also tested connecting to 2.5.6 and it downgraded fine.

> 
> > I would like to see the MD4 checksums fixed as well.  We are
> > very close to the upper limit on protocol versions for
> > deployed versions of rsync.  Therefore, i would like to
> > minimize protocol increments for a while.  In any other
> > circumstance i wouldn't suggest doing so but i think it
> > would be a good idea to integrate these two fixes in one
> > protocol bump.
> > 
> > If you, or someone else, has the fix for the MD4 sums handy
> > i would be glad to coordinate in implementing both sets of
> > patches at about the same time.  If you want to send it to
> > me, that would be fine.  I can also hold off for a short
> > while for the MD4 sum patch to get some testing.
> 
> librsync includes a fixed and optimized md4sum implementation in CVS on
> sourceforge. It should be compatible with the rsync implementation, as
> that's where it originally came from and the API has not (yet) changed,
> despite the implementation being fairly overhauled.
> 
> However, I also have another even more refined (20% less code, and
> faster) md4sum implementation based on this, but the API has changed to
> conform with the RSA implementation for inclusion in libmd. I hope to
> change librsync to use this API after releasing 0.9.6. This
> implementation is available here;
> 
> http://minkirri.apana.org.au/~abo/projects/libmd/
> 
> If you were going to start changing the md4sum code, I would encourage
> migrating to the RSA API as it _seems_ to be more widely used (certainly
> in BSD land) and would allow linking against the libmd library where it
> is available.
> 
> Unfortunately, backwards compatibility probably means you will need to
> include a broken implementation as well as the fixed implementation. The
> best way I can think to do this is to provide an additional broken
> "MD4Final" (or "rs_mdfour_result" using the old librsync API) that
> implements the old behaviour. All the brokenness was in the finalizing
> of the md4sum so this single alternative "MDFinalBroken" routine should
> be all you need.
> 
> If people ask really nice and are not in a hurry, I could probably whip
> such a function up to supplement the libmd API implementation. Someone
> else would have to do the smarts of protocol version juggling and
> supporting the changed API.
> 
> I'm not particularly interested in supporting the old rsync/librsync
> mdfour API.

I can understand that.  I do think the API change and the md4
bug fix should be done separately.  The API change is really
internal, it is the bug fix that affects the protocol.

All i'm offering here is to coordinate with someone fixing
the bug or to take the fix (if it can be agreed upon) from
someone who can encapsulate it into a patch.  And that just
so we can allow backwards compatability a little longer.
Once we go to protocol version 31 2.5.6 will be incompatible
and i'm already bumping it to 27.

I don't pretend to understand the math, nor do i care to
try, so i don't want to touch that code for myself.  What i
do want is to have these things done right and according to
spec such that independent implementations can theoretically
be compatible.  If we can make the code smaller and faster
as well, great.


-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt


More information about the rsync mailing list