rsync md4sum code.

Donovan Baarda abo at minkirri.apana.org.au
Sun Apr 28 06:50:43 EST 2002


On Sun, Apr 28, 2002 at 01:06:10PM +1000, Donovan Baarda wrote:
> On Sat, Apr 27, 2002 at 03:32:47PM -0700, Martin Pool wrote:
> > On 27 Apr 2002, Donovan Baarda <abo at minkirri.apana.org.au> wrote:
> > > G'day,
> > > 
> > > I've been working on a Python interface to librsync and have noticed that it
> > > uses md4sum code borrowed from Andrew Tridgell and Martin Pool that comes
> > > via rsync and was originally written for samba.
> > 
> > Tridge recently discovered a bug in that code that probably does not
> > weaken the digest, but that may make it incompatible with standard
> > MD4.  Basically, tail-extension is not properly carried out for blocks
> > that are a multiple of 64 bytes in size.
> 
> This would be nearly all blocks, as everyone would be using 2^n sized blocks
> where n>5. If you meant to say "...that are _not_ a multiple of 64 bytes...",
> then I would dare to suggest fixing this would not hurt anybody, but
> definitely record the effects. 
> 
> > I haven't had a chance yet to check how this affects rsync.  If it
> > does, I suppose we should evolve the protocol to fix it.
> > 
> > There's not meant to be anything special about it.  One of my TODO
> > items was to replace it with a faster implementation.
> 
> I'm not sure how the RSA implementation compares speed-wise, but given it is
> more "correct", would there be major objections to replacing the samba md4
> with the RSA one in librsync? I guess I should benchmark and publish
> results...

Ok, preliminary results are in. Because I'm primarily interested in Python
interface stuff at the moment, I chose to make my benchmark by hacking
together swig wrappers for the libmd md4c.c and the librsync mdfour.c, and
whipping up a python test rig. The wrappers should have identical overheads.
I used pre-generated random data for the input.
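
For anyone wanting to repeat this sort of comparison, here is a rough sketch
of the shape of the test rig. It is not the actual script; the names
time_digest and compare are made up for illustration. The swig wrappers are
not shown, so hashlib.md5 stands in for both implementations (MD4 is not
reliably available through hashlib), and any callable taking a bytes block
and returning a digest could be dropped in instead.

```python
import hashlib
import os
import time


def time_digest(digest_fn, blocks):
    """Time digest_fn over every block; return (seconds, list of digests)."""
    start = time.perf_counter()
    digests = [digest_fn(block) for block in blocks]
    elapsed = time.perf_counter() - start
    return elapsed, digests


def compare(impl_a, impl_b, block_size=4096, count=10000):
    """Run both implementations over the same pre-generated random blocks.

    Returns (seconds_a, seconds_b, digests_match).
    """
    # Generate the input up front so it is not counted in the timings.
    blocks = [os.urandom(block_size) for _ in range(count)]
    secs_a, sums_a = time_digest(impl_a, blocks)
    secs_b, sums_b = time_digest(impl_b, blocks)
    return secs_a, secs_b, sums_a == sums_b


# Stand-ins only: the real rig would call the wrapped md4c.c and mdfour.c.
def one_shot(block):
    return hashlib.md5(block).digest()


def incremental(block):
    h = hashlib.md5()
    h.update(block)
    return h.digest()


if __name__ == "__main__":
    a, b, same = compare(one_shot, incremental)
    print("impl A: %.2f secs, impl B: %.2f secs, identical: %s" % (a, b, same))
```

Feeding both implementations the same block list keeps the comparison fair
and lets the digests be checked against each other in the same run.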

The results for a Celeron-366 doing 10K md4sums on 4K blocks of random data
are: libmd 3.1 secs, librsync 2.5 secs. Both gave identical checksum results,
so the bug Tridge found didn't rear its head. 
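
For reference, 4K blocks are a multiple of 64 bytes, which is the case
mentioned above. The sketch below shows the tail-extension that standard MD4
(RFC 1320) calls for: a 0x80 byte, zeros up to 56 mod 64, then the message
length in bits as a 64-bit little-endian value. The helper name md4_padding
is made up for illustration and this is not the librsync or libmd code.

```python
import struct


def md4_padding(message_len):
    """Return the bytes RFC 1320 appends to a message of message_len bytes."""
    # After the mandatory 0x80 byte, add zeros until the length is 56 mod 64.
    zeros = (55 - message_len) % 64
    # The final 8 bytes hold the original length in bits, low-order byte first.
    bit_count = (message_len * 8) & 0xFFFFFFFFFFFFFFFF
    return b"\x80" + b"\x00" * zeros + struct.pack("<Q", bit_count)


if __name__ == "__main__":
    for size in (0, 55, 56, 64, 4096):
        print(size, len(md4_padding(size)))
```

A message that is already a multiple of 64 bytes still gets a full extra
64-byte block of padding, so an implementation that skips it for such sizes
would be expected to differ from standard MD4.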

Conclusion: the librsync md4 is pretty fast. However, it was also slightly
harder to swig-wrap in isolation from the rest of librsync, as it had a few
tie-ins with the rs_trace stuff.

-- 
----------------------------------------------------------------------
ABO: finger abo at minkirri.apana.org.au for more info, including pgp key
----------------------------------------------------------------------



