[patch] Add `--link-by-hash' option.

Jason M. Felice jfelice at cronosys.com
Mon Feb 9 23:30:18 GMT 2004

On Tue, Feb 10, 2004 at 10:11:09AM +1100, Donovan Baarda wrote:
> On Tue, 2004-02-10 at 07:48, Jason M. Felice wrote:
> > This patch adds the --link-by-hash=DIR option, which hard links received
> > files in a link farm arranged by MD4 file hash.  The result is that the system
> > will only store one copy of the unique contents of each file, regardless of
> > the file's name.
> Does this mean it also automatically detects renames?

No.  It can't detect whether two files have identical contents until
after the file has been transferred.  This patch can only save disk
space, not bandwidth.

> > Anyone have an example of an MD4 collision so I can test that case? :)
> How do you recover from that case?

Files in the link farm are arranged like so:


<DIR> is the parameter supplied to the --link-by-hash=DIR option.
<hash-first-8> is the first eight hex digits of the file's MD4 sum.
<hash-last-24> is the last 24 hex digits of the file's MD4 sum.
<n> is an integer, starting from 0.
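
As a rough illustration of how those four components could combine into
a path, here is a minimal Python sketch. It assumes the components nest
as directories in the order listed above, which is an assumption and not
something taken from hashlink.c. The name farm_path is hypothetical, and
the hash is passed in as a hex string rather than computed.

```python
import os
import string


def farm_path(farm_dir: str, md4_hex: str, n: int) -> str:
    """Build a candidate link-farm path from a 32-hex-digit MD4 sum.

    Assumed layout (not confirmed by the patch):
    <DIR>/<hash-first-8>/<hash-last-24>/<n>
    """
    if len(md4_hex) != 32 or any(c not in string.hexdigits for c in md4_hex):
        raise ValueError("an MD4 sum is 32 hex digits")
    if n < 0:
        raise ValueError("<n> starts from 0")
    md4_hex = md4_hex.lower()
    # The first 8 hex digits name the outer directory, the remaining 24 the inner one.
    return os.path.join(farm_dir, md4_hex[:8], md4_hex[8:], str(n))


# Example input: the MD4 sum of the empty string.
print(farm_path("/farm", "31d6cfe0d16ae931b73c59d7e0c089c0", 0))
```

Splitting the 128-bit sum into 8 + 24 hex digits keeps any one directory
from holding every entry in the farm.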

Theoretically, if two files with different contents have the same MD4
hash, they will be assigned consecutive numbers for <n>, so each keeps
its own entry in the link farm.
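
One way the numbering could work is sketched below in Python. This is
not the code from hashlink.c: the function name link_by_hash, the
byte-for-byte comparison, the temporary-name suffix, and the nested
directory layout are all assumptions made for illustration. The MD4 sum
is passed in as a hex string so the sketch does not depend on any
particular MD4 implementation.

```python
import filecmp
import os


def link_by_hash(farm_dir: str, md4_hex: str, received: str) -> str:
    """Hard-link `received` into the farm and return the slot path used."""
    # Assumed layout: <DIR>/<hash-first-8>/<hash-last-24>/<n>
    bucket = os.path.join(farm_dir, md4_hex[:8], md4_hex[8:])
    os.makedirs(bucket, exist_ok=True)
    n = 0
    while True:
        slot = os.path.join(bucket, str(n))
        if not os.path.exists(slot):
            # No entry with this number yet: the farm gains a new link.
            os.link(received, slot)
            return slot
        if os.path.samefile(slot, received):
            # Already linked to this entry; nothing to do.
            return slot
        if filecmp.cmp(slot, received, shallow=False):
            # Same bytes already stored: repoint the received name at that inode.
            tmp = received + ".hlink-tmp"  # hypothetical temporary name
            os.link(slot, tmp)
            os.replace(tmp, received)
            return slot
        # Same hash but different bytes: a collision, so try the next number.
        n += 1
```

With this approach a collision costs only one extra comparison per
occupied number, and two colliding files never end up sharing an inode.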

Oh, and to raise the bar, the sample MD4 collision files need to be the
same length :)  I did a little research and there were claims that MD4
had been "cracked"--I assume this means that there is some way other
than brute force to find a file which collides with a given example.
I can't seem to find any examples.

> > Patch Summary:
> > 
> >     -1   +1    Makefile.in
> >     -0   +304  hashlink.c (new)
> >     -1   +21   options.c
> >     -0   +6    proto.h
> >     -5   +21   receiver.c
> >     -0   +6    rsync.c
> >     -0   +7    rsync.h
> If this does everything I think it does, then it's a surprisingly small
> amount of changes for what it does.

It seems to be big enough to do what it does. :)

 Jason M. Felice
 Cronosys, LLC <http://www.cronosys.com/>
 216.221.4600 x302
