[patch] Add `--link-by-hash' option.

Hans Eric Sandström hes at xinit.se
Tue Feb 10 13:18:49 GMT 2004


On Tuesday, February 10, 2004 12:30 AM, Jason M. Felice wrote:

> On Tue, Feb 10, 2004 at 10:11:09AM +1100, Donovan Baarda wrote:
> > On Tue, 2004-02-10 at 07:48, Jason M. Felice wrote:
> > > This patch adds the --link-by-hash=DIR option, which hard links received
> > > files in a link farm arranged by MD4 file hash.  The result is that the
> > > system will only store one copy of the unique contents of each file,
> > > regardless of the file's name.
> >
> > Does this mean it also automatically detects renames?
>
> No.  It can't detect whether two files have identical contents until
> after the file has been transferred.  This patch can only save disk
> space, not bandwidth.
>
Here is an idea for how to detect file renames:
1. Create the MD4 farm for all files in the destination directory hierarchy
   before the transfers start.
2. Update the MD4 farm for every received file.
3. When a renamed or hard-linked file is encountered, rsync will detect it
   because its hash is already in the farm.
4. After rsync is done, delete the farm.

Well, this is of course a brute-force idea. The initial MD4 farm should
probably be built in memory first, with the files created only when needed.
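
To make the in-memory variant concrete, here is a rough Python sketch of the
steps above. It is only an illustration, not rsync code: the function names
(file_digest, build_index, record_received, find_rename_source) are made up
for this example, and it uses SHA-256 from hashlib in place of MD4, since MD4
is often not available in current Python builds. As noted earlier in the
thread, the hash of a received file is only known once its contents are
present, so this detects the rename after the transfer, not before it.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 16):
    """Hash a file's contents (SHA-256 here, standing in for MD4)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_index(root):
    """Step 1: map digest -> list of relative paths under root, in memory."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            rel = os.path.relpath(path, root)
            index.setdefault(file_digest(path), []).append(rel)
    return index


def record_received(index, digest, relpath):
    """Step 2: add a newly received file to the index."""
    paths = index.setdefault(digest, [])
    if relpath not in paths:
        paths.append(relpath)


def find_rename_source(index, digest, new_relpath):
    """Step 3: return an existing path with the same contents, or None."""
    for candidate in index.get(digest, []):
        if candidate != new_relpath:
            return candidate
    return None
```

Step 4 needs no cleanup here, because the index is an ordinary dictionary
that goes away with the process; on-disk farm entries would only be created
for digests that actually turn out to have more than one path.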

/Hans Eric Sandström
MailCORE AB


