[patch] Add `--link-by-hash' option.
Hans Eric Sandström
hes at xinit.se
Tue Feb 10 13:18:49 GMT 2004
On Tuesday, February 10, 2004 12:30 AM, Jason M. Felice wrote:
> On Tue, Feb 10, 2004 at 10:11:09AM +1100, Donovan Baarda wrote:
> > On Tue, 2004-02-10 at 07:48, Jason M. Felice wrote:
> > > This patch adds the --link-by-hash=DIR option, which hard links received
> > > files in a link farm arranged by MD4 file hash. The result is that the
> > > system will only store one copy of the unique contents of each file,
> > > regardless of the file's name.
> >
> > Does this mean it also automatically detects renames?
>
> No. It can't detect whether two files have identical contents until
> after the file has been transferred. This patch can only save disk
> space, not bandwidth.
>
Here is an idea for how to detect file renames:
1. Create the md4 farm for all files in the destination directory hierarchy
   before the transfer starts.
2. Update the md4 farm for every received file.
3. When a renamed or hard-linked file is encountered, rsync will detect it
   because its hash is already in the farm.
4. After rsync is done, the farm is deleted.
Well, this is of course a brute-force idea. The initial md4 farm should
probably be created in memory first, with the files on disk created only
when needed.
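
To make the steps above a bit more concrete, here is a rough Python sketch of
the in-memory variant. It is not rsync code and not part of the patch. The
function names are made up for illustration, and SHA-256 stands in for MD4
because MD4 is not reliably available in Python's hashlib. It also assumes the
sender can supply a whole-file digest before the transfer, which nothing in
this thread establishes.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 16):
    """Hex digest of a file's contents (SHA-256 standing in for MD4)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_index(root):
    """Step 1: map content digest -> relative paths already under root."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            index.setdefault(file_digest(full), []).append(rel)
    return index


def record_received(index, root, rel):
    """Step 2: add a newly received file to the index."""
    digest = file_digest(os.path.join(root, rel))
    index.setdefault(digest, []).append(rel)


def find_local_copy(index, digest, wanted_rel):
    """Step 3: return an existing path with the same contents, or None.

    `digest` is assumed to come from the sender before any data is sent.
    """
    for rel in index.get(digest, []):
        if rel != wanted_rel:
            return rel
    return None
```

If find_local_copy() returns a path, the receiver could hard link or copy that
file locally instead of transferring it. Step 4 is just dropping the
dictionary when the run ends.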
/Hans Eric Sandström
MailCORE AB