[patch] Add `--link-by-hash' option (rev 2).

Craig Barratt cbarratt at users.sourceforge.net
Tue Feb 17 06:48:32 GMT 2004


"Jason M. Felice" writes:

> This patch adds the --link-by-hash=DIR option, which hard links received
> files in a link farm arranged by MD4 file hash.  The result is that the system
> will only store one copy of the unique contents of each file, regardless of
> the file's name.
> 
> (rev 2)
> * This revision is actually against CVS HEAD (I didn't realize I was working
>   from a stale rsync'd CVS).
> * Apply permissions after linking (permissions were lost if we already had
>   a copy of the file in the link farm).
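
For readers who have not seen the patch, the idea described above can be
sketched roughly as follows. This is a minimal illustration, not the patch's
code: SHA-256 stands in for MD4 (which many current Python builds do not
provide), and the function names, the two-level directory layout, and the
".lbh-tmp" suffix are all assumptions made for the example.

```python
import hashlib
import os


def content_hash(path, chunk_size=1 << 16):
    """Return a hex digest of the file's contents (SHA-256 stands in for MD4)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def link_by_hash(path, farm_dir):
    """Make `path` a hard link to the farm entry for its contents.

    Returns the farm entry's path. Assumes `path` and `farm_dir` are on
    the same file system, since hard links cannot cross file systems.
    """
    name = content_hash(path)
    # Hypothetical layout: fan out on the first two hex digits.
    entry = os.path.join(farm_dir, name[:2], name[2:])
    os.makedirs(os.path.dirname(entry), exist_ok=True)
    if not os.path.exists(entry):
        # First copy of these contents: publish it in the farm.
        os.link(path, entry)
    elif not os.path.samefile(path, entry):
        # Contents already stored: swap the new copy for a link to them.
        tmp = path + ".lbh-tmp"
        os.link(entry, tmp)
        os.replace(tmp, path)
    return entry
```

Two received files with identical contents end up as two names for one
inode, plus the farm entry itself, so the data is stored once.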

I haven't studied your patch, but I have a couple of comments/questions:

  - If you update permissions on one link, every other hard link to
    that file changes too.  Does that mean that all instances of an
    identical file will get the last mtime/permissions/ownership?  Or
    does the link farm have unique entries for contents plus metadata
    (vs just contents)?

  - Some file systems have a limit of 32000 hard links per file.  You
    will need to roll over to a new farm file when that limit is
    exceeded (i.e., when link() fails).  Also, empty files tend to be
    quite prevalent, so it is probably easier to just create those
    files and not link them (there should be no difference in disk
    usage).

  - How does this patch interact with -H (--hard-links)?
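
To make the second point concrete, here is a rough sketch of rolling over to
a new farm file when link() fails with EMLINK, and of leaving empty files
unlinked. It is only an illustration under stated assumptions: the function
name, the ".0", ".1", ... slot naming, the `link` and `max_slots` parameters,
and the ".lbh-tmp" suffix are all made up here, and the caller is assumed to
have derived `entry_base` from the file's hash already.

```python
import errno
import os


def link_with_rollover(path, entry_base, link=os.link, max_slots=1000):
    """Link `path` into the farm, rolling to a new slot on EMLINK.

    Slots are named entry_base.0, entry_base.1, ... (hypothetical naming).
    Returns the slot used, or None for an empty file, which is left alone.
    """
    if os.path.getsize(path) == 0:
        return None  # empty files hold no data; not worth linking
    for n in range(max_slots):
        slot = f"{entry_base}.{n}"
        if not os.path.exists(slot):
            link(path, slot)  # first copy stored in this slot
            return slot
        if os.path.samefile(slot, path):
            return slot  # already linked to this slot
        tmp = path + ".lbh-tmp"
        try:
            link(slot, tmp)
        except OSError as e:
            if e.errno == errno.EMLINK:
                continue  # slot is at the link limit; try the next one
            raise
        os.replace(tmp, path)  # swap the new copy for the link
        return slot
    raise OSError(errno.EMLINK, "all slots are at the link limit", entry_base)
```

On the first point: names that share an inode also share mode, ownership and
mtime, so an os.chmod() through any one name shows up in os.stat() on all
the others.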

Craig

