[patch] Add `--link-by-hash' option (rev 2).
cbarratt at users.sourceforge.net
Tue Feb 17 06:48:32 GMT 2004
"Jason M. Felice" writes:
> This patch adds the --link-by-hash=DIR option, which hard links received
> files in a link farm arranged by MD4 file hash. The result is that the system
> will only store one copy of the unique contents of each file, regardless of
> the file's name.
> (rev 2)
> * This revision is actually against CVS HEAD (I didn't realize I was working
> from a stale rsync'd CVS).
> * Apply permissions after linking (permissions were lost if we already had
> a copy of the file in the link farm).
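
For readers unfamiliar with the idea, the link farm described in the quoted text can be sketched roughly as follows. This is not code from the patch (which targets rsync itself). The names `file_digest` and `link_by_hash`, the two-level directory layout, the `.lbh-tmp` suffix, and the use of SHA-256 in place of MD4 (which many Python builds no longer provide) are all assumptions made for illustration.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 16):
    """Return a hex digest of the file's contents (SHA-256 stands in for MD4)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def link_by_hash(path, farm_dir):
    """Hard link `path` to a farm entry named after its content hash.

    Hypothetical layout: farm_dir/<first 2 hex chars>/<remaining hex chars>.
    Returns the path of the farm entry.
    """
    digest = file_digest(path)
    bucket = os.path.join(farm_dir, digest[:2])
    os.makedirs(bucket, exist_ok=True)
    farm_path = os.path.join(bucket, digest[2:])

    if os.path.exists(farm_path):
        if os.path.samefile(farm_path, path):
            return farm_path  # already linked, nothing to do
        # Same contents already stored: swap the received file for a link.
        tmp = path + ".lbh-tmp"
        os.link(farm_path, tmp)
        os.replace(tmp, path)
    else:
        # First time these contents are seen: the file becomes the farm entry.
        os.link(path, farm_path)
    return farm_path
```

After two files with identical contents pass through `link_by_hash`, both names and the farm entry share a single inode, so any later chmod, chown, or utime on one of them is visible through all of them.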
I haven't studied your patch, but I have a couple of comments/questions:
- If you update permissions, then all hard links will change too.
  Does that mean that all instances of an identical file will get
  the last mtime/permissions/ownership? Or does the link farm have
  unique entries for contents plus metadata (vs. just contents)?
- Some file systems have a hard-link limit of 32000. You will need to
  roll to a new file when that limit is exceeded (i.e., link() fails).
  Also, empty files tend to be quite prevalent, so it is probably
  easier to just create those files and not link them (there should be
  no difference in disk usage).
- How does this patch interact with -H?
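
The hard-link limit and empty-file suggestions above could be handled along these lines. This is a minimal sketch under stated assumptions, not what the patch does: `link_with_rollover`, the `.1`, `.2`, ... suffix scheme for overflow entries, the `max_slots` cap, and the injectable `link` parameter (there only so the EMLINK path can be exercised without creating 32000 links) are all hypothetical. It also assumes the caller has already established that `dest` has the same contents as the farm entry.

```python
import errno
import os


def link_with_rollover(dest, farm_base, max_slots=100, link=os.link):
    """Replace `dest` with a hard link to a farm entry, rolling over on EMLINK.

    Tries farm_base, then farm_base.1, farm_base.2, ... Returns the farm
    entry used, or None if `dest` is empty and was left as a plain file.
    """
    if os.path.getsize(dest) == 0:
        return None  # empty files use no data blocks; not worth linking

    tmp = dest + ".lbh-tmp"
    for slot in range(max_slots):
        candidate = farm_base if slot == 0 else "%s.%d" % (farm_base, slot)
        if not os.path.exists(candidate):
            # No entry in this slot yet: dest becomes the new farm entry.
            link(dest, candidate)
            return candidate
        if os.path.samefile(candidate, dest):
            return candidate  # already linked to this entry
        try:
            link(candidate, tmp)
        except OSError as e:
            if e.errno == errno.EMLINK:
                continue  # this entry hit the link limit; try the next slot
            raise
        os.replace(tmp, dest)
        return candidate
    raise OSError(errno.EMLINK, "all link farm slots are full", farm_base)
```

Relying on link() failing with EMLINK, rather than comparing st_nlink against a hard-coded 32000, avoids baking one file system's limit into the code.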