[patch] Add `--link-by-hash' option (rev 2).
Jason M. Felice
jfelice at cronosys.com
Tue Feb 17 14:34:45 GMT 2004
On Mon, Feb 16, 2004 at 10:48:32PM -0800, Craig Barratt wrote:
> "Jason M. Felice" writes:
>
> > This patch adds the --link-by-hash=DIR option, which hard links received
> > files in a link farm arranged by MD4 file hash. The result is that the system
> > will only store one copy of the unique contents of each file, regardless of
> > the file's name.
> >
> > (rev 2)
> > * This revision is actually against CVS HEAD (I didn't realize I was working
> > from a stale rsync'd CVS).
> > * Apply permissions after linking (permissions were lost if we already had
> > a copy of the file in the link farm).
>
> I haven't studied your patch, but I have a couple of comments/questions:
>
> - If you update permissions, then all hardlinks will change too.
> Does that mean that all instances of an identical file will get
> the last mtime/permissions/ownership? Or does the link farm have
> unique entries for contents plus meta data (vs just contents)?
All instances of the file will have the last mtime/permissions/ownership.
This is not such a big deal for me (although it is annoying), but I
can't afford to keep multiple copies of files just because the metadata
is different. If anyone has suggestions for solving this that aren't
too incredibly hackish, I'll implement them. All I can think of is storing
permissions in dotfiles, or implementing my original idea of a "database
backend" instead of a "filesystem backend".
> - Some file systems have a hardlink limit of 32000. You will need to
> roll to a new file when that limit is exceeded (i.e., when link() fails).
Ick. Well, I *do* need to handle that.
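Craig's roll-over suggestion could look roughly like the following C sketch. Several details are hypothetical: the `.1`, `.2`, ... suffixes for overflow entries, the function name, and the `LBH_MAX_SLOTS` cap. EMLINK is the errno that link() reports when the target already has the maximum number of links. When adopting an existing entry fails that way, the sketch moves on to the next slot.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

#define LBH_MAX_SLOTS 1000  /* hypothetical cap on overflow entries per hash */

/* Hypothetical sketch: link received file `fname` to the farm entry `base`
 * (slot 0) or, once that entry is full, to "<base>.1", "<base>.2", ...
 * Returns the slot used, or -1 on error. */
static int link_with_rollover(const char *fname, const char *base)
{
    char slot[4096], tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.lbh-tmp", fname);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    for (int i = 0; i < LBH_MAX_SLOTS; i++) {
        n = (i == 0) ? snprintf(slot, sizeof slot, "%s", base)
                     : snprintf(slot, sizeof slot, "%s.%d", base, i);
        if (n < 0 || (size_t)n >= sizeof slot)
            return -1;

        /* Free slot: the received file becomes this farm entry. */
        if (link(fname, slot) == 0)
            return i;
        if (errno != EEXIST)
            return -1;

        /* Occupied slot: try to adopt it.  EMLINK means the entry already
         * has the file system's maximum number of links, so roll over. */
        unlink(tmp);
        if (link(slot, tmp) != 0) {
            if (errno == EMLINK)
                continue;
            return -1;
        }
        if (rename(tmp, fname) != 0) {
            unlink(tmp);
            return -1;
        }
        unlink(tmp);  /* still present only if fname was already this inode */
        return i;
    }
    return -1;
}
```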
> Also, empty files tend to be quite prevalent, so it is probably
> easier to just create those files and not link them (should be no
> difference in disk usage).
Sounds good. In my test rsyncs (/etc from several machines), the zero-byte
file got 117 links.
> - How does this patch interact with -H?
They should be compatible.
>
> Craig
I'll update the patch and post.
--
Jason M. Felice
Cronosys, LLC <http://www.cronosys.com/>
216.221.4600 x302