[patch] Add `--link-by-hash' option (rev 2).

Jason M. Felice jfelice at cronosys.com
Tue Feb 17 14:34:45 GMT 2004


On Mon, Feb 16, 2004 at 10:48:32PM -0800, Craig Barratt wrote:
> "Jason M. Felice" writes:
> 
> > This patch adds the --link-by-hash=DIR option, which hard links received
> > files in a link farm arranged by MD4 file hash.  The result is that the system
> > will only store one copy of the unique contents of each file, regardless of
> > the file's name.
> > 
> > (rev 2)
> > * This revision is actually against CVS HEAD (I didn't realize I was working
> >   from a stale rsync'd CVS).
> > * Apply permissions after linking (permissions were lost if we already had
> >   a copy of the file in the link farm).
> 
> I haven't studied your patch, but I have a couple of comments/questions:
> 
>   - If you update permissions, then all hardlinks will change too.
>     Does that mean that all instances of an identical file will get
>     the last mtime/permissions/ownership?  Or does the link farm have
>     unique entries for contents plus meta data (vs just contents)?

All instances of the file will have the last mtime/permissions/ownership.
This is not such a big deal for me (although it is annoying), but I
can't afford to keep multiple copies of files just because the metadata
is different.  If anyone has suggestions for solving this which aren't
too incredibly hackish, I'll implement them.  (All I can think of is to
store permissions in dotfiles, or to implement my original idea of a
"database backend" as opposed to a "filesystem backend".)
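
To make the cause concrete: every name hard linked to a farm entry shares
one inode, and mode, ownership and mtime live in the inode, not in the
directory entry.  A minimal sketch follows (Python only for brevity; the
patch itself is C, and the file names here are made up for illustration):

```python
import os
import stat
import tempfile


def shared_metadata_demo():
    """Show that chmod through one hard link changes every other link."""
    with tempfile.TemporaryDirectory() as d:
        farm_entry = os.path.join(d, "farm_entry")  # hypothetical farm file
        received = os.path.join(d, "a.conf")        # hypothetical received file

        with open(farm_entry, "w") as f:
            f.write("same contents\n")
        os.chmod(farm_entry, 0o644)

        # Both names now refer to the same inode.
        os.link(farm_entry, received)

        # Applying the received file's permissions...
        os.chmod(received, 0o600)

        # ...also changes what the farm entry (and every other link) reports.
        farm_mode = stat.S_IMODE(os.stat(farm_entry).st_mode)
        same_inode = os.stat(farm_entry).st_ino == os.stat(received).st_ino
        return farm_mode, same_inode
```

On a POSIX filesystem that supports hard links, the farm entry ends up with
the mode that was applied to the other name.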

>   - Some file systems have a hardlink limit of 32000.  You will need to
>     roll to a new file when that limit is exceeded (ie: link() fails).

Ick.  Well, I *do* need it.
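
One way the roll-over could look, as a sketch only: it assumes a
hypothetical farm layout of DIR/<hash>/<n> with numbered entries per hash,
which is not necessarily what the patch does.  The `link` parameter and the
".lbh-tmp" suffix are illustrative names, and the sketch does not compare
file contents to guard against hash collisions.

```python
import errno
import os


def link_into_farm(received, farm_dir, digest, link=os.link):
    """Hard link `received` to a farm entry, rolling over on EMLINK.

    Assumed layout (hypothetical): farm_dir/<digest>/<n>, n = 0, 1, 2, ...
    Returns the farm entry that `received` now shares an inode with.
    """
    bucket = os.path.join(farm_dir, digest)
    os.makedirs(bucket, exist_ok=True)
    n = 0
    while True:
        entry = os.path.join(bucket, str(n))
        if not os.path.exists(entry):
            # First copy of this content, or a fresh slot after a roll-over:
            # publish the received file as the new farm entry.
            link(received, entry)
            return entry
        tmp = received + ".lbh-tmp"
        try:
            link(entry, tmp)
        except OSError as e:
            if e.errno == errno.EMLINK:
                # This entry has hit the filesystem's link limit; try the next.
                n += 1
                continue
            raise
        # Atomically swap the received file for a link to the farm entry.
        os.replace(tmp, received)
        return entry
```

Passing `link` as a parameter only exists so the EMLINK path can be
exercised without creating tens of thousands of links.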

>     Also, empty files tend to be quite prevalent, so it is probably
>     easier to just create those files and not link them (should be no
>     difference in disk usage).

Sounds good.  In my test rsyncs (/etc from several machines), the zero-byte
file got 117 links.
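
The check itself would be small.  A sketch, with `wants_farm_link` being a
made-up name rather than anything in the patch:

```python
import os
import stat


def wants_farm_link(path):
    """Return True only for non-empty regular files.

    Empty files are simply created in place instead of being linked,
    so they never count against a farm entry's link limit.
    """
    st = os.lstat(path)
    return stat.S_ISREG(st.st_mode) and st.st_size > 0
```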

>   - How does this patch interact with -H?

They should be compatible.

> 
> Craig

I'll update the patch and post it.

-- 
 Jason M. Felice
 Cronosys, LLC <http://www.cronosys.com/>
 216.221.4600 x302

