TODO hardlink performance optimizations

cbarratt at cbarratt at
Wed Dec 17 22:43:40 EST 2003

jw schultz writes:

> On Tue, Dec 16, 2003 at 03:18:15PM -0600, John Van Essen wrote:
> > On Mon, 15 Dec 2003, jw schultz <jw at> wrote:
> > > Hard-link handling
> > > 
> > >   At the moment hardlink handling is very expensive, so it's off by
> > >   default.  It does not need to be so.
> > > 
> > >   Since most of the solutions are rather intertwined with the file
> > >   list it is probably better to fix that first, although fixing
> > >   hardlinks is possibly simpler.
> > > 
> > >   We can rule out hardlinked directories since they will probably
> > >   screw us up in all kinds of ways.  They simply should not be used.
> > > 
> > >   At the moment rsync only cares about hardlinks to regular files.  I
> > >   guess you could also use them for sockets, devices and other beasts,
> > >   but I have not seen them.
> > > 
> > >   When trying to reproduce hard links, we only need to worry about
> > >   files that have more than one name (nlinks>1 && !S_ISDIR).
> > 
> > It would be very helpful if file_struct.flags could have a bit set to
> > indicate that the link count was greater than 1.  This info could be
> > used later to optimize the hardlink search by only considering those
> > flist entries with this flag bit set.
> > 
> > It'd be nice to implement this bit setting in this protocol number so
> > it can be widely distributed before 2.6.1 is released which could have
> > the code to actually make use of it.  I'd be interested in doing the
> > later changes, but if Martin or jw could at least get the bit set...
> > It doesn't even have to be --hard-links option dependent.  Just examine
> > the link count and set the bit.
> I'm not keen on squeezing that in at this time.  Let's get it
> out the door, hardlink performance improvements can be made
> in a minor release.  I'm also a bit more inclined to pass
> nlinks (IFF non-zero and ~IS_DIR).

The nlinks > 1 optimization would be a good one to add, but after
the next release.

For hardlinks it would be great to only send the device and inode
information in the file list IFF nlinks > 1 and ~IS_DIR.  Currently
--hard-links sends device and inode for every file.  This causes a
lot of unnecessary data to be sent, and also means the receiver has
to store and search inode information for every file, rather than
just candidate hardlinks.

Unfortunately all the bits in the flag byte in the file list are used,
so we need to figure out some other way to indicate which files include
(dev,inode) data.

The goal would be to make --hard-links have little impact on network
traffic, memory and speed.

However, this would require a protocol bump.

