TODO hardlink performance optimizations

jw schultz jw at pegasys.ws
Wed Dec 17 23:51:13 EST 2003


On Wed, Dec 17, 2003 at 03:43:40AM -0800, cbarratt at users.sourceforge.net wrote:
> jw schultz writes:
> 
> > On Tue, Dec 16, 2003 at 03:18:15PM -0600, John Van Essen wrote:
> > > On Mon, 15 Dec 2003, jw schultz <jw at pegasys.ws> wrote:
> > > > Hard-link handling
> > > > 
> > > >   At the moment hardlink handling is very expensive, so it's off by
> > > >   default.  It does not need to be so.
> > > > 
> > > >   Since most of the solutions are rather intertwined with the file
> > > >   list it is probably better to fix that first, although fixing
> > > >   hardlinks is possibly simpler.
> > > > 
> > > >   We can rule out hardlinked directories since they will probably
> > > >   screw us up in all kinds of ways.  They simply should not be used.
> > > > 
> > > >   At the moment rsync only cares about hardlinks to regular files.  I
> > > >   guess you could also use them for sockets, devices and other beasts,
> > > >   but I have not seen them.
> > > > 
> > > >   When trying to reproduce hard links, we only need to worry about
> > > >   files that have more than one name (nlinks>1 && !S_ISDIR).
> > > 
> > > It would be very helpful if file_struct.flags could have a bit set to
> > > indicate that the node count was greater than 1.  This info could be
> > > used later to optimize the hardlink search by only considering those
> > > flist entries with this flag bit set.
> > > 
> > > It'd be nice to implement this bit setting in this protocol number so
> > > it can be widely distributed before 2.6.1 is released which could have
> > > the code to actually make use of it.  I'd be interested in doing the
> > > later changes, but if Martin or jw could at least get the bit set...
> > > It doesn't even have to be --hard-links option dependent.  Just examine
> > > the node count and set the bit.
> > 
> > I'm not keen on squeezing that in at this time.  Let's get it
> > out the door, hardlink performance improvements can be made
> > in a minor release.  I'm also a bit more inclined to pass
> > nlinks (IFF non-zero and ~IS_DIR).
> 
> The nlinks > 1 optimization would be a good one to add, but after
> the next release.
> 
> For hardlinks it would be great to only send the device and inode
> information in the file list IFF nlinks > 1 and ~IS_DIR.  Currently
> --hard-links sends device and inode for every file.  This causes a
> lot of unnecessary data to be sent, and also means the receiver has
> to store and search inode information for every file, rather than
> just candidate hardlinks.
> 
> Unfortunately all the bits in the flag byte in the file list are used,
> so we need to figure out some other way to indicate which files include
> (dev,inode) data.
> 
> The goal would be to make --hard-links have little impact on network
> traffic, memory and speed.
> 
> However, this would require a protocol bump. 

It might be time to increase the size of flags to 16 bits;
protocol dependent, of course.  We'll need more bits anyway
if we ever add ACLs and EAs.  We'd want to use at least two
bits (rough approximation) in send_file_entry():

	if (protocol_version >= 28 && !S_ISDIR(file->mode)
	    && preserve_hard_links && file->st_nlink > 1)
	{
		flags |= HARD_LINKED;
		if (file->dev == last_dev)
			flags |= SAME_DEV;
	}

	...

	if (protocol_version < 28 && preserve_hard_links && S_ISREG(file->mode)) {
		if (protocol_version < 26) {
			/* 32-bit dev_t and ino_t */
			write_int(f, (int) file->dev);
			write_int(f, (int) file->inode);
		} else {
			/* 64-bit dev_t and ino_t */
			write_longint(f, file->dev);
			write_longint(f, file->inode);
		}
	}
	else if (flags & HARD_LINKED)
	{
		if (!(flags & SAME_DEV)) {
			write_longint(f, file->dev);
			/* remember it so the next entry can set SAME_DEV */
			last_dev = file->dev;
		}
		write_longint(f, file->inode);
	}

and in recv_file_entry (even rougher):

	if (flags & HARD_LINKED)
	{
		if (!(flags & SAME_DEV))
			last_dev = read_longint(f);
		file->dev = last_dev;
		file->inode = read_longint(f);
	} else {
		file->dev = file->inode = 0;
	}
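
The two fragments above can be tried out as a round trip.  What
follows is only a minimal, self-contained sketch, not rsync code:
struct entry, struct buf, put16/put64/get16/get64, send_entry and
recv_entry are hypothetical stand-ins for file_struct, the socket
and write_longint()/read_longint(), and the flag values are made
up.  It assumes a 16-bit flags word and that both ends start with
last_dev at 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values in an assumed 16-bit flags word. */
#define HARD_LINKED (1u << 8)
#define SAME_DEV    (1u << 9)

/* Stand-in for the parts of file_struct that matter here. */
struct entry {
	uint64_t dev, inode;
	unsigned nlink;
	int is_dir;
};

/* Stand-in for the wire: a small in-memory byte buffer. */
struct buf {
	unsigned char data[256];
	size_t len, pos;
};

static void put16(struct buf *b, unsigned v)
{
	assert(b->len + 2 <= sizeof b->data);
	b->data[b->len++] = (unsigned char)(v & 0xff);
	b->data[b->len++] = (unsigned char)((v >> 8) & 0xff);
}

static void put64(struct buf *b, uint64_t v)
{
	assert(b->len + 8 <= sizeof b->data);
	for (int i = 0; i < 8; i++)
		b->data[b->len++] = (unsigned char)(v >> (8 * i));
}

static unsigned get16(struct buf *b)
{
	assert(b->pos + 2 <= b->len);
	unsigned v = b->data[b->pos] | ((unsigned)b->data[b->pos + 1] << 8);
	b->pos += 2;
	return v;
}

static uint64_t get64(struct buf *b)
{
	assert(b->pos + 8 <= b->len);
	uint64_t v = 0;
	for (int i = 0; i < 8; i++)
		v |= (uint64_t)b->data[b->pos++] << (8 * i);
	return v;
}

/* Sender: dev/inode go out only for non-directories with nlink > 1,
 * and dev is skipped when it matches the last dev that was sent. */
static void send_entry(struct buf *b, const struct entry *e, uint64_t *last_dev)
{
	unsigned flags = 0;

	if (!e->is_dir && e->nlink > 1) {
		flags |= HARD_LINKED;
		if (e->dev == *last_dev)
			flags |= SAME_DEV;
	}
	put16(b, flags);
	if (flags & HARD_LINKED) {
		if (!(flags & SAME_DEV)) {
			put64(b, e->dev);
			*last_dev = e->dev;
		}
		put64(b, e->inode);
	}
}

/* Receiver: mirrors the sender; entries without HARD_LINKED get 0/0. */
static void recv_entry(struct buf *b, struct entry *e, uint64_t *last_dev)
{
	unsigned flags = get16(b);

	e->dev = e->inode = 0;
	if (flags & HARD_LINKED) {
		if (!(flags & SAME_DEV))
			*last_dev = get64(b);
		e->dev = *last_dev;
		e->inode = get64(b);
	}
}
```

In this sketch a file with a single link costs only the flags word,
a linked file on the same device as the previous linked file costs
one extra 64-bit value, and only a change of device costs two.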


Using 0 for inode to indicate no hardlinks also fixes the
recently reported problem of erroneously trying to preserve
links from filesystems that do not support inode numbers and
report 0 for all inodes.
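
To illustrate how the receiver could use the inode == 0
convention, here is a rough sketch that keeps only candidate
entries and sorts them by (dev, inode) so that names sharing an
inode end up adjacent.  It is an assumption-laden illustration,
not the existing hardlink code: struct entry, cmp_dev_inode,
collect_candidates and count_links are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the parts of file_struct that matter here. */
struct entry {
	uint64_t dev, inode;
	const char *name;
};

/* qsort comparator over an array of entry pointers: by dev, then inode. */
static int cmp_dev_inode(const void *a, const void *b)
{
	const struct entry *x = *(const struct entry *const *)a;
	const struct entry *y = *(const struct entry *const *)b;

	if (x->dev != y->dev)
		return x->dev < y->dev ? -1 : 1;
	if (x->inode != y->inode)
		return x->inode < y->inode ? -1 : 1;
	return 0;
}

/* Collect pointers to entries with a non-zero inode and sort them.
 * cand must have room for n pointers.  Returns the candidate count. */
static size_t collect_candidates(struct entry *list, size_t n,
				 struct entry **cand)
{
	size_t k = 0;

	assert(cand != NULL);
	for (size_t i = 0; i < n; i++)
		if (list[i].inode != 0)	/* 0 means "not a hardlink candidate" */
			cand[k++] = &list[i];
	qsort(cand, k, sizeof *cand, cmp_dev_inode);
	return k;
}

/* Count candidates that are an additional name for the entry
 * sorted immediately before them. */
static size_t count_links(struct entry **cand, size_t k)
{
	size_t links = 0;

	for (size_t i = 1; i < k; i++)
		if (cmp_dev_inode(&cand[i - 1], &cand[i]) == 0)
			links++;
	return links;
}
```

Entries with inode 0 never enter the candidate array, so two
unrelated files that both report inode 0 are not mistaken for
links to each other, and the sort only has to cover the files
that were actually sent with (dev,inode) data.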


-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt


