TODO hardlink performance optimizations

Wed Dec 17 08:49:46 EST 2003

On Tue, Dec 16, 2003 at 03:18:15PM -0600, John Van Essen wrote:
> On Mon, 15 Dec 2003, jw schultz <jw at pegasys.ws> wrote:
> 
> > OK, first pass on TODO complete.
> ....
> > PERFORMANCE ----------------------------------------------------------
> ....
> > Traverse just one directory at a time
> > 
> >   Traverse just one directory at a time.  Tridge says it's possible.
> > 
> >   At the moment rsync reads the whole file list into memory at the
> >   start, which makes us use a lot of memory and also not pipeline
> >   network access as much as we could.
> 
> An additional comment should be added observing that this will affect
> hardlink processing, since it relies on the entire flist array being
> present in order to match dev and inode numbers.  But perhaps the
> required hlist array could be saved and built on the fly as items
> with node counts > 1 are encountered.

Dynamic creation of the hlist set (hash perhaps) would deal with that, yes.

> 
> ....
> > Hard-link handling
> > 
> >   At the moment hardlink handling is very expensive, so it's off by
> >   default.  It does not need to be so.
> > 
> >   Since most of the solutions are rather intertwined with the file
> >   list it is probably better to fix that first, although fixing
> >   hardlinks is possibly simpler.
> > 
> >   We can rule out hardlinked directories since they will probably
> >   screw us up in all kinds of ways.  They simply should not be used.
> > 
> >   At the moment rsync only cares about hardlinks to regular files.  I
> >   guess you could also use them for sockets, devices and other beasts,
> >   but I have not seen them.
> > 
> >   When trying to reproduce hard links, we only need to worry about
> >   files that have more than one name (nlinks>1 && !S_ISDIR).
> 
> It would be very helpful if file_struct.flags could have a bit set to
> indicate that the node count was greater than 1.  This info could be
> used later to optimize the hardlink search by only considering those
> flist entries with this flag bit set.
> 
> It'd be nice to implement this bit setting in this protocol number so
> it can be widely distributed before 2.6.1 is released which could have
> the code to actually make use of it.  I'd be interested in doing the
> later changes, but if Martin or jw could at least get the bit set...
> It doesn't even have to be --hwlink option dependent.  Just examine
> the node count and set the bit.

I'm not keen on squeezing that in at this time.  Lets get it
out the door, hardlink performance improvements can be made
in a minor release.  I'm also a bit more inclined to pass
nlinks (IFF non-zero and ~IS_DIR).

> 
> >   The basic point of this is to discover alternate names that refer to
> >   the same file.  All operations, including creating the file and
> >   writing modifications to it need only to be done for the first name.
> >   For all later names, we just create the link and then leave it
> >   alone.
> 
> An earlier thread started 11/25/2003 points out that in certain cases,
> a hardlinked file is unnecessarily transferred in full.  This is due
> to the algortihm described above.  If the first file in the sorted list
> is missing, but a later one exists, then that file should be used as
> the master.  I've been thinking of solutions to this as well.  But
> not until after 2.6.0 is released.
> 
> >   If hard links are to be preserved:
> > 
> >     Before the generator/receiver fork, the list of files is received
> >     from the sender (recv_file_list), and a table for detecting hard
> >     links is built.
> > 
> >     The generator looks for hard links within the file list and does
> >     not send checksums for them, though it does send other metadata.
> > 
> >     The sender sends the device number and inode with file entries, so
> >     that files are uniquely identified.
> > 
> >     The receiver goes through and creates hard links (do_hard_links)
> >     after all data has been written, but before directory permissions
> >     are set.
> > 
> >   At the moment device and inum are sent as 4-byte integers, which
> >   will probably cause problems on large filesystems.  On Linux the
> >   kernel uses 64-bit ino_t's internally, and people will soon have
> >   filesystems big enough to use them.  We ought to follow NFS4 in
> >   using 64-bit device and inode identification, perhaps with a
> >   protocol version bump.
> > 
> >   Once we've seen all the names for a particular file, we no longer
> >   need to think about it and we can deallocate the memory.
> > 
> >   We can also have the case where there are links to a file that are
> >   not in the tree being transferred.  There's nothing we can do about
> >   that.  Because we rename the destination into place after writing,
> >   any hardlinks to the old file are always going to be orphaned.  In
> >   fact that is almost necessary because otherwise we'd get really
> >   confused if we were generating checksums for one name of a file and
> >   modifying another.
> > 
> >   At the moment the code seems to make a whole second copy of the file
> >   list, which seems unnecessary.
> 
> Indeed!  It does!  Very wasteful.  It should only need a list of pointers
> to the flist entries and sort that list.  Furthermore, with the addition
> of the new multiple nodes flag bit requested above, the list of pointers
> would only contain pointers to flist entries with that bit set, resulting
> in a much smaller list.

You do mean multiple paths, same node? :)

Lets take this up after the release, shall we?

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt