TODO hardlink performance optimizations

jw schultz jw at pegasys.ws
Tue Jan 6 22:50:34 GMT 2004


On Mon, Jan 05, 2004 at 05:11:04PM -0800, Wayne Davison wrote:
> On Mon, Jan 05, 2004 at 04:18:30PM -0800, jw schultz wrote:
> > It puts us back into the business of doing a link lookup for every
> > regular file.
> 
> Ah yes, I missed that.  Adding a flag would be sufficient to avoid this.
> However, your more radical changes look even better.
> 
> I particularly like the idea of moving the inode/dev info out of the
> normal file_struct.  (Aside:  it might also be good to avoid doing a
> bunch of small allocs for the extra memory you suggest, perhaps by
> allocating a pool of structures like file_struct does, but we can worry
> more about that later.)

Use allocation pools and we do save some by freeing them.

> One other thought occurred to me that I haven't tried to analyze in
> depth yet:  what if we had a temporary hash (b-tree?) that we built from
> the dev+inode combinations as the names arrived.  The data would be an
> offset into the file-list array.  If we get a hit, we flag ourselves as
> an hlink-dup and link ourselves into the top of the chain pointed to by
> the first flist_struct (which would have a value that contains an index
> of the next hard-link item in our group).  At the end of this, the hash
> goes away.  Our transfer code would work much like what you proposed,
> with one additional optimization that I'd suggest:  when we finish off
> the first hard-link item in a group, link all its hard-link brethren at
> that point, not as a separate loop at the end of the transfer.
> 
> This "hash" idea could actually work with the current sort method too.
> We'd just sort a data structure that had the dev+inode+flist_index items
> in it and translate that into the values in the file_list (before
> discarding the extra data).

If we continued to do the link searches i'd support
changing the sorted list into a hash of lists.  Now i'm not
so sure.

> One final memory-saving hack:  turn the rdev item into a union with the
> new hard-link-list-index item, and only use it as an index if a flag
> were set indicating that we should (since we can't be both hard-linked
> and a device).

Except that we could hard-link devices and symlinks.  The
only file type that ought not be linked (manually) are
directories.  The fact that we don't preserve device and
symlink hardlinks is, in my opinion, a flaw.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt


More information about the rsync mailing list