TODO hardlink performance optimizations
jw at pegasys.ws
Tue Jan 6 22:50:34 GMT 2004
On Mon, Jan 05, 2004 at 05:11:04PM -0800, Wayne Davison wrote:
> On Mon, Jan 05, 2004 at 04:18:30PM -0800, jw schultz wrote:
> > It puts us back into the business of doing a link lookup for every
> > regular file.
> Ah yes, I missed that. Adding a flag would be sufficient to avoid this.
> However, your more radical changes look even better.
> I particularly like the idea of moving the inode/dev info out of the
> normal file_struct. (Aside: it might also be good to avoid doing a
> bunch of small allocs for the extra memory you suggest, perhaps by
> allocating a pool of structures like file_struct does, but we can worry
> more about that later.)
Use allocation pools and we do save some by freeing them.
> One other thought occurred to me that I haven't tried to analyze in
> depth yet: what if we had a temporary hash (b-tree?) that we built from
> the dev+inode combinations as the names arrived. The data would be an
> offset into the file-list array. If we get a hit, we flag ourselves as
> an hlink-dup and link ourselves into the top of the chain pointed to by
> the first flist_struct (which would have a value that contains an index
> of the next hard-link item in our group). At the end of this, the hash
> goes away. Our transfer code would work much like what you proposed,
> with one additional optimization that I'd suggest: when we finish off
> the first hard-link item in a group, link all its hard-link brethren at
> that point, not as a separate loop at the end of the transfer.
> This "hash" idea could actually work with the current sort method too.
> We'd just sort a data structure that had the dev+inode+flist_index items
> in it and translate that into the values in the file_list (before
> discarding the extra data).
If we continued to do the link searches i'd support
changing the sorted list into a hash of lists. Now i'm not
> One final memory-saving hack: turn the rdev item into a union with the
> new hard-link-list-index item, and only use it as an index if a flag
> were set indicating that we should (since we can't be both hard-linked
> and a device).
Except that we could hard-link devices and symlinks. The
only file type that ought not be linked (manually) are
directories. The fact that we don't preserve device and
symlink hardlinks is, in my opinion, a flaw.
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt
More information about the rsync