TODO hardlink performance optimizations

Wayne Davison wayned at samba.org
Tue Jan 6 01:11:04 GMT 2004


On Mon, Jan 05, 2004 at 04:18:30PM -0800, jw schultz wrote:
> It puts us back into the business of doing a link lookup for every
> regular file.

Ah yes, I missed that.  Adding a flag would be sufficient to avoid this.
However, your more radical changes look even better.

I particularly like the idea of moving the inode/dev info out of the
normal file_struct.  (Aside:  it might also be good to avoid doing a
bunch of small allocs for the extra memory you suggest, perhaps by
allocating a pool of structures like file_struct does, but we can worry
more about that later.)

One other thought occurred to me that I haven't tried to analyze in
depth yet:  what if we had a temporary hash (b-tree?) that we built from
the dev+inode combinations as the names arrived.  The data would be an
offset into the file-list array.  If we get a hit, we flag ourselves as
an hlink-dup and link ourselves into the top of the chain pointed to by
the first flist_struct (which would have a value that contains an index
of the next hard-link item in our group).  At the end of this, the hash
goes away.  Our transfer code would work much like what you proposed,
with one additional optimization that I'd suggest:  when we finish off
the first hard-link item in a group, link all its hard-link brethren at
that point, not as a separate loop at the end of the transfer.

This "hash" idea could actually work with the current sort method too.
We'd just sort a data structure that had the dev+inode+flist_index items
in it and translate that into the values in the file_list (before
discarding the extra data).

One final memory-saving hack:  turn the rdev item into a union with the
new hard-link-list-index item, and only use it as an index if a flag
were set indicating that we should (since we can't be both hard-linked
and a device).

..wayne..


More information about the rsync mailing list