TODO hardlink performance optimizations

Wayne Davison wayned at samba.org
Mon Jan 5 22:46:52 GMT 2004


On Mon, Jan 05, 2004 at 01:12:16PM -0800, jw schultz wrote:
> - init_hard_links() sets nlinks in file_struct.

I assume you mean by counting adjacent runs of identical inodes after
the sort is done (while eliminating single-item runs from the list at
the same time).  I'd like to avoid adding an nlinks variable to the
per-file structure, though, since that would use more memory even when
--hard-links isn't used.  One way to do this, and to cut down on all the
adjacent-entry comparisons, would be to create a parallel array (after
the sort)
of counts back to the first entry.  This way, whenever we have a match,
we know exactly how many items back the first item is in the array.  It
would also let you scan to the end of each grouping by stopping when you
get down to an item with a 0 offset (which would require an extra item
at the end of the list for a terminating 0).
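
A minimal sketch of the parallel-array idea, under some assumptions:
struct hl_entry, hl_compare(), build_back_offsets() and group_size() are
hypothetical names made up for illustration (not rsync's actual code),
and the entries are compared on both device and inode.  It sorts the
list, drops single-item runs, and records for each kept entry how far
back the first entry of its group is, with one extra terminating 0.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the parts of the per-file structure that
 * matter here; not rsync's real file_struct. */
struct hl_entry {
    unsigned long dev;
    unsigned long inode;
    const char *name;
};

/* Order entries by device, then inode, so links end up adjacent. */
static int hl_compare(const void *a, const void *b)
{
    const struct hl_entry *x = a, *y = b;
    if (x->dev != y->dev)
        return x->dev < y->dev ? -1 : 1;
    if (x->inode != y->inode)
        return x->inode < y->inode ? -1 : 1;
    return 0;
}

/* Sort the list, remove single-item runs in place, and return a
 * malloc'd array of offsets back to the first entry of each group.
 * The array has *count + 1 items; the last one is a terminating 0.
 * On return, *count is the number of entries that were kept. */
static int *build_back_offsets(struct hl_entry *list, size_t *count)
{
    size_t n = *count, in = 0, out = 0;
    int *back = malloc((n + 1) * sizeof *back);

    if (!back)
        return NULL;
    qsort(list, n, sizeof *list, hl_compare);

    while (in < n) {
        size_t end = in + 1, k;
        while (end < n && hl_compare(&list[in], &list[end]) == 0)
            end++;
        if (end - in > 1) {             /* keep only real link groups */
            for (k = 0; k < end - in; k++) {
                list[out] = list[in + k];
                back[out] = (int)k;     /* distance to the group head */
                out++;
            }
        }
        in = end;
    }
    back[out] = 0;                      /* terminator for the last group */
    *count = out;
    return back;
}

/* Given the index of a group's first entry, count its members by
 * scanning forward until the next 0 offset. */
static size_t group_size(const int *back, size_t head)
{
    size_t len = 1;
    while (back[head + len] != 0)
        len++;
    return len;
}
```

With this layout, the first entry for any matching index i is simply
list[i - back[i]], so no adjacent-entry comparisons are needed once the
array has been built, and the memory is only allocated when hard-link
handling is in use.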

> - change the return value of check_hard_link() from int to
>   file_struct** so it returns the pointer to the first entry
>   in the hlink_list array that matches, or NULL if false.

Just to be sure we're on the same page:  we need three states:  "master
linked" (with a pointer to the first item), "slave linked" (so we can
skip it until the end-processing), and "not linked".  If the first two
states return a pointer to the first item, we can differentiate them by
checking whether the returned value is "us" or not.
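
The three-state check could be sketched roughly like this.  The names
struct hl_file, first_link() and classify() are hypothetical, and
first_link() is only a simplified stand-in for the proposed
check_hard_link() (it does a linear scan over a sorted list from which
single-item runs have already been removed):

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-file structure. */
struct hl_file {
    unsigned long dev;
    unsigned long inode;
    const char *name;
};

enum hl_state { HL_NOT_LINKED, HL_MASTER, HL_SLAVE };

/* Return a pointer to the first slot in hlink_list that has the same
 * device and inode as file, or NULL if there is none.  Because the
 * list is sorted, the first match is the head of the link group. */
static struct hl_file **first_link(struct hl_file **hlink_list,
                                   size_t count,
                                   const struct hl_file *file)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (hlink_list[i]->dev == file->dev
         && hlink_list[i]->inode == file->inode)
            return &hlink_list[i];
    }
    return NULL;
}

/* NULL means "not linked"; otherwise the entry is the master if the
 * first item is the file itself, and a slave if it is another file. */
static enum hl_state classify(struct hl_file **first,
                              const struct hl_file *file)
{
    if (first == NULL)
        return HL_NOT_LINKED;
    return *first == file ? HL_MASTER : HL_SLAVE;
}
```

A caller would skip HL_SLAVE entries until the end-processing, and use
the returned pointer to walk the rest of the group for HL_MASTER ones.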

> - basis_path() now can check nlinks and if !0 iterate over the paths
>   provided by the list returned from check_hard_link() looking for the
>   first that exists in dest or compare_dest.

This seems like a good way to go to me.
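
For illustration only, the lookup described in the quoted item might
look something like the sketch below.  find_basis() and its parameters
are hypothetical, the directory list stands in for dest and
compare_dest, and the search order (each name is tried in every
directory before moving on to the next name) is an assumption, since the
proposal doesn't spell it out.

```c
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Try each name of a link group in each candidate directory and
 * store the first path that exists in buf.  Returns 1 if a path was
 * found, or 0 (with buf set to an empty string) if none exists. */
static int find_basis(const char *const *dirs, size_t ndirs,
                      const char *const *names, size_t nnames,
                      char *buf, size_t buflen)
{
    size_t i, d;

    for (i = 0; i < nnames; i++) {
        for (d = 0; d < ndirs; d++) {
            int len = snprintf(buf, buflen, "%s/%s", dirs[d], names[i]);
            if (len < 0 || (size_t)len >= buflen)
                continue;               /* path would not fit; skip it */
            if (access(buf, F_OK) == 0)
                return 1;               /* first existing path wins */
        }
    }
    if (buflen > 0)
        buf[0] = '\0';
    return 0;
}
```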

..wayne..

