TODO hardlink performance optimizations

jw schultz jw at pegasys.ws
Mon Jan 5 00:23:17 GMT 2004


On Sun, Jan 04, 2004 at 05:30:03AM -0800, jw schultz wrote:
> On Sun, Jan 04, 2004 at 06:35:03AM -0600, John Van Essen wrote:
> > I've modified hlink.c to use a list of file struct pointers instead of
> > copies of the actual file structs themselves, so that will save memory.
> > I'll submit that patch for review in a day or two after I've tested it.
> 
> I've just done the same.  It reduces the memory requirements
> of the hlink list to 1/18th.  It is also somewhat faster to
> build that way because we don't have to walk the list.
> 
> If we built the hlink_list one element at a time the way we
> do the file_list only putting those files that we might link
> in it it would be smaller but building it would be slower.

I'm noodling on the idea of purging the hlink_list of files
that don't have links.

Start by getting rid of the !IS_REG files because we don't
link those anyway. 

After the hlink_list is sorted, hardlinked files are adjacent
to one another in the list, so a single-pass walk could do an
in-place purge (collapse) of non-linked files.
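
A minimal sketch of what that single-pass collapse might look
like. The struct and function names here (file_entry,
purge_unlinked) are made up for illustration and are not the
ones in hlink.c; the list is assumed to be already sorted by
device and inode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real file struct: only the fields
 * needed to tell whether two entries refer to the same inode. */
struct file_entry {
    unsigned long dev;
    unsigned long inode;
};

static int same_inode(const struct file_entry *a, const struct file_entry *b)
{
    return a->dev == b->dev && a->inode == b->inode;
}

/* Collapse a pointer list sorted by (dev, inode) so that only entries
 * sharing an inode with a neighbour remain.  Returns the new length. */
size_t purge_unlinked(struct file_entry **list, size_t n)
{
    size_t in = 0, out = 0;

    assert(list != NULL || n == 0);
    while (in < n) {
        size_t end = in + 1;

        /* Find the end of the run of entries with this inode. */
        while (end < n && same_inode(list[in], list[end]))
            end++;

        if (end - in > 1) {
            /* Two or more entries: hard links, keep the whole run. */
            while (in < end)
                list[out++] = list[in++];
        } else {
            in = end;   /* singleton, drop it */
        }
    }
    return out;
}
```

Because out never gets ahead of in, the copy is safe to do in
the same array and no second list is needed.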

At one pointer per file, the space savings wouldn't be that
large.  But for every regular file we do a binary search of
the hlink_list.  100,000 binary searches of a 200 member list
would be a lot faster than 100,000 searches of a 105,000
member list.
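
To put a rough number on that, here is a small helper that
counts the worst-case probes for a binary search of n sorted
items.  The name max_probes is made up for this example; it
is not anything in rsync.

```c
#include <stddef.h>

/* Worst-case number of probes a binary search makes on n sorted
 * items: floor(log2(n)) + 1, or 0 for an empty list. */
unsigned max_probes(size_t n)
{
    unsigned probes = 0;

    while (n > 0) {
        probes++;
        n /= 2;
    }
    return probes;
}
```

max_probes(200) is 8 and max_probes(105000) is 17, so each
search does about half as many comparisons at worst.  Any gain
beyond that would have to come from the smaller list being
friendlier to the cache, which this sketch does not measure.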

If a flag were added to the file_struct, the purge pass could
set it if the file had any links, so the 100,000 binary
searches of a 105,000 member list could be reduced to 200
binary searches of a 200 member list.  We do init_hard_links
prior to forking, so setting the flag wouldn't trigger
copy-on-write (COW).
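
A sketch of how such a flag could short-circuit the lookup.
Everything named here (file_entry, has_links, mark_linked,
find_links) is hypothetical and only illustrates the idea; it
assumes the list is sorted by device and inode before marking.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real file struct. */
struct file_entry {
    unsigned long dev;
    unsigned long inode;
    int has_links;      /* hypothetical flag, set by the marking pass */
};

/* Compare two list slots (pointers to file_entry pointers). */
static int cmp_inode(const void *pa, const void *pb)
{
    const struct file_entry *a = *(struct file_entry *const *)pa;
    const struct file_entry *b = *(struct file_entry *const *)pb;

    if (a->dev != b->dev)
        return a->dev < b->dev ? -1 : 1;
    if (a->inode != b->inode)
        return a->inode < b->inode ? -1 : 1;
    return 0;
}

/* Flag every entry that shares an inode with a neighbour.
 * The list must already be sorted with cmp_inode. */
void mark_linked(struct file_entry **list, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        if (cmp_inode(&list[i], &list[i + 1]) == 0) {
            list[i]->has_links = 1;
            list[i + 1]->has_links = 1;
        }
    }
}

/* Return a slot in the list holding an entry with f's inode, or NULL.
 * Files without the flag skip the binary search entirely. */
struct file_entry **find_links(struct file_entry *f,
                               struct file_entry **list, size_t n)
{
    if (!f->has_links)
        return NULL;
    return bsearch(&f, list, n, sizeof *list, cmp_inode);
}
```

With the flag checked first, only files that actually have
links pay for a search at all.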

My current version of the patch does get rid of the
non-regular files, but I don't do the non-linked purge.
It is just an idea at this point.  I'm not quite motivated
enough to do it yet.

Hmm, 2.6.1 is shaping up to be a performance release.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt

