TODO hardlink performance optimizations
jw schultz
jw at pegasys.ws
Mon Jan 5 00:23:17 GMT 2004
On Sun, Jan 04, 2004 at 05:30:03AM -0800, jw schultz wrote:
> On Sun, Jan 04, 2004 at 06:35:03AM -0600, John Van Essen wrote:
> > I've modified hlink.c to use a list of file struct pointers instead of
> > copies of the actual file structs themselves, so that will save memory.
> > I'll submit that patch for review in a day or two after I've tested it.
>
> I've just done the same. It reduces the memory requirements
> of the hlink list to 1/18th. It is also somewhat faster to
> build that way because we don't have to walk the list.
>
> If we built the hlink_list one element at a time the way we
> do the file_list only putting those files that we might link
> in it it would be smaller but building it would be slower.
I'm noodling on the idea of purging the hlink_list of files
that don't have links.
Start by getting rid of the !IS_REG files because we don't
link those anyway.
After the hlink_list is sorted hardlinked files are adjacent
to one another in the list so a single pass walk could do an
in-place purge (collapse) of non-linked files.
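A rough sketch of what that single-pass collapse could look like,
in C. The struct hl_file type and the purge_unlinked name are made
up for illustration and are not rsync's actual file struct or the
hlink.c code; the list is assumed to be already sorted by
(dev, inode) and to hold pointers rather than struct copies.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-in for rsync's file struct. */
struct hl_file {
    dev_t dev;
    ino_t inode;
};

static int same_inode(const struct hl_file *a, const struct hl_file *b)
{
    return a->dev == b->dev && a->inode == b->inode;
}

/*
 * Collapse a pointer list that is already sorted by (dev, inode) so
 * that only entries sharing their (dev, inode) with a neighbour
 * remain. Works in place and returns the new element count.
 */
static size_t purge_unlinked(struct hl_file **list, size_t count)
{
    size_t in = 0, out = 0;

    while (in < count) {
        /* Find the end of the run of entries with the same inode. */
        size_t run = in + 1;
        while (run < count && same_inode(list[in], list[run]))
            run++;

        /* Keep the run only if it has more than one member. */
        if (run - in > 1) {
            for (; in < run; in++)
                list[out++] = list[in];
        }
        in = run;
    }
    return out;
}
```

Since the write index never gets ahead of the read index, no
second array is needed and the pass is linear in the list length.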
At one pointer per file space savings wouldn't be that large.
But for every regular file we do a binary search of the
hlink_list. 100,000 binary searches of a 200 member list
would be a lot faster than 100,000 searches of a 105,000
member list.
If a flag were added to the file_struct the purge pass could
set it if the file had any links so the 100,000 binary
searches of a 105,000 member list could be reduced to 200
binary searches of a 200 member list. We do init_hard_links
prior to forking so the flag wouldn't suffer COW.
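To make the flag idea concrete, here is a minimal sketch in C. The
has_links member, the hl_file struct and the find_link_group
function are hypothetical names for illustration, not what rsync
has today; the list is assumed to be the purged, sorted pointer
list.

```c
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-in for rsync's file struct. */
struct hl_file {
    dev_t dev;
    ino_t inode;
    int has_links;   /* hypothetical flag, set by the purge pass */
};

/* Order list entries (pointers to structs) by (dev, inode). */
static int hl_compare(const void *a, const void *b)
{
    const struct hl_file *fa = *(struct hl_file *const *)a;
    const struct hl_file *fb = *(struct hl_file *const *)b;

    if (fa->dev != fb->dev)
        return fa->dev < fb->dev ? -1 : 1;
    if (fa->inode != fb->inode)
        return fa->inode < fb->inode ? -1 : 1;
    return 0;
}

/*
 * Return some entry of the file's link group, or NULL. Files whose
 * flag is clear skip the binary search entirely. Note that bsearch
 * may return any matching entry, not necessarily the first one.
 */
static struct hl_file **find_link_group(struct hl_file *file,
                                        struct hl_file **list,
                                        size_t count)
{
    if (!file->has_links)
        return NULL;
    return bsearch(&file, list, count, sizeof list[0], hl_compare);
}
```

A binary search costs on the order of log2(n) comparisons, so
about 17 for a 105,000 member list and about 8 for a 200 member
list. The flag check removes the search altogether for every file
that has no links.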
My current version of the patch does get rid of the
non-regular files but I don't do the non-linked purge.
It is just an idea at this point. I'm not quite motivated
enough to do it yet.
Hmm, 2.6.1 is shaping up to be a performance release.
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt