TODO hardlink performance optimizations

Wed Jan 7 09:30:19 GMT 2004

On Wed, Jan 07, 2004 at 02:45:46AM -0600, John Van Essen wrote:
> On Wed, 7 Jan 2004 00:03:13 -0800, jw schultz <jw at pegasys.ws> wrote:
> > On Wed, Jan 07, 2004 at 01:04:34AM -0600, John Van Essen wrote:
> > 
> > The hardlinks have to be created in the receiver but the
> > logic of selecting which one is the target/head must happen
> > before the fork so the receiver can act on it.  This means
> > that the data related to these must become fixed before the
> > fork.  Doing that also avoids a COW.
> 
> The COW comes into play only in the hlink_list, but let's drop
> this enhancement (hardlinking during file processing) for now.
> But...  I would like to see it done if possible so the sorted
> verbose listing of transferred files has the hardlinked files
> properly sorted in the output instead of grouped at the end...
> (That bugs me about the symlinks, too, but that's neither here
> nor there...)
>  
> > The list** reuses the qsorted hlink_list.  The linked list
> > with head means we only need a list of heads.  Pick one or
> > the other and don't modify the data after the fork.
> > 
> > I'm inclined right now to go with the non-looping linked
> > list and a list of heads so that in the generator we
> > 
> >         if (file->links && file->links->head != file)
> >                 return;
> >         /* test will be a little more complicated because
> >          * file is actually a copy
> >          */
> > 
> > Then do_hard_links() would
> > 
> > >         for (i = 0; i < hlink_count; i++) {
> > >                 src = hlink_list[i];
> > >                 /* advance dest each pass so the walk terminates */
> > >                 for (dest = src->links->next; dest;
> > >                      dest = dest->links->next) {
> > >                         do_a_link(src, dest);
> > >                 }
> > >         }
> 
> The point of this exercise was to find a way to avoid unnecessary
> transfers of already existing files, and what we've designed so
> far does that (and also gets rid of the binary search), so yes -
> let's go with this and revisit possible further enhancements later.

I thought the point was to reduce the memory footprint and
then get rid of the binary search.

The unnecessary transfer elimination is a happy potential
byproduct of pushing the hlink data to where the generator
can iterate over hlink sets.
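
A rough sketch of what that could look like in the generator,
assuming the non-looping list with a head pointer described above.
Only the file->links->head test comes from this thread; the
`exists` flag (a stand-in for a stat() on the receiving side) and
the names is_non_head_link() and find_existing_member() are made
up for illustration and are not rsync code.

```c
#include <assert.h>
#include <stddef.h>

struct file_struct;

/* Per-file link data: head of the set and next member (NULL at the end). */
struct hlink {
    struct file_struct *head;
    struct file_struct *next;
};

struct file_struct {
    const char *name;
    int exists;          /* hypothetical: 1 if already present on the receiver */
    struct hlink *links; /* NULL when the file has no hard links */
};

/* Nonzero when the file belongs to a link set but is not its head,
 * i.e. the generator can skip it and leave it to do_hard_links(). */
int is_non_head_link(const struct file_struct *file)
{
    assert(file != NULL);
    return file->links != NULL && file->links->head != file;
}

/* Walk one link set from its head and return the first member that
 * already exists on the receiving side, or NULL if none does. */
const struct file_struct *find_existing_member(const struct file_struct *head)
{
    const struct file_struct *f;

    assert(head != NULL);
    for (f = head; f != NULL; f = f->links ? f->links->next : NULL) {
        if (f->exists)
            return f;
    }
    return NULL;
}
```

If find_existing_member() returns a file, the head would not need
to be transferred; it could be linked from that member instead.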

The steps I see are:

- Change hlink_list to a pointer array (just committed)

- Create the union and change file_struct and the routines
  that reference and populate it to use the union for dev
  and inode.  This may include not allocating the union for
  unlinkable files.

- Overwrite the unions with the linked list stuff and change
  the logic to use them. Also free the unions for unlinked
  files.
  (this is the biggest step)

- Reduce the hlink_list to just the heads and change
  do_hard_links.

- Consolidate the fnamecmp finder function for
  recv_generator() in generator.c and recv_files() in
  receiver.c

- Add the list walk for heads that don't exist yet.

Each of these is a discrete step, and after each one is
complete the code will still function correctly.
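
To check that we mean the same design, here is a minimal sketch of
the union, the overwrite step and the heads-only do_hard_links().
It is an illustration under assumptions, not the actual patch: the
names file_links, lu, mk_file(), build_link_sets() and the link_fn
callback are invented here, and every file in the list is assumed
to have its union allocated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct file_struct;

/* Same storage holds dev/inode before the sort and list pointers after. */
union file_links {
    struct { uint64_t dev, inode; } idev;
    struct { struct file_struct *head, *next; } links;
};

struct file_struct {
    const char *name;
    union file_links *lu; /* NULL for files that cannot be / are not linked */
};

struct file_struct *mk_file(const char *name, uint64_t dev, uint64_t inode)
{
    struct file_struct *f = malloc(sizeof *f);
    if (!f || !(f->lu = malloc(sizeof *f->lu)))
        abort();
    f->name = name;
    f->lu->idev.dev = dev;
    f->lu->idev.inode = inode;
    return f;
}

static int cmp_idev(const void *pa, const void *pb)
{
    const struct file_struct *a = *(struct file_struct *const *)pa;
    const struct file_struct *b = *(struct file_struct *const *)pb;
    if (a->lu->idev.dev != b->lu->idev.dev)
        return a->lu->idev.dev < b->lu->idev.dev ? -1 : 1;
    if (a->lu->idev.inode != b->lu->idev.inode)
        return a->lu->idev.inode < b->lu->idev.inode ? -1 : 1;
    return 0;
}

/* Sort by dev/inode, overwrite each union with head/next pointers, free
 * the unions of unlinked files, and compact list[] down to the heads.
 * Returns the number of heads. Do this before the fork. */
int build_link_sets(struct file_struct **list, int count)
{
    int heads = 0, i = 0;

    assert(count >= 0);
    qsort(list, (size_t)count, sizeof list[0], cmp_idev);
    while (i < count) {
        int j = i + 1, k;
        /* find the end of the run before any union is overwritten */
        while (j < count && cmp_idev(&list[i], &list[j]) == 0)
            j++;
        if (j - i == 1) {
            free(list[i]->lu); /* no other link: drop the union */
            list[i]->lu = NULL;
        } else {
            struct file_struct *head = list[i];
            for (k = i; k < j; k++) {
                struct file_struct *next = k + 1 < j ? list[k + 1] : NULL;
                list[k]->lu->links.head = head;
                list[k]->lu->links.next = next;
            }
            list[heads++] = head;
        }
        i = j;
    }
    return heads;
}

typedef void (*link_fn)(const struct file_struct *src,
                        const struct file_struct *dest);

/* Walk each head's chain; returns the number of links requested. */
int do_hard_links(struct file_struct **heads, int head_count, link_fn do_a_link)
{
    int i, made = 0;

    for (i = 0; i < head_count; i++) {
        struct file_struct *src = heads[i], *dest;
        for (dest = src->lu->links.next; dest; dest = dest->lu->links.next) {
            if (do_a_link)
                do_a_link(src, dest);
            made++;
        }
    }
    return made;
}
```

qsort() is not stable, so which member of a set ends up as head
is arbitrary in this sketch; the real code may want a defined
choice.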

> Feel free to start coding.  ;-)  Not that I'm lazy...  <cough>

Oh! Sorry to hear that, I am.  The only thing preventing me
from saying go ahead is my uncertainty whether we both have
the same design.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt