TODO hardlink performance optimizations
jw schultz
jw at pegasys.ws
Wed Jan 7 09:30:19 GMT 2004
On Wed, Jan 07, 2004 at 02:45:46AM -0600, John Van Essen wrote:
> On Wed, 7 Jan 2004 00:03:13 -0800, jw schultz <jw at pegasys.ws> wrote:
> > On Wed, Jan 07, 2004 at 01:04:34AM -0600, John Van Essen wrote:
> >
> > The hardlinks have to be created in the receiver but the
> > logic of selecting which one is the target/head must happen
> > before the fork so the receiver can act on it. This means
> > that the data related to these must become fixed before the
> > fork. Doing that also avoids a COW.
>
> The COW comes into play only in the hlink_list, but let's drop
> this enhancement (hardlinking during file processing) for now.
> But... I would like to see it done if possible so the sorted
> verbose listing of transferred files has the hardlinked files
> properly sorted in the output instead of grouped at the end...
> (That bugs me about the symlinks, too, but that's neither here
> nor there...)
>
> > The list** reuses the qsorted hlink_list. The linked list
> > with head means we only need a list of heads. Pick one or
> > the other and don't modify the data after the fork.
> >
> > I'm inclined right now to go with the non-looping linked
> > list and a list of heads so that in the generator we
> >
> >     if (file->links && file->links->head != file)
> >         return;
> >     /* test will be a little more complicated because
> >      * file is actually a copy
> >      */
> >
> > Then do_hard_links() would
> >
> >     for (i = 0; i < hlink_count; i++) {
> >         src = hlink_list[i];
> >         /* follow ->next from the head until the list ends */
> >         for (dest = src->links->next; dest; dest = dest->links->next) {
> >             do_a_link(src, dest);
> >         }
> >     }
>
> The point of this exercise was to find a way to avoid unnecessary
> transfers of already existing files, and what we've designed so
> far does that (and also gets rid of the binary search), so yes -
> let's go with this and revisit possible further enhancements later.
I thought the point was to reduce the memory footprint and
then get rid of the binary search.

The unnecessary transfer elimination is a happy potential
byproduct of pushing the hlink data to where the generator
can iterate over hlink sets.
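
To make "iterate over hlink sets" concrete, here is a minimal
sketch in C of a list of heads plus a non-looping linked list per
set. The struct layouts and the names is_hlink_follower() and
walk_hlink_sets() are hypothetical stand-ins for illustration, not
rsync's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for rsync's structures. */
struct file_struct;

struct hlink {
	struct file_struct *head;	/* first file of the hardlink set */
	struct file_struct *next;	/* next file in the set, NULL at the end */
};

struct file_struct {
	const char *basename;
	struct hlink *links;		/* NULL for files with no hardlinks */
};

/* Generator-side test: nonzero if file belongs to a set but is not
 * its head, so it can be linked later instead of transferred. */
static int is_hlink_follower(const struct file_struct *file)
{
	return file->links && file->links->head != file;
}

/* Walk every set starting at its head and call link_fn(head, member)
 * for each other member.  link_fn may be NULL to just count.
 * Returns the number of (head, member) pairs visited. */
static int walk_hlink_sets(struct file_struct **heads, int head_count,
			   void (*link_fn)(struct file_struct *,
					   struct file_struct *))
{
	int i, visited = 0;

	for (i = 0; i < head_count; i++) {
		struct file_struct *src = heads[i];
		struct file_struct *dest;

		assert(src->links != NULL);	/* heads always carry link data */
		for (dest = src->links->next; dest; dest = dest->links->next) {
			if (link_fn)
				link_fn(src, dest);
			visited++;
		}
	}
	return visited;
}
```

With only the heads in the array, no binary search is needed to
find a file's set: every member points at its head directly.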
The steps I see are:
- The hlink_list change to a pointer array (just committed)
- Create the union and change file_struct and the routines
  that reference and populate it to use the union for dev
  and inode. This may include not allocating the union for
  unlinkable files.
- Overwrite the unions with the linked list stuff and change
  the logic to use them. Also free the unions for unlinked
  files.
  (this is the biggest step)
- Reduce the hlink_list to just the heads and change
  do_hard_links.
- Consolidate the fnamecmp finder function for
  recv_generator() in generator.c and recv_files() in
  receiver.c
- Add the list walk for heads that don't exist yet.
Each of these is a discrete step, and after each one is
complete the code will still function correctly.
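
The union steps could look roughly like the following sketch in C.
It assumes a union that first holds dev/inode and is later
overwritten with the head/next pointers. Every name here
(file_links, link_u, set_idev, same_idev, link_group) is
hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct file_struct;

/* Phase 1: identify a file by device and inode. */
struct idev {
	uint64_t dev;
	uint64_t inode;
};

/* Phase 2: the same storage reused as linked-list data. */
struct hlink {
	struct file_struct *head;
	struct file_struct *next;
};

union file_links {
	struct idev idev;
	struct hlink links;
};

struct file_struct {
	const char *basename;
	union file_links *link_u;	/* NULL for unlinkable files */
};

/* Allocate the union and record dev/inode.  Returns 0 on success. */
static int set_idev(struct file_struct *file, uint64_t dev, uint64_t inode)
{
	file->link_u = malloc(sizeof *file->link_u);
	if (!file->link_u)
		return -1;
	file->link_u->idev.dev = dev;
	file->link_u->idev.inode = inode;
	return 0;
}

/* Valid only while both unions still hold dev/inode. */
static int same_idev(const struct file_struct *a, const struct file_struct *b)
{
	return a->link_u && b->link_u
	    && a->link_u->idev.dev == b->link_u->idev.dev
	    && a->link_u->idev.inode == b->link_u->idev.inode;
}

/* group[0..n-1] are files already known to share dev/inode.
 * A group of one is an unlinked file: free its union.
 * Otherwise overwrite each union with head/next pointers.
 * Returns the head, or NULL if the file turned out to be unlinked. */
static struct file_struct *link_group(struct file_struct **group, int n)
{
	int i;

	assert(n >= 1);
	if (n == 1) {
		free(group[0]->link_u);
		group[0]->link_u = NULL;
		return NULL;
	}
	for (i = 0; i < n; i++) {
		/* dev/inode are no longer needed once the set is known */
		group[i]->link_u->links.head = group[0];
		group[i]->link_u->links.next = i + 1 < n ? group[i + 1] : NULL;
	}
	return group[0];
}
```

Doing the overwrite before the fork would leave the data fixed
afterwards, which is what the earlier discussion asks for.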
> Feel free to start coding. ;-) Not that I'm lazy... <cough>
Oh! Sorry to hear that, I am. The only thing preventing me
from saying go ahead is my uncertainty whether we both have
the same design.
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt