TODO hardlink performance optimizations

John Van Essen vanes002 at umn.edu
Wed Jan 7 08:45:46 GMT 2004


On Wed, 7 Jan 2004 00:03:13 -0800, jw schultz <jw at pegasys.ws> wrote:
> On Wed, Jan 07, 2004 at 01:04:34AM -0600, John Van Essen wrote:
>>
>> Right.  You've just explained the heretofore unknown reason
>> why that sibling hlinking was being done in a separate, final
>> phase.  If we keep it that way, I'd like to see a comment added
>> to explain what you just explained.
>> 
>> But here's an idea, still using my proposed head/next struct...
>> 
>> - Make *next a circular list (the last points to the first).
>> - Leave *head NULL initially.
>> - When a file of a hlink group is examined, and *head is NULL,
>>   then it is the first one of that group to be processed, and
>>   it becomes the head.  All the *head pointers are set to its
>>   file_struct address.
>> - Subsequent processing of siblings can immediately hardlink
>>   them to the head file.
>> 
>> The drawback is that this will invoke CopyOnWrite (as discussed
>> earlier in this thread).  To avoid that, *head would have to
>> point outside to a group head pointer, which would then be set.
>> So you'd need an array of pointers of the same size as the
>> number of distinct hardlink groups.  Say!  We already have the
>> sorted hlink_list.  We could just point to the first element
>> of the group (setting it to NULL initially) after creating
>> the circularly linked list.
> 
> The hardlinks have to be created in the receiver but the
> logic of selecting which one is the target/head must happen
> before the fork so the receiver can act on it.  This means
> that the data related to these must become fixed before the
> fork.  Doing that also avoids a COW.
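
To make that fix-it-before-the-fork point concrete (the names below
are made up for illustration, not real rsync functions):

        /* Sketch only: all hard-link bookkeeping is finished while
         * there is still a single process, so those pages are never
         * written after the fork and no COW is triggered. */
        static void start_transfer(struct file_list *flist)
        {
                init_hard_link_data(flist);     /* heads/next fixed here */

                if (fork() == 0)
                        be_the_receiver(flist);  /* child: reads file->links only  */
                else
                        be_the_generator(flist); /* parent: reads file->links only */
        }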

The COW comes into play only in the hlink_list, but let's drop
this enhancement (hardlinking during file processing) for now.
But...  I would like to see it done if possible so the sorted
verbose listing of transferred files has the hardlinked files
properly sorted in the output instead of grouped at the end...
(That bugs me about the symlinks, too, but that's neither here
nor there...)
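
For the record, this is roughly what I was picturing - just a sketch,
with made-up field and function names, not actual rsync code:

        struct file_struct;                     /* rsync's per-file record */

        /* Each member of a hard-link group carries two pointers.
         * 'next' makes the group a circular list (the last member
         * points back to the first).  'head_slot' points at a single
         * shared slot stored *outside* the file list (e.g. the group's
         * entry in the sorted hlink_list), so recording the head later
         * writes only that slot and never dirties the file-list pages. */
        struct hlink_info {
                struct file_struct **head_slot; /* shared slot, NULL at start */
                struct file_struct *next;       /* circular list of members   */
        };

        void do_a_link(struct file_struct *src, struct file_struct *dest);

        /* Called as the generator reaches each member of a group
         * ('h' is that file's hlink_info). */
        void process_member(struct file_struct *file, struct hlink_info *h)
        {
                if (*h->head_slot == NULL) {
                        /* First member seen becomes the head and gets
                         * transferred/updated in the normal way. */
                        *h->head_slot = file;
                        /* ... normal per-file processing ... */
                } else {
                        /* Later siblings can be hard-linked to the head
                         * immediately, so they show up in sorted order
                         * instead of grouped at the end. */
                        do_a_link(*h->head_slot, file);
                }
        }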
 
> The list** reuses the qsorted hlink_list.  The linked list
> with head means we only need a list of heads.  Pick one or
> the other and don't modify the data after the fork.
> 
> I'm inclined right now to go with the non-looping linked
> list and a list of heads so that in the generator we
> 
>         if (file->links && file->links->head != file)
>                 return;
>         /* test will be a little more complicated because
>          * file is actually a copy
>          */
> 
> Then do_hard_links() would
> 
>         for (i = 0; i < hlink_count; i++) {
>                 src = hlink_list[i];
>                 for (dest = src->links->next; dest; dest = dest->links->next) {
>                         do_a_link(src, dest);
>                 }
>         }

The point of this exercise was to find a way to avoid unnecessary
transfers of already existing files, and what we've designed so
far does that (and also gets rid of the binary search), so yes -
let's go with this and revisit possible further enhancements later.
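
Just so we're describing the same thing, here is roughly how I read
the scheme we've settled on - a sketch only, with placeholder names:

        struct hlink_info;

        struct file_struct {                    /* trimmed to the relevant bit */
                struct hlink_info *links;       /* NULL unless hard-linked     */
                /* ... name, mode, etc. ... */
        };

        /* Built (and the groups qsorted) before the fork, so neither
         * the generator nor the receiver writes to it afterwards. */
        struct hlink_info {
                struct file_struct *head;       /* first member of the group      */
                struct file_struct *next;       /* remaining siblings, NULL-ended */
        };

        /* hlink_list[] now needs only one entry per group: its head. */
        extern struct file_struct *hlink_list[];
        extern int hlink_count;

        /* Generator: anything that is not the head of its group is
         * skipped outright - no transfer, no binary search.  The real
         * test has to allow for 'file' being a copy, so it should
         * compare something stable (an index, say) rather than the
         * pointer itself. */
        static int is_group_head(const struct file_struct *file)
        {
                return file->links == NULL || file->links->head == file;
        }

        /* Receiver: the final do_hard_links() pass then just walks each
         * head's ->links->next chain, as in your loop above, calling
         * do_a_link(head, sibling) for every remaining sibling. */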

Feel free to start coding.  ;-)  Not that I'm lazy...  <cough>
-- 
        John Van Essen  Univ of MN Alumnus  <vanes002 at umn.edu>


