TODO hardlink performance optimizations

jw schultz jw at pegasys.ws
Tue Jan 6 23:11:54 GMT 2004


On Tue, Jan 06, 2004 at 02:04:19AM -0600, John Van Essen wrote:
> On Mon, 5 Jan 2004, jw schultz <jw at pegasys.ws> wrote:
> > On Mon, Jan 05, 2004 at 02:46:52PM -0800, Wayne Davison wrote:
> [ snip ]
> >> This seems like a good way to go to me.
> > 
> > Your comments have promted a bit more thought and i'm
> > thinking we might eliminate check_hard_link().  If we walk the
> > hlink list after sorting we have enough info to push the
> > data back up to the file_list and eliminate searching
> > altogether.
> 
> Right.
> 
> [ snip ]
[snip]
> > union links {
> >         struct idev {
> >                 INO64_T inode;
> >                 DEV64_T dev;
> >         }
> >         struct hlinks {
> >                 int                count;
> >                 file_struct        **link_list;
> >         }
> > }
> > 
[snip]
> 
> Now that you have opened the door for re-using the inode and
> dev area in file_struct, I can make my initial suggestion that
> requires two 'additional' pointers in file_struct.  I didn't
> suggest it before because I didn't want to add to the struct.
> 
> I propose to reuse the DEV amd INODE areas to store two new
> file_struct pointers (the 2nd part of the union).  These would
> be set during the post-qsort phase during init_hard_links.
> 
> The first pointer links together the file_structs for each
> hardlink group in a list in the same order as in the qsort
> so they can be walked by your proposed link-dest method.
> 
> The second pointer points to the head (first file_struct of
> a hardlink group).
> 
> For each file_struct that is modified in this way a flag bit
> needs to be set indicating so.

No it doesn't.  After we walk the qsorted hlink_list all
file_structs->links pointers either point to a hlinks or
are NULL.  If there are no links file_structs->links is
NULL.

> Then, if the file_struct address equals its head pointer, then
> you are at a head of a hardlink list which can be processed
> using the new method using link-dest that you outlined earlier.

So you are proposing a singly linked list:
         struct hlinks {
                 file_struct		*head;
                 file_struct		*next;
         }

That would work too.  With the pointers into the array
you compare the first in the array with yourself instead of
head.  I preferred knowing in advance how many links there
were.

> I'm still not clear on what exactly will happen during that new
> processing method.  I assume that if the head file does not exist,
> it will find any existant file and hardlink it to the head file,
> and if not found, will transfer the head file.  Is that correct?

Not exactly.  Like my earlier post if the head (your term)
doesn't exist (existence including compare_dest) we iterate
over the link set and the first that does exist is used for
fnamecmp.  But that is phase two or three.

> Processing of non-head files (file_struct address not equal to
> head pointer) can be skipped since they are hardlinked later.
> 
> The final walk-through that creates the hardlinks for the
> non-head files can walk the qsorted list and use the head
> pointer as the target for the hardlinks of the non-nead files.

Actually, if we could do the hardlinks of non-head files as
we encounter them while walking the file list and used your
singly linked lists we could free the hlink_list after
traversal.  The problem is that we would need to be sure the
head had completed processing so the hardlink would have to
be created in receiver and that gets ugly.  One possibility
would be to keep the trailing walk-through but reduce the
hlink_list to an array of heads.

> No extra memory required beyond that already required for the
> qsorted pointer list.
> 
> No binary search required during the file processing phase.
> 
> The key is getting the link-dest method working for this.

Huh?  This doesn't have any adverse affect on link-dest.
It only has a potential positive affect on
(link|compare)_dest.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt


More information about the rsync mailing list