TODO hardlink performance optimizations

jw schultz jw at pegasys.ws
Thu Jan 8 00:15:57 GMT 2004


On Wed, Jan 07, 2004 at 03:25:39PM -0800, Wayne Davison wrote:
> On Wed, Jan 07, 2004 at 01:30:19AM -0800, jw schultz wrote:
> > On Wed, Jan 07, 2004 at 02:45:46AM -0600, John Van Essen wrote:
> > > The point of this exercise was to find a way to avoid unnecessary
> > > transfers of already existing files
> > I thought the point was to reduce the memory footprint and
> > then get rid of the binary search.
> 
> They are both desirable goals, and I'd like to see one other:  a
> reduction in number of bytes transmitted when sending hard-link data.
> If we omit the dev/inode data for items that can't be linked together,
> we should be able to save a large amount of transmission size (but

That would also require increasing the size of flags so the
savings of 8-16 bytes would be offset somewhat by a 1 byte
increase.  Most likely use 2 bits (SAME_DEV and HAVE_INODE).
That would give us 6 bits for future expansion.
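
To make the byte counts concrete, here is a rough sketch of
how a sender might use the two bits.  Everything in it is an
assumption for illustration: the XF_* values, the function
name put_hlink_info, the fixed 8-byte host-order fields and
the "linkable" test are made up and are not the actual wire
format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit assignments; not rsync's real flag values. */
#define XF_SAME_DEV   (1u << 0) /* dev matches previous entry, so omit it */
#define XF_HAVE_INODE (1u << 1) /* inode (and maybe dev) follows the flags */

/*
 * Write the extra flag byte plus any dev/inode data for one entry.
 * buf must have room for 17 bytes.  last_dev carries the most recently
 * sent dev between calls.  Returns the number of bytes written.
 * Fields are copied in host byte order to keep the sketch short.
 */
static size_t put_hlink_info(uint8_t *buf, int linkable,
                             uint64_t dev, uint64_t inode,
                             uint64_t *last_dev)
{
    size_t n = 1;               /* the flag byte is always sent */
    uint8_t xflags = 0;

    assert(buf != NULL && last_dev != NULL);

    if (linkable) {
        xflags |= XF_HAVE_INODE;
        if (dev == *last_dev)
            xflags |= XF_SAME_DEV;
    }
    buf[0] = xflags;

    if (xflags & XF_HAVE_INODE) {
        if (!(xflags & XF_SAME_DEV)) {
            memcpy(buf + n, &dev, sizeof dev);
            n += sizeof dev;
            *last_dev = dev;
        }
        memcpy(buf + n, &inode, sizeof inode);
        n += sizeof inode;
    }
    return n;
}
```

With this layout an unlinkable file costs 1 byte instead of
16, and a linkable file on the same device as the previous
one costs 9.  The receiver would have to start with the same
initial last_dev as the sender for SAME_DEV to decode
correctly.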

I'd also want to send the dev/inode data for all !IS_DIR
and not just IS_REG.  Otherwise fixing the failure to
preserve links on symlinks, devices, fifos and sockets would
need yet another protocol bump.

> this will require a protocol bump).  Of course this does not mean that
> the new optimized hard-link code would require this optimized sending
> in order to work.
> 
> > - Create the union and change file_struct and the routines
> >   that reference and populate it to use the union for dev
> >   and inode.  This may include not allocating the union for
> >   unlinkable files.
> 
> I had been considering possible ways to avoid having the extra pointer
> in the flist_struct, and a suggestion John made has made me think that
> we can leave it out if we allow the file_struct to be of variable
> length.  We'd set a flag if it has the extra trailing data, and never
> refer to this data if the flag is not set.

Runtime variable-sized structures should be avoided.  Do you
want to make rdev, link and sum conditional also?  We are
replacing two u64 with one pointer that will often be NULL;
that should be enough.

If you wanted, I suppose you could make rdev, link and sum a
union within file_struct since they are mutually exclusive
and dependent on IS_*(mode).  That would squeeze another 8
bytes/file with a minimal impact on the code.
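
Roughly what I have in mind, as a sketch only.  The names
struct idev, struct file_sketch, the member u and the helper
same_file are hypothetical, and the real file_struct has
more fields than are shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Allocated only for files that could be hard-linked. */
struct idev {
    uint64_t dev;
    uint64_t inode;
};

/* Cut-down stand-in for file_struct; field set is illustrative. */
struct file_sketch {
    uint32_t mode;
    struct idev *idev;  /* NULL for unlinkable files */
    union {             /* which member is valid depends on the mode */
        uint64_t rdev;  /* device nodes */
        char *link;     /* symlinks: the link target */
        char *sum;      /* regular files: the checksum */
    } u;
};

/* Two entries name the same file only if both carry dev/inode data. */
static int same_file(const struct file_sketch *a,
                     const struct file_sketch *b)
{
    assert(a != NULL && b != NULL);
    if (a->idev == NULL || b->idev == NULL)
        return 0;
    return a->idev->dev == b->idev->dev
        && a->idev->inode == b->idev->inode;
}
```

The struct stays a fixed size: unlinkable files pay for one
NULL pointer, and the three mutually exclusive fields share
the space of the largest one.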

> > - Reduce the hlink_list to just the heads and change
> >   do_hard_links.
> 
> I'm not sure this is worth the cost of copying the bytes, but we'll
> see.

The cache lines are hot, so the copy is cheap.  It will free
usable amounts of memory and it will simplify subsequent
logic without complicating the code that walks the
hlink_list.
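
The reduction could look something like the following.  This
is a sketch under assumptions: struct hl_entry and
keep_heads are made-up names, the list is taken to be
already sorted by (dev, inode), and I am assuming that
entries with no other link in the list are dropped rather
than kept as heads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an entry referenced by hlink_list. */
struct hl_entry {
    uint64_t dev;
    uint64_t inode;
};

static int same_inode(const struct hl_entry *a, const struct hl_entry *b)
{
    return a->dev == b->dev && a->inode == b->inode;
}

/*
 * Compact a list sorted by (dev, inode) in place so that it holds only
 * the first entry of each run of two or more matching entries.
 * Returns the new number of entries.
 */
static size_t keep_heads(struct hl_entry **list, size_t n)
{
    size_t out = 0, i = 0;

    assert(list != NULL || n == 0);

    while (i < n) {
        size_t j = i + 1;

        /* Find the end of the run that starts at i. */
        while (j < n && same_inode(list[i], list[j]))
            j++;
        if (j - i > 1)          /* at least one other link exists */
            list[out++] = list[i];
        i = j;
    }
    return out;
}
```

After this pass the caller could shrink the allocation to
the returned count, which is where the memory comes back.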

> 
> > Each of these is a discrete step that when complete the code
> > will function correctly.
> 
> Yes.  Nice plan.  If either of you have started coding the next stuff,
> let me know -- I'm thinking about doing some coding.

I've not started coding beyond what I've already committed.
John seemed eager to start work on this but I'm not sure of
his status.  Having gotten the design hammered out, he seemed
to wish to take implementation details off-list; I'm sure
he'll be glad to CC you.

The transmission reduction above is largely independent of
the other code.

Q for lurkers:  What is the value of dev and inode on
systems that don't have them?  0 or -1?

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt

