TODO hardlink performance optimizations
jw schultz
jw at pegasys.ws
Wed Dec 17 23:51:13 EST 2003
On Wed, Dec 17, 2003 at 03:43:40AM -0800, cbarratt at users.sourceforge.net wrote:
> jw schultz writes:
>
> > On Tue, Dec 16, 2003 at 03:18:15PM -0600, John Van Essen wrote:
> > > On Mon, 15 Dec 2003, jw schultz <jw at pegasys.ws> wrote:
> > > > Hard-link handling
> > > >
> > > > At the moment hardlink handling is very expensive, so it's off by
> > > > default. It does not need to be so.
> > > >
> > > > Since most of the solutions are rather intertwined with the file
> > > > list it is probably better to fix that first, although fixing
> > > > hardlinks is possibly simpler.
> > > >
> > > > We can rule out hardlinked directories since they will probably
> > > > screw us up in all kinds of ways. They simply should not be used.
> > > >
> > > > At the moment rsync only cares about hardlinks to regular files. I
> > > > guess you could also use them for sockets, devices and other beasts,
> > > > but I have not seen them.
> > > >
> > > > When trying to reproduce hard links, we only need to worry about
> > > > files that have more than one name (nlinks>1 && !S_ISDIR).
> > >
> > > It would be very helpful if file_struct.flags could have a bit set to
> > > indicate that the link count was greater than 1. This info could be
> > > used later to optimize the hardlink search by only considering those
> > > flist entries with this flag bit set.
> > >
> > > It'd be nice to implement this bit setting in this protocol number so
> > > it can be widely distributed before 2.6.1 is released which could have
> > > the code to actually make use of it. I'd be interested in doing the
> > > later changes, but if Martin or jw could at least get the bit set...
> > > It doesn't even have to depend on the --hard-links option. Just examine
> > > the link count and set the bit.
> >
> > I'm not keen on squeezing that in at this time. Let's get it
> > out the door; hardlink performance improvements can be made
> > in a minor release. I'm also a bit more inclined to pass
> > nlinks (IFF non-zero and ~IS_DIR).
>
> The nlinks > 1 optimization would be a good one to add, but after
> the next release.
>
> For hardlinks it would be great to only send the device and inode
> information in the file list IFF nlinks > 1 and ~IS_DIR. Currently
> --hard-links sends device and inode for every file. This causes a
> lot of unnecessary data to be sent, and also means the receiver has
> to store and search inode information for every file, rather than
> just candidate hardlinks.
>
> Unfortunately all the bits in the flag byte in the file list are used,
> so we need to figure out some other way to indicate which files include
> (dev,inode) data.
>
> The goal would be to make --hard-links have little impact on network
> traffic, memory and speed.
>
> However, this would require a protocol bump.

It might be time to increase the size of flags to 16 bits,
protocol dependent of course. We'll need more bits anyway
if we ever add ACLs and EAs. We'd want to use at least two
bits in send_file_entry() (rough approximation):
if (protocol_version >= 28 && !S_ISDIR(file->mode)
    && preserve_hard_links && file->st_nlink > 1) {
    flags |= HARD_LINKED;
    if (file->dev == last_dev)
        flags |= SAME_DEV;
}
...
if (protocol_version < 28 && preserve_hard_links && S_ISREG(file->mode)) {
    if (protocol_version < 26) {
        /* 32-bit dev_t and ino_t */
        write_int(f, (int) file->dev);
        write_int(f, (int) file->inode);
    } else {
        /* 64-bit dev_t and ino_t */
        write_longint(f, file->dev);
        write_longint(f, file->inode);
    }
} else if (flags & HARD_LINKED) {
    /* only link candidates carry (dev, inode) */
    if (!(flags & SAME_DEV))
        write_longint(f, file->dev);
    write_longint(f, file->inode);
    last_dev = file->dev;    /* keep in step with the receiver */
}
and in recv_file_entry (even rougher):
if (flags & HARD_LINKED) {
    if (!(flags & SAME_DEV))
        last_dev = read_longint(f);
    file->dev = last_dev;
    file->inode = read_longint(f);
} else {
    file->dev = file->inode = 0;
}
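
To check that the sender and receiver fragments stay in step, here is a self-contained round trip over a byte buffer. It is a sketch under stated assumptions: the HARD_LINKED and SAME_DEV bit values, the struct layouts, and the fixed little-endian 16-bit/64-bit encoding are hypothetical stand-ins for rsync's flag byte, write_longint() and read_longint(), whose real wire format differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values placed above the old 8-bit flag byte;
 * NOT rsync's real assignments. */
#define HARD_LINKED (1u << 8)
#define SAME_DEV    (1u << 9)

struct entry {
    uint64_t dev, inode;
    uint32_t nlink;
    int is_dir;
};

/* Sender and receiver each keep one of these; both start at 0. */
struct codec_state {
    uint64_t last_dev;
};

static size_t put_u16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)(v >> 8);
    return 2;
}

static size_t put_u64(unsigned char *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (unsigned char)(v >> (8 * i));
    return 8;
}

static uint16_t get_u16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint64_t get_u64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Sender: 16-bit flags, then (dev, inode) only for link candidates, with
 * dev omitted when it matches the previous candidate's. Returns 2, 10 or 18. */
size_t encode_entry(struct codec_state *st, const struct entry *e,
                    unsigned char *out)
{
    uint16_t flags = 0;
    size_t n;

    if (!e->is_dir && e->nlink > 1) {
        flags |= HARD_LINKED;
        if (e->dev == st->last_dev)
            flags |= SAME_DEV;
    }
    n = put_u16(out, flags);
    if (flags & HARD_LINKED) {
        if (!(flags & SAME_DEV))
            n += put_u64(out + n, e->dev);
        n += put_u64(out + n, e->inode);
        st->last_dev = e->dev;
    }
    return n;
}

/* Receiver: mirrors encode_entry(); non-candidates get dev = inode = 0. */
size_t decode_entry(struct codec_state *st, const unsigned char *in,
                    uint64_t *dev, uint64_t *inode)
{
    uint16_t flags = get_u16(in);
    size_t n = 2;

    /* SAME_DEV is meaningless without HARD_LINKED. */
    assert(!(flags & SAME_DEV) || (flags & HARD_LINKED));
    if (flags & HARD_LINKED) {
        if (!(flags & SAME_DEV)) {
            st->last_dev = get_u64(in + n);
            n += 8;
        }
        *dev = st->last_dev;
        *inode = get_u64(in + n);
        n += 8;
    } else {
        *dev = *inode = 0;
    }
    return n;
}
```

In this encoding a link candidate costs 18 bytes when its device differs from the previous candidate's and 10 bytes when it matches, while every other entry costs only the 2-byte flags word. Both sides change last_dev only for HARD_LINKED entries, which is what keeps their copies from drifting apart.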
Using 0 for inode to indicate "no hardlinks" also fixes the
recently reported problem of erroneously trying to preserve
links on filesystems that do not support inode numbers and
report 0 for every inode.
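
On the receiving side, treating inode 0 as "not a link candidate" means only real candidates need to be stored and searched. One way to do that, sketched here with hypothetical names (find_link_leaders(), struct link_rec) rather than rsync's actual hlink.c code, is to collect the nonzero (dev, inode) pairs, sort them, and treat each run of equal pairs as one link group.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct link_rec {
    uint64_t dev, inode;
    int idx;                /* position in the file list */
};

static int cmp_link_rec(const void *a, const void *b)
{
    const struct link_rec *x = a, *y = b;

    if (x->dev != y->dev)
        return x->dev < y->dev ? -1 : 1;
    if (x->inode != y->inode)
        return x->inode < y->inode ? -1 : 1;
    return (x->idx > y->idx) - (x->idx < y->idx);
}

/* Hypothetical helper: sets leader[i] to the lowest index sharing entry i's
 * (dev, inode), or -1 if entry i has no partner. Entries with inode 0 are
 * skipped, so a filesystem reporting 0 for every inode yields no groups.
 * Returns the number of entries placed in a group, or -1 if out of memory. */
int find_link_leaders(const uint64_t *dev, const uint64_t *inode, int n,
                      int *leader)
{
    struct link_rec *r;
    int m = 0, linked = 0;

    assert(n >= 0);
    r = malloc(sizeof *r * (size_t)(n > 0 ? n : 1));
    if (r == NULL)
        return -1;
    for (int i = 0; i < n; i++) {
        leader[i] = -1;
        if (inode[i] != 0) {
            r[m].dev = dev[i];
            r[m].inode = inode[i];
            r[m].idx = i;
            m++;
        }
    }
    qsort(r, (size_t)m, sizeof *r, cmp_link_rec);
    /* After sorting, each run of equal (dev, inode) is one group and its
     * first element has the lowest file-list index. */
    for (int i = 0; i < m; ) {
        int j = i + 1;
        while (j < m && r[j].dev == r[i].dev && r[j].inode == r[i].inode)
            j++;
        if (j - i > 1) {
            for (int k = i; k < j; k++)
                leader[r[k].idx] = r[i].idx;
            linked += j - i;
        }
        i = j;
    }
    free(r);
    return linked;
}
```

In this sketch the lowest-indexed name in each group would be transferred and the others linked to it; the sort runs over the candidates only, rather than over every entry in the file list.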
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt
More information about the rsync mailing list