Feature Request: break hardlinks before metadata changes

jw schultz jw at pegasys.ws
Thu Oct 24 09:32:00 EST 2002

On Thu, Oct 24, 2002 at 04:40:48AM -0400, Tripp Lilley wrote:
> From TODO:
>   We can also have the case where there are links to a file that are
>   not in the tree being transferred.  There's nothing we can do about
>   that.  Because we rename the destination into place after writing,
>   any hardlinks to the old file are always going to be orphaned.  In
>   fact that is almost necessary because otherwise we'd get really
>   confused if we were generating checksums for one name of a file and
>   modifying another.
> From Mike Rubel's excellent incremental backups HOWTO:
> <http://www.mikerubel.org/computers/rsync_snapshots/#Bugs>
>   As-written, the snapshot system above does not properly maintain old
>   ownerships/ permissions; if a file's ownership or permissions are
>   changed in place, then the new ownership/permissions will apply to older
>   snapshots as well.  This is because rsync does not unlink files prior to
>   changing them if the only changes are ownership/permission.  Thanks to
>   J.W. Schultz for pointing this out.  This is not a problem for me, but
>   slightly more complicated workarounds are possible
> I'd personally like to see an option to force rsync to break-and-copy any
> hardlink that pointed outside of the destination tree before doing -any-
> changes, even metadata.
> I know that this "breaks" standard hardlink semantics, but it's a
> desirable breakage for building these nice incremental backup systems :)
> I'm willing to eat the space taken by duplicating the entire file, since
> the obvious alternative (using LVM snapshots to preserve -everything-
> about the previous versions) has its own critical drawback (the danger of
> running out of space on the snapshot limits how long a given snapshot can
> stick around on the system).

Don't forget the performance penalty of LVM snapshots.
Every changed block has to be copied.

> It looks like I should make the change in generator.c : recv_generator,
> but I'm not quite sure of the repercussions of "copying" the various sorts
> of files the link target might be. Actually, I guess I'm unfamiliar enough
> with hardlinks to not even know what I can hardlink to :)

You might be able to piggy-back on --link-dest.

I dealt with this issue in the --link-dest patch by having
skip_file() treat a file whose meta-data has changed as
though its content had changed.  This adds a bit of network
load, and breaking the links will inflate the snapshot
images if someone does a chmod|chown|chgrp -R, but I
consider the risk of altering earlier images to be the
overriding concern.
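
To make the trade-off above concrete: hard links share one inode, so
a chmod or chown through any one name changes every earlier image
that links to it.  Here is a rough sketch of a break-and-copy step.
It is not rsync code; break_hardlink() and the ".brkXXXXXX" temporary
suffix are made-up names, and only the permission bits are carried
over (ownership and timestamps are left out for brevity).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not part of rsync: give `path` a private inode
 * so that later metadata changes cannot reach other hard links.
 * Returns 0 on success or when there is nothing to break, -1 on error. */
int break_hardlink(const char *path)
{
    struct stat st;
    char tmp[4096], buf[65536];
    ssize_t n;
    int in, out, len;

    assert(path != NULL);
    if (lstat(path, &st) != 0)
        return -1;
    if (!S_ISREG(st.st_mode) || st.st_nlink < 2)
        return 0;                       /* not a multiply linked regular file */

    len = snprintf(tmp, sizeof tmp, "%s.brkXXXXXX", path);
    if (len < 0 || len >= (int)sizeof tmp)
        return -1;
    if ((in = open(path, O_RDONLY)) < 0)
        return -1;
    if ((out = mkstemp(tmp)) < 0) {
        close(in);
        return -1;
    }
    /* Copy the data; a short write is simply treated as an error here. */
    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)n) != n) {
            n = -1;
            break;
        }
    }
    close(in);
    /* Start the copy with the old permission bits; preserving ownership
     * would need fchown() and suitable privileges. */
    if (n == 0 && fchmod(out, st.st_mode & 07777) != 0)
        n = -1;
    if (close(out) != 0)
        n = -1;
    /* rename() swaps the new inode in; the other links keep the old one. */
    if (n != 0 || rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Calling break_hardlink(fname) before a chmod()/chown() on fname would
leave the older images untouched, at the cost of one full copy of the
file per broken link.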

There aren't too many circumstances where this is going to
be an issue outside of linked backup images.  If that is
what you are doing, take a look at using --link-dest or even
try dirvish (http://www.pegasys.ws/dirvish) which uses
--link-dest to get this right.

It has been a while since I seriously looked at this
particular bit of code, and I'm not sure I like the idea of
adding this feature, but you could try a modification in
generator.c something like:

  /* choose whether to skip a particular file */
  static int skip_file(char *fname,
                     struct file_struct *file, STRUCT_STAT *st)
  {
        if (st->st_size != file->length) {
                return 0;
        }
-       if (link_dest) {
+       if (link_dest || preserve_outer_links) {
                if((st->st_mode & ~_S_IFMT) !=  (file->mode & ~_S_IFMT)) {
                        return 0;
                }
                if (st->st_uid != file->uid || st->st_gid != file->gid) {
                        return 0;
                }
        }

Where preserve_outer_links (example name) is a boolean set
from the command-line.
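
For anyone who wants to try the logic outside of rsync, here is a
minimal stand-alone model of that decision.  Everything in it is an
assumption made for illustration: struct meta and may_skip() are
invented names rather than rsync's own types, and the final
modification-time comparison merely stands in for whatever checks
follow in the real function.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/* Simplified stand-in for the fields being compared; this is not
 * rsync's struct file_struct or STRUCT_STAT. */
struct meta {
    off_t  size;
    mode_t mode;   /* permission bits only, file type already masked off */
    uid_t  uid;
    gid_t  gid;
    time_t mtime;
};

/* Returns 1 if the file may be skipped, 0 if it must be transferred.
 * `treat_meta_as_change` plays the role of
 * (link_dest || preserve_outer_links) in the patch. */
int may_skip(const struct meta *dst, const struct meta *src,
             int treat_meta_as_change)
{
    assert(dst != NULL && src != NULL);
    if (dst->size != src->size)
        return 0;
    if (treat_meta_as_change) {
        /* A mode or owner mismatch forces a transfer, which writes a
         * new file and so breaks any existing hard links. */
        if (dst->mode != src->mode)
            return 0;
        if (dst->uid != src->uid || dst->gid != src->gid)
            return 0;
    }
    /* Assumed placeholder for the remaining checks. */
    return dst->mtime == src->mtime;
}
```

With the flag off, a file that differs only in mode or owner is
skipped and would have its meta-data changed in place; with the flag
on, the same file is reported as needing a transfer.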

Of course, this makes no distinction as to whether the file
has any links outside of the synchronized tree.
Identifying such outside links would require deferring all
meta-data changes until all scanning had been completed, and
that would constitute a significant change from the present
code, where set_perms() is called from recv_generator() if a
file hasn't changed.

	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt
