rsync - using a --files-from list to cut out scanning. How to handle deletions? (fwd)

Robert Bell Robert.Bell at csiro.au
Mon Jan 21 20:35:55 MST 2013


Paul, Wayne, Kevin, Teodor and others,
Thanks for your contributions in response to my postings.

Paul: I was very imprecise if not plain wrong in my description.  :-(
Thanks for explaining what really happens.

> "Rsync will not update an existing file in-place unless you use the
>  --inplace option. So --whole-file is irrelevant for this.
>  Rsync (without --inplace) will always create a new (temporary) file,
>  using the existing data (without --whole-file) to enable the delta diff
>  speedup algorithm. Once the temp file is successfully created, it's
>  renamed to the original name, deleting the existing link. So any
>  hardlinked data will remain untouched."
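
To convince myself of the behaviour Paul describes, here is a minimal
sketch in plain Python (not rsync itself), assuming a POSIX filesystem
that supports hard links.  The file names and the helper functions are
made up for illustration.  It contrasts "write a temp file, then rename
over the original" with an in-place write:

```python
import os
import tempfile


def update_via_rename(path, data):
    """Write data to a temporary file beside path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(data)
    # The rename replaces the directory entry; the old inode is untouched.
    os.replace(tmp, path)


def update_in_place(path, data):
    """Truncate and rewrite the existing inode (roughly what --inplace implies)."""
    with open(path, "w") as f:
        f.write(data)


def read(path):
    with open(path) as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as d:
        current = os.path.join(d, "current.txt")    # hypothetical names
        snapshot = os.path.join(d, "snapshot.txt")
        update_in_place(current, "old")
        os.link(current, snapshot)                  # snapshot shares the inode

        # Temp file + rename: the hard-linked snapshot keeps the old data.
        update_via_rename(current, "new")
        after_rename = read(snapshot)
        shares_inode = os.stat(current).st_ino == os.stat(snapshot).st_ino

        # Re-link, then write in place: the snapshot changes too.
        os.unlink(snapshot)
        os.link(current, snapshot)
        update_in_place(current, "newer")
        after_in_place = read(snapshot)
        return after_rename, shares_inode, after_in_place
```

Calling demo() returns the snapshot's contents after each kind of
update, plus whether the two names still shared an inode after the
rename.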


Since my posting to the rsync digest last week, I've needed to think a
lot about rsync behaviour with hard-links, and have been doing some
tests.

It's been good to have an outbreak of postings on these issues, with a
re-visiting of Bug 5644, and Wayne's postings about features in the
upcoming version 3.1.0.  (At our site, we use a patched version of rsync
which links a file from the link-dest directory rather than copying from
source when a file is identical in the source and link-dest directory,
but exists and is different in the destination.)

I was not aware of the issue in the case where the unchanged_file() test
is passed, but the unchanged_attrs() test is not, and the potential for
overwriting the attributes not just of the destination file, but of
every file hard-linked to it.

This means that recycling directories, which as Teodor Milkov noted:

>  "Such a behaviour (unlink changed files and then hard link to dest dir)
>  would be very handy, because rotating large directory trees (e.g. 10
>  million files, 10k files changed) is sooo much more efficient than
>  deleting them and then repopulating from scratch."

is an issue as Wayne noted:

>  "A pre-existing hard-linked copy of the files causes rsync to
>  just change the attributes on the file in-place (without breaking the
>  hard-link).  This can be a minor point for some people (if historical
>  permissions/ACLs/xattrs don't need to be accurate), but could be a deal
>  breaker for some."
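
The effect Wayne describes follows from attributes living on the inode,
not on the directory entry.  A small sketch, again in plain Python
rather than rsync, assuming a POSIX filesystem; the names backup.0 and
backup.1 are hypothetical stand-ins for two backup generations:

```python
import os
import stat
import tempfile


def demo_shared_mode():
    with tempfile.TemporaryDirectory() as d:
        older = os.path.join(d, "backup.0")   # hypothetical older generation
        newer = os.path.join(d, "backup.1")   # hypothetical recycled directory entry
        with open(older, "w") as f:
            f.write("data")
        os.chmod(older, 0o644)
        os.link(older, newer)                 # both names now share one inode

        # Changing the mode through one name, without breaking the link...
        os.chmod(newer, 0o600)

        # ...changes what the other name reports as well.
        info = os.stat(older)
        return stat.S_IMODE(info.st_mode), info.st_nlink
```

The older generation ends up reporting the new permissions, so its
historical attributes are lost even though its data is intact.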

I can see the need for another rsync option here to allow users to
select the making of a fresh copy of the file in this case.  That would
restore the behaviour I implicitly assumed we had, but didn't.

I've updated the documentation for our backups, and prepared a note for
users.  I'm also thinking about ways around this issue, none of which
are particularly appealing:
  - drop the recycling of old directories (parameterised in our set-up)
  - break the linking at regular intervals (parameterised in our set-up)
  - do a dry run to identify changed files, delete those on the
    destination, and then do a non-dry run (there are timing issues here,
    but there always will be for a non-quiet filesystem).
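
The third option could be sketched roughly as below.  This is untested
against our real set-up and rests on several assumptions: that the dry
run is given --out-format="%i %n" so each line is an itemized flag
string, a space, and a path relative to the destination; that the
source path ends in "/" so those relative paths line up; and that the
flag string contains no spaces (true unless -ii is used).  The function
names and the choice of which flag prefixes to act on are mine, not
anything rsync prescribes.

```python
import os
import subprocess


def files_to_unlink(itemized_lines):
    """Pick existing regular files that rsync reports it would touch."""
    targets = []
    for line in itemized_lines:
        flags, sep, name = line.rstrip("\n").partition(" ")
        if not sep or len(flags) < 2:
            continue
        # '<' or '>' means a transfer, '.' means attribute-only changes;
        # 'f' is a regular file; '+' marks a new file with nothing to unlink.
        if flags[0] in "<>." and flags[1] == "f" and "+" not in flags:
            targets.append(name)
    return targets


def refresh(src, dest, link_dest):
    """Dry run, unlink the affected destination files, then run for real."""
    base = ["rsync", "-a", "--link-dest=" + link_dest]
    dry = subprocess.run(
        base + ["--dry-run", "--out-format=%i %n", src, dest],
        check=True, capture_output=True, text=True,
    )
    for name in files_to_unlink(dry.stdout.splitlines()):
        target = os.path.join(dest, name)
        # Only remove plain files; leave symlinks and directories alone.
        if os.path.isfile(target) and not os.path.islink(target):
            os.unlink(target)
    # The timing window mentioned above sits between these two runs.
    subprocess.run(base + [src, dest], check=True)
```

With the stale entries removed, the second run finds nothing at those
paths in the destination, so it can link from the link-dest directory
or copy from the source, rather than changing attributes on a shared
inode.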

Thanks again

Regards
Rob. Bell              e-mail: Robert.Bell at csiro.au
--
Dr Robert C. Bell, BSc (Hons) PhD
Technical Services Manager
Advanced Scientific Computing
CSIRO IM&T

Phone: +61 3 9669 8102 | Mobile: +61 428 108 333 | CSIRO 93 3810
Robert.Bell at csiro.au | http://www.csiro.au/ | http://www.hpsc.csiro.au/
Addresses:
Street: CSIRO ASC Level 11, 700 Collins Street, Docklands Vic 3008, Australia
Postal: CSIRO ASC Level 11, GPO Box 1289, Melbourne Vic 3001, Australia

Please see earlier postings for the disclaimer.


