rsync - using a --files-from list to cut out scanning. How to handle deletions? (fwd)
Robert Bell
Robert.Bell at csiro.au
Mon Jan 21 20:35:55 MST 2013
Paul, Wayne, Kevin, Teodor and others,
Thanks for your contributions in response to my postings.
Paul: I was very imprecise if not plain wrong in my description. :-(
Thanks for explaining what really happens.
> "Rsync will not update an existing file in-place unless you use the
> --inplace option. So --whole-file is irrelevant for this.
> Rsync (without --inplace) will always create a new (temporary) file,
> using the existing data (without --whole-file) to enable the delta diff
> speedup algorithm. Once the temp file is successfully created, it's
> renamed to the original name, deleting the existing link. So any
> hardlinked data will remain untouched."
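
The behaviour Paul describes can be reproduced outside rsync. The following is a minimal Python sketch, not rsync's own code: it writes a temporary file and renames it over the original name, then checks that a second hard link still holds the old data. The file names and the helpers replace_via_temp() and demo_rename() are made up for illustration, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile


def replace_via_temp(path, data):
    """Write data to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(path)
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(data)
    # The rename drops the old name's link to the old inode;
    # other hard links to that inode are not touched.
    os.replace(tmp, path)


def demo_rename():
    """Return (current content, snapshot content, whether they share an inode)."""
    with tempfile.TemporaryDirectory() as root:
        current = os.path.join(root, "current.txt")    # hypothetical names
        snapshot = os.path.join(root, "snapshot.txt")
        with open(current, "w") as handle:
            handle.write("old")
        os.link(current, snapshot)                     # second name, same inode
        replace_via_temp(current, "new")
        with open(current) as handle:
            current_data = handle.read()
        with open(snapshot) as handle:
            snapshot_data = handle.read()
        same_inode = os.stat(current).st_ino == os.stat(snapshot).st_ino
        return current_data, snapshot_data, same_inode
```

After the rename the two names point at different inodes, so the older snapshot keeps its data.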
Since my posting to the rsync digest last week, I've needed to think a
lot about rsync behaviour with hard-links, and have been doing some
tests.
It's been good to have an outbreak of postings on these issues, with a
re-visiting of Bug 5644, and Wayne's postings about features in the
upcoming version 3.1.0. (At our site, we use a patched version of rsync
which links a file from the link-dest directory rather than copying it
from the source when the file is identical in the source and link-dest
directory, but exists and is different in the destination.)
I was not aware of the issue in the case where the unchanged_file() test
is passed, but the unchanged_attrs() test is not, and of the potential
for overwriting the attributes not just of the destination file, but of
every file hard-linked to it.
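
The reason an in-place attribute change reaches every hard link is that permissions, ownership and times live in the inode, which all the links share. A small Python sketch of that filesystem behaviour follows. It is not rsync code: the file names and the helper demo_shared_attrs() are invented for the example, and it assumes a POSIX filesystem.

```python
import os
import stat
import tempfile


def demo_shared_attrs():
    """Change the mode through one hard link and read it back through the other."""
    with tempfile.TemporaryDirectory() as root:
        today = os.path.join(root, "today.txt")          # hypothetical names
        yesterday = os.path.join(root, "yesterday.txt")
        with open(today, "w") as handle:
            handle.write("same content")
        os.chmod(today, 0o644)
        os.link(today, yesterday)   # both names now refer to one inode
        os.chmod(today, 0o600)      # in-place attribute change via one name
        # The older name reports the new mode, because the mode is
        # stored in the shared inode rather than per directory entry.
        return stat.S_IMODE(os.stat(yesterday).st_mode)
```

So the "historical" copy loses its original permissions as soon as the newer link is updated in place.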
This means that recycling directories, about which Teodor Milkov noted:
> "Such a behaviour (unlink changed files and then hard link to dest dir)
> would be very handy, because rotating large directory trees (e.g. 10
> milion files, 10k files changed) is sooo much more efficient than
> deleting them and then repopulating from scratch."
is an issue, as Wayne noted:
> "A pre-existing hard-linked copy of the files causes rsync to
> just change the attributes on the file in-place (without breaking the
> hard-link). This can be a minor point for some people (if historical
> permissions/ACLs/xattrs don't need to be accurate), but could be a deal
> breaker for some."
I can see the need for another rsync option here to allow users to
select the making of a fresh copy of the file in this case. That would
restore the behaviour I implicitly assumed we had, but didn't.
I've updated the documentation for our backups, and prepared a note for
users. I'm also thinking about ways around this issue, none of which
are particularly appealing:
- drop the recycling of old directories (parameterised in our set-up)
- break the linking at regular intervals (parameterised in our set-up)
- do a dry run to identify changed files, delete those on the
destination, and then do a non-dry run (there are timing issues here,
but there always will be for a non-quiet filesystem).
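
The third option above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not our actual backup script: the helper names (dry_run, files_to_unlink, unlink_changed) are hypothetical, the source argument is assumed to end in "/" so that the reported paths are relative to the destination, and the parsing relies on the --itemize-changes format described in the rsync man page (first character is the update type, second the file type, and a run of "+" marks a newly created item).

```python
import os
import subprocess


def dry_run(source, dest, link_dest):
    """Run rsync in dry-run mode and return its itemized change list."""
    cmd = ["rsync", "-a", "--dry-run", "--itemize-changes",
           "--link-dest=" + link_dest, source, dest]
    done = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return done.stdout


def files_to_unlink(itemized_output):
    """Pick existing regular files whose data or attributes would change."""
    paths = []
    for line in itemized_output.splitlines():
        code, _, path = line.partition(" ")
        if len(code) < 2 or code[1] != "f":
            continue              # directories, symlinks, "*deleting" messages
        if code[0] not in "<>.":
            continue              # e.g. "h": item would be created as a hard link
        if "+" in code:
            continue              # new file, nothing to delete in the destination
        if code[0] == "." and set(code[2:]) <= {"."}:
            continue              # listed, but nothing would change
        paths.append(path)
    return paths


def unlink_changed(dest, itemized_output):
    """Delete the selected files so the real run has to create fresh copies."""
    for relative in files_to_unlink(itemized_output):
        target = os.path.join(dest, relative)
        if os.path.lexists(target):
            os.unlink(target)     # removes one name; other hard links survive
```

The real (non-dry) rsync run would follow the call to unlink_changed(). Anything that changes on the source between the two runs is missed by the deletion step, which is the timing issue mentioned above.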
Thanks again
Regards
Rob. Bell e-mail: Robert.Bell at csiro.au
--
Dr Robert C. Bell, BSc (Hons) PhD
Technical Services Manager
Advanced Scientific Computing
CSIRO IM&T
Phone: +61 3 9669 8102 | Mobile: +61 428 108 333 | CSIRO 93 3810
Robert.Bell at csiro.au | http://www.csiro.au/ | http://www.hpsc.csiro.au/
Addresses:
Street: CSIRO ASC Level 11, 700 Collins Street, Docklands Vic 3008, Australia
Postal: CSIRO ASC Level 11, GPO Box 1289, Melbourne Vic 3001, Australia
Please see earlier postings for the disclaimer.