rsync --link-dest won't link even if existing file is out of date

Ken Chase rsync-list-m829 at sizone.org
Mon Apr 6 10:12:05 MDT 2015


We have considered this. But it pains me that a tiny addition to the
rsync option set would save so much time and space for other legitimate
use cases.

We know rsync very well; we don't know ZFS very well (licensing kept the
tech out of our Linux-centric operations). We've been using it but we're
not experts yet.

Thanks for the suggestion.

/kc

On Mon, Apr 06, 2015 at 12:07:05PM -0400, Kevin Korb said:
  >Since you are in an environment with millions of files I highly
  >recommend that you move to ZFS storage and use ZFS's subvolume
  >snapshots instead of --link-dest.  It is much more space efficient,
  >rsync run time efficient, and the old backups can be deleted in
  >seconds.  Rsync doesn't have to understand anything about ZFS.  You
  >just rsync to the same directory every time and have ZFS do a snapshot
  >on that directory between runs.
  >
  >On 04/06/2015 01:51 AM, Ken Chase wrote:
  >> Feature request: allow --link-dest dir to be linked to even if file
  >> exists in target.
  >> 
  >> This statement from the man page is adhered to too strongly IMHO:
  >> 
  >> "This option works best when copying into an empty destination
  >> hierarchy, as rsync treats existing files as definitive (so it
  >> never looks in the link-dest dirs when a destination file already
  >> exists)".
  >> 
  >> I was surprised by this behaviour, as generally the point of the
  >> scheme is to be efficient and save space with rsync.
  >> 
  >> When the destination file is out of date but a current copy exists
  >> in the --link-dest dir, it would be great if the stale file could be
  >> removed and replaced with a link. If an option was supplied to
  >> request this behaviour, I'd actually throw some money at making it
  >> happen.  (And a further option to retain a copy if inode
  >> permissions/ownership would otherwise be changed.)
  >> 
  >> Reasoning:
  >> 
  >> I back up many servers with --link-dest that have filesystems of
  >> 10+M files on them.  I do not delete old backups - deleting takes
  >> 60min or more per tree, just so rsync can recreate them all in an
  >> empty target dir when <1% of files change per day (takes 3-5 hrs
  >> per backup!).
  >> 
  >> Instead, I cycle them in with mv $olddate $today then rsync --del
  >> --link-dest over them - takes 30-60 min depending. (Yes, there is
  >> some risk of permissions drifting there, but I'm mostly interested
  >> in contents.)  Problem is, if a file exists AT ALL, even out of
  >> date, a new copy is put over top of it per the above man page
  >> decree.
  >> 
  >> Thus much more disk space is used. Cycling old backups in this way
  >> and writing over top of them accumulates many copies of the exact
  >> same file over time.  Running pax -rpl over the copies before
  >> rsyncing to them works (and saves much space!), but takes a very
  >> long time as it traverses and compares 2 large backup trees
  >> thrashing the same device (on the order of 3-5x the rsync's time,
  >> 3-5 hrs for pax - hardlink(1) is far worse, I suspect some
  >> non-linear algorithm therein - it ran 3-5x slower than pax again).
  >> 
  >> I have detailed an example of this scenario at
  >> 
  >> http://unix.stackexchange.com/questions/193308/rsyncs-link-dest-option-does-not-link-identical-files-if-an-old-file-exists
  >>
  >>  which also indicates --delete-before and --whole-file do not help
  >> at all.
  >> 
  >> /kc
  >> 
  >
  >-- 
  >~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
  >	Kevin Korb			Phone:    (407) 252-6853
  >	Systems Administrator		Internet:
  >	FutureQuest, Inc.		Kevin at FutureQuest.net  (work)
  >	Orlando, Florida		kmk at sanitarium.net (personal)
  >	Web page:			http://www.sanitarium.net/
  >	PGP public key available on web site.
  >~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~

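The relinking pass described in the quoted message (replacing files in the
recycled tree with hard links into the previous backup when the contents are
identical) could be sketched roughly as below. This is neither pax nor
hardlink(1), and it is not an rsync feature; the function name
relink_identical and its two arguments are made up for illustration. The
sketch compares contents only and ignores ownership, permissions and
timestamps, and it assumes both trees live on the same filesystem and that no
file name contains a newline.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): relink_identical PREV CUR
# For every regular file under CUR, if PREV holds a regular file at the same
# relative path with byte-identical contents but a different inode, replace
# the CUR copy with a hard link to the PREV one.
relink_identical() {
    prev=$1
    cur=$2
    # List regular files relative to CUR; symlinks are not matched by -type f.
    (cd "$cur" && find . -type f) | while IFS= read -r rel; do
        old="$prev/$rel"
        new="$cur/$rel"
        # Skip paths that are missing from PREV or are symlinks there.
        [ -f "$old" ] && [ ! -L "$old" ] || continue
        # Already the same inode: nothing to do.
        [ "$old" -ef "$new" ] && continue
        # Assumption: identical bytes are enough to link; metadata is ignored.
        if cmp -s "$old" "$new"; then
            ln -f "$old" "$new"
        fi
    done
}
```

It would be called as relink_identical /backups/$olddate /backups/$today
(paths hypothetical) before or after the rsync run. Like the pax pass, it
still has to read both trees in full, so it does not avoid the traversal cost
that the feature request is trying to eliminate.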
-- 
Ken Chase - ken att heavycomputing.ca Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.