rsync --link-dest won't link even if existing file is out of date

Kevin Korb kmk at sanitarium.net
Mon Apr 6 10:07:05 MDT 2015


Since you are in an environment with millions of files, I highly
recommend that you move to ZFS storage and use ZFS snapshots instead
of --link-dest.  Snapshots are much more space efficient, the rsync
runs are faster, and old backups can be deleted in seconds.  Rsync
doesn't have to understand anything about ZFS.  You just rsync to the
same directory every time and have ZFS take a snapshot of that
directory's dataset between runs.
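
A minimal sketch of that cycle, under stated assumptions: the source,
destination and dataset names in the usage line are hypothetical
placeholders, the rsync flags are only one reasonable choice, and the
DRY_RUN switch is an illustration aid rather than anything rsync or
ZFS provides.

```shell
#!/bin/sh
# Sketch only: rsync into one fixed directory, then snapshot its dataset.

# Print the command instead of running it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# backup_and_snapshot SRC DEST DATASET
# SRC and DEST are ordinary rsync arguments; DATASET is the ZFS dataset
# mounted at DEST.
backup_and_snapshot() {
    src=$1
    dest=$2
    dataset=$3
    stamp=$(date +%Y-%m-%d_%H%M)

    # Always sync into the same directory; no --link-dest is involved.
    run rsync -aH --delete --numeric-ids "$src" "$dest" || return 1

    # Freeze the result; unchanged blocks are shared between snapshots.
    run zfs snapshot "$dataset@$stamp"
}
```

Usage might look like `backup_and_snapshot server1:/ /tank/backups/server1/ tank/backups/server1`.
An old backup is then dropped with `zfs destroy` on the snapshot name
rather than by unlinking millions of files.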

On 04/06/2015 01:51 AM, Ken Chase wrote:
> Feature request: allow --link-dest dir to be linked to even if file
> exists in target.
> 
> This statement from the man page is adhered to too strongly IMHO:
> 
> "This option works best when copying into an empty destination
> hierarchy, as rsync treats existing files as definitive (so it
> never looks in the link-dest dirs when a destination file already
> exists)".
> 
> I was surprised by this behaviour as generally the scheme is to be
> efficient/save space with rsync.
> 
> When the file is out of date but exists in the --l-d target, it
> would be great if it could be removed and linked. If an option was
> supplied to request this behaviour, I'd actually throw some money
> at making it happen.  (And a further option to retain a copy if
> inode permissions/ownership would otherwise be changed.)
> 
> Reasoning:
> 
> I back up many servers with --link-dest that have filesystems of
> 10+M files on them.  I do not delete old backups - which take 60min
> per tree or more just so rsync can recreate them all in an empty
> target dir when <1% of files change per day (takes 3-5 hrs per
> backup!).
> 
> Instead, I cycle them in with mv $olddate $today then rsync --del
> --link-dest over them - takes 30-60 min depending. (Yes, some
> malleability of permissions risk there, mostly interested in
> contents tho).  Problem is, if a file exists AT ALL, even out of
> date, a new copy is put over top of it per the above man page
> decree.
> 
> Thus much more disk space is used. Running this scheme, in which old
> backups are moved into place and written over, accumulates many
> copies of the exact same file over time.  Running pax -rpl over the
> copies before rsyncing to them works (and saves much space!), but
> takes a very long time as it traverses and compares 2 large backup
> trees, thrashing the same device (on the order of 3-5x the rsync's
> time, 3-5 hrs for pax; hardlink(1) is far worse, I suspect some
> non-linear algorithm therein, as it ran 3-5x slower than pax again).
> 
> I have detailed an example of this scenario at
> 
> http://unix.stackexchange.com/questions/193308/rsyncs-link-dest-option-does-not-link-identical-files-if-an-old-file-exists
>
>  which also indicates --delete-before and --whole-file do not help
> at all.
> 
> /kc
> 
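
To see how many duplicate copies a reused target has accumulated, the
two trees can be compared directly.  The following is only a sketch,
not a replacement for pax or hardlink(1): the function name is
hypothetical, it assumes a shell whose test builtin supports -ef (bash
and dash do), and it assumes file names without embedded newlines.

```shell
#!/bin/sh
# list_unlinked_duplicates PREV CUR
# Print every regular file under CUR that is byte-identical to the file
# at the same relative path under PREV but is not a hard link to it.
list_unlinked_duplicates() {
    prev=$1
    cur=$2
    ( cd "$cur" && find . -type f ) | while IFS= read -r rel; do
        a="$prev/$rel"
        b="$cur/$rel"
        # Nothing to compare against in the previous backup.
        [ -f "$a" ] || continue
        # Same device and inode: already hard linked, no space wasted.
        if [ "$a" -ef "$b" ]; then
            continue
        fi
        # Identical contents stored twice: a candidate for re-linking.
        if cmp -s "$a" "$b"; then
            printf '%s\n' "${rel#./}"
        fi
    done
}
```

Running `list_unlinked_duplicates /backups/2015-04-05 /backups/2015-04-06`
(paths hypothetical) lists the files that occupy space twice.  It
still reads both copies of every unlinked pair, so it shares the I/O
cost described above.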

-- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
	Kevin Korb			Phone:    (407) 252-6853
	Systems Administrator		Internet:
	FutureQuest, Inc.		Kevin at FutureQuest.net  (work)
	Orlando, Florida		kmk at sanitarium.net (personal)
	Web page:			http://www.sanitarium.net/
	PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~

