rsync --link-dest won't link even if existing file is out of date

Ken Chase rsync-list-m829 at sizone.org
Sun Apr 5 23:51:21 MDT 2015


Feature request: allow --link-dest dir to be linked to even if file exists
in target.

This statement from the man page is adhered to too strongly IMHO:

"This option works best when copying into an empty destination hierarchy, as
rsync treats existing files as definitive (so it never looks in the link-dest
dirs when a destination file already exists)".

I was surprised by this behaviour, as the general aim with rsync is to be
efficient and save space.

When the destination file exists but is out of date, and an up-to-date copy
exists in the --link-dest dir, it would be great if the destination file could
be removed and linked instead. If an option were supplied to request this
behaviour, I'd actually throw some money at making it happen.  (And a further
option to retain a copy if inode permissions/ownership would otherwise be
changed.)
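
To make the request concrete, here is a minimal Python sketch of the per-file
decision being asked for. It is not rsync's code and says nothing about how
rsync is implemented; the names place_file and same_quick, the size+mtime
comparison, and the return labels are all assumptions made for illustration.
It handles regular files only and assumes the parent directory of dest exists.

```python
import os
import shutil


def same_quick(a, b):
    """Cheap comparison: same size and same mtime (whole seconds)."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def place_file(src, dest, ref):
    """Bring dest up to date with src, preferring a hard link to ref.

    src  - file in the source tree
    dest - corresponding file in the backup being written
    ref  - corresponding file in the link-dest dir (may not exist)
    Returns "kept", "linked" or "copied" (hypothetical labels).
    """
    if os.path.isfile(dest) and same_quick(src, dest):
        return "kept"                  # dest is already current

    tmp = dest + ".place-tmp"          # hypothetical temporary name
    if os.path.isfile(ref) and same_quick(src, ref):
        # Requested behaviour: drop the stale dest and link to ref instead.
        os.link(ref, tmp)
        os.replace(tmp, dest)          # atomic swap; the old inode is untouched
        return "linked"

    # No usable reference copy: write a new file, never overwrite in place,
    # since dest may share its inode with an older backup.
    shutil.copy2(src, tmp)
    os.replace(tmp, dest)
    return "copied"
```

The second branch is the only part that differs from the behaviour the man
page describes: today an existing dest means the link-dest dir is never
consulted, so that case always ends up as a new copy.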

Reasoning:

I back up many servers with --link-dest that have filesystems of 10+M files on
them.  I do not delete old backups (which takes 60 min per tree or more) just so
rsync can recreate them all in an empty target dir when <1% of files change
per day (that takes 3-5 hrs per backup!).

Instead, I cycle them in with mv $olddate $today and then run rsync --del
--link-dest over them, which takes 30-60 min depending.  (Yes, there is some
risk of malleable permissions there; I'm mostly interested in contents though.)
Problem is, if a file exists AT ALL, even out of date, a new copy is put over
top of it per the above man page decree.

Thus much more disk space is used. Reusing old backups this way, by moving them
into place and writing over top of them, accumulates many copies of the exact
same file over time.  Running pax -rpl over the copies before rsyncing to them
works (and saves much space!), but takes a very long time as it traverses and
compares 2 large backup trees, thrashing the same device (on the order of 3-5x
the rsync's time, so 3-5 hrs for pax; hardlink(1) is far worse, it ran 3-5x
slower than pax again, and I suspect some non-linear algorithm therein).
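
For readers unfamiliar with this kind of relink pass, the idea can be sketched
in a few lines of Python. This is only an illustration of the technique, not
what pax or hardlink(1) actually do internally; the function name relink_tree
and the choice to require matching size, mtime and byte-identical contents are
assumptions of the sketch. Ownership and permissions are not compared.

```python
import filecmp
import os


def relink_tree(old_root, new_root):
    """Hard-link files in new_root to identical files at the same relative
    path in old_root. Returns the number of files relinked."""
    relinked = 0
    for dirpath, _dirs, files in os.walk(new_root):
        rel = os.path.relpath(dirpath, new_root)
        for name in files:
            new = os.path.join(dirpath, name)
            old = os.path.join(old_root, rel, name)
            # Regular files only; skip symlinks and missing counterparts.
            if os.path.islink(new) or not os.path.isfile(new):
                continue
            if os.path.islink(old) or not os.path.isfile(old):
                continue
            s_new, s_old = os.stat(new), os.stat(old)
            if s_new.st_dev != s_old.st_dev:
                continue               # cannot hard-link across filesystems
            if s_new.st_ino == s_old.st_ino:
                continue               # already the same inode
            if s_new.st_size != s_old.st_size:
                continue               # cheap reject before reading any data
            if int(s_new.st_mtime) != int(s_old.st_mtime):
                continue               # keep mtimes stable for later runs
            if not filecmp.cmp(old, new, shallow=False):
                continue               # contents differ byte for byte
            tmp = new + ".relink-tmp"  # hypothetical temporary name
            os.link(old, tmp)          # link under a temporary name first
            os.replace(tmp, new)       # then atomically swap it into place
            relinked += 1
    return relinked
```

Note that a pass like this still has to stat every file in both trees and read
every candidate pair in full, which is exactly the device thrashing described
above; it shows why doing the relink inside rsync, which is already visiting
each file, would be so much cheaper.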

I have detailed an example of this scenario at

http://unix.stackexchange.com/questions/193308/rsyncs-link-dest-option-does-not-link-identical-files-if-an-old-file-exists

which also indicates --delete-before and --whole-file do not help at all.

/kc
-- 
Ken Chase - ken at heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.

