Patch for rsync --link-dest won't link even if existing file is out of date (fwd)

Robert Bell Robert.Bell at csiro.au
Mon Apr 6 21:14:02 MDT 2015


Folks,
We faced a similar situation to that which Ken described - we recycle
backup directories, for good reason.

There is a patch to solve the problem.

Our systems administrator provided the following description of the
patches we use:

============================================================================
1. rsync_link_dest improvement

by Bryant Hansen

Normally, existing files in destination are never updated from link-dest
but are transferred over the wire. This patch changes that behaviour to
use link-dest instead, which is a major performance enhancement in our
environment.

2. Warnings for --max-size ignored files are displayed if -w/--warning
is specified

by Rowan McKenzie (CSIRO SC)

Warnings for --max-size ignored files are displayed if -w/--warning is
specified. Normally, --max-size causes files to be silently ignored!

3. Only output '=>' notifications when -v/--verbose specified

by Rowan McKenzie (CSIRO SC)

Only output '#' notifications when -v/--verbose specified (it's a patch
to the rsync_link_dest_from_bryant patch). This reduces clutter by
suppressing a large class of false positives.
============================================================================
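
As an illustration of the link-dest behaviour described in item 1, here is
a minimal Python sketch. It is not the patch itself, whose source is not
reproduced here. The function names (same_quick_check, relink_if_stale) and
the size-plus-mtime comparison are assumptions, chosen to resemble rsync's
default quick check.

```python
import os


def same_quick_check(a, b):
    """Approximate rsync's default quick check: equal size and equal mtime."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def relink_if_stale(src, dest, link_dest):
    """Hypothetical helper. Returns 'kept', 'linked' or 'copy-needed'."""
    if os.path.exists(dest) and same_quick_check(src, dest):
        return "kept"  # destination file is already up to date

    if os.path.exists(link_dest) and same_quick_check(src, link_dest):
        # Create the hard link under a temporary name, then rename it over
        # the stale destination so the path is never left missing.
        tmp = dest + ".relink.tmp"
        if os.path.lexists(tmp):
            os.unlink(tmp)
        os.link(link_dest, tmp)
        os.replace(tmp, dest)
        return "linked"

    return "copy-needed"  # no usable copy anywhere: transfer the file
```

A real implementation would also have to compare ownership and permissions
before linking, because a hard link shares them with the link-dest copy.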


Hope you can find these.


(All we need now for rsync perfection for our backups is a solution to
the problem of metadata changes being propagated across all directories
for hard-linked files - we would rather new copies be made than lose the
old metadata.)
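
The metadata problem above follows from how hard links work: every name for
a hard-linked file refers to the same inode, so a mode or ownership change
made through one backup directory shows up in all of them. A small Python
sketch, assuming a POSIX filesystem; the file names and the helper
link_shares_metadata are made up for illustration:

```python
import os
import stat
import tempfile


def link_shares_metadata(directory):
    """Chmod a file through a hard link; return the original's mode before and after."""
    original = os.path.join(directory, "backup.old")
    linked = os.path.join(directory, "backup.new")
    with open(original, "w") as f:
        f.write("data\n")
    os.chmod(original, 0o644)
    os.link(original, linked)  # second name, same inode

    before = stat.S_IMODE(os.stat(original).st_mode)
    os.chmod(linked, 0o600)  # change metadata through the new name only
    after = stat.S_IMODE(os.stat(original).st_mode)
    return before, after


with tempfile.TemporaryDirectory() as d:
    before, after = link_shares_metadata(d)
    print(oct(before), oct(after))  # prints: 0o644 0o600
```

The old name never had chmod called on it, yet its mode changed, which is
the loss of old metadata described above.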


Regards

Rob.

Dr Robert C. Bell
HPC National Partnerships | Scientific Computing
Information Management and Technology
CSIRO
T +61 3 9669 8102 Alt +61 3 8601 3810 Mob +61 428 108 333
Robert.Bell at csiro.au | www.csiro.au | wiki.csiro.au/display/ASC/
Street: CSIRO ASC Level 11, 700 Collins Street, Docklands Vic 3008, Australia
Postal: CSIRO ASC Level 11, GPO Box 1289, Melbourne Vic 3001, Australia

---------- Forwarded message ----------
Date: Mon, 6 Apr 2015 01:51:21 -0400
From: Ken Chase <rsync-list-m829 at sizone.org>
To: rsync at lists.samba.org
Subject: rsync --link-dest won't link even if existing file is out of date

Feature request: allow --link-dest dir to be linked to even if file exists
in target.

This statement from the man page is adhered to too strongly IMHO:

"This option works best when copying into an empty destination hierarchy, as
rsync treats existing files as definitive (so it never looks in the link-dest
dirs when a destination file already exists)".

I was surprised by this behaviour, as the general point of rsync is to be
efficient and save space.

When the destination file is out of date but a current copy exists in the
--link-dest directory, it would be great if the destination file could be
removed and replaced with a link. If an option were supplied to request this
behaviour, I'd actually throw some money at making it happen.  (And a further
option to retain a copy if inode permissions/ownership would otherwise be
changed.)

Reasoning:

I back up many servers with --link-dest that have filesystems of 10+M files on
them.  I do not delete old backups, because deleting takes 60 min or more per
tree, just so rsync can recreate them all in an empty target dir when <1% of
files change per day (takes 3-5 hrs per backup!).

Instead, I cycle them in with mv $olddate $today, then rsync --del --link-dest
over them, which takes 30-60 min depending. (Yes, there is some risk of
permissions changing there; I'm mostly interested in contents though.)  The
problem is, if a file exists AT ALL, even out of date, a new copy is written
over top of it, per the above man page decree.
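
The recycling scheme above could be scripted roughly as follows. This is a
sketch only: the function name recycle_backup, the directory layout and the
-a flag are assumptions, and the function merely builds the rsync command
line rather than running it.

```python
import os


def recycle_backup(root, old_name, today_name, prev_name, source):
    """Rename the oldest backup tree to today's name; return the rsync argv."""
    old_dir = os.path.join(root, old_name)
    today_dir = os.path.join(root, today_name)
    # A relative --link-dest path is resolved against the destination
    # directory, so pass an absolute path to avoid surprises.
    prev_dir = os.path.abspath(os.path.join(root, prev_name))

    os.rename(old_dir, today_dir)  # the "mv $olddate $today" step

    return [
        "rsync", "-a", "--del",
        "--link-dest=" + prev_dir,
        source.rstrip("/") + "/",  # trailing slash: copy the contents
        today_dir + "/",
    ]
```

The returned list can be handed to subprocess.run(cmd, check=True). Files
left over in the recycled tree are exactly the ones that stock rsync
overwrites with fresh copies instead of linking.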

Thus much more disk space is used. Recycling old backups as targets this way
accumulates many copies of the exact same file over time.  Running pax -rpl
over the copies before rsyncing to them works (and saves much space!), but
takes a very long time as it traverses and compares 2 large backup trees,
thrashing the same device (on the order of 3-5x the rsync's time, 3-5 hrs for
pax). hardlink(1) is far worse: I suspect some non-linear algorithm therein,
as it ran 3-5x slower than pax again.

I have detailed an example of this scenario at

http://unix.stackexchange.com/questions/193308/rsyncs-link-dest-option-does-not-link-identical-files-if-an-old-file-exists

which also indicates --delete-before and --whole-file do not help at all.

/kc
-- 
Ken Chase - ken at heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.


