[patch] link-dest messages and max-size warnings (fwd)

Robert Bell Robert.Bell at csiro.au
Thu Aug 23 19:29:26 MDT 2012


rsync Folks,


The following explanatory text is by me and the patches are by Rowan
McKenzie for use by the Advanced Scientific Computing group at CSIRO.

This patch builds upon the --link-dest patch by Bryant Hansen (Thanks heaps!).

1. The original patch provided an alternate behaviour for rsync when
using the --link-dest option.

When there are identical files in the source and link-dest areas, but
a different file in the destination area, standard rsync will update
the destination by copying the file from source to destination: the
patch updates the destination by hard-linking from the link-dest area.
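
The decision described above can be sketched as follows.  This is only
an illustration, not the patch itself (the patch is C code inside
rsync).  The function name update_destination is hypothetical, and the
sketch compares file contents byte by byte, whereas rsync normally
decides from size, modification time and attributes.

```python
import filecmp
import os
import shutil


def update_destination(src, link_dest, dest):
    """Bring dest up to date and report what was done."""
    # Destination already matches the source: leave it alone.
    if os.path.exists(dest):
        if filecmp.cmp(src, dest, shallow=False):
            return "unchanged"
        os.remove(dest)  # superseded file in the recycled area
    # Source matches the previous backup: share its data via a hard link.
    if os.path.exists(link_dest) and filecmp.cmp(src, link_dest, shallow=False):
        os.link(link_dest, dest)
        return "linked"
    # No usable copy in the link-dest area: fall back to copying.
    shutil.copy2(src, dest)
    return "copied"
```

In the scenario above (source and link-dest identical, destination
different) the sketch takes the "linked" branch, which is the patched
behaviour; standard rsync would in effect take the "copied" branch.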

Why do we want this behaviour?

In our backup system, which we have been running with rsync since
2007, we recycle old backup destinations as the targets for new
backups.  This is an efficiency gain: we don't want to create a new
area containing, in several cases, millions of files when we have an
older area that is nearly up to date.  (With mature backups of user
areas, we typically find that our daily backups have a churn of 0.5%
of files and 1% of data.)

The original patch ensures we get the maximum amount of hard-linking
available.

However, the original patch unconditionally outputs a message for every
file hard-linked under the scenario outlined above.  Our modified
patch outputs this diagnostic message only when the -v option is given.


2. In addition, the patch adds one more feature and option.  In our
backups some time ago, we wished to avoid repeated daily backups of
large files that were being appended to each day - they were the
outputs of computational models.  We used the --max-size parameter to
skip these files: however, we did not like the lack of warning about
skipped files.  This patch adds another parameter, --warn, to select
the output of a warning message when files bigger than the selected
--max-size are skipped.  The message is of the form:

    big_file is over max-size
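
For readers without the patched rsync, a similar report can be
produced separately.  The following is a rough sketch under stated
assumptions: the function name over_max_size is hypothetical, the
limit is given in bytes, and only the message format is taken from the
text above.  It scans a local tree and does not reproduce how the
patch works inside rsync.

```python
import os


def over_max_size(root, max_size):
    """Return one message per regular file under root larger than max_size bytes."""
    messages = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit directories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only plain files are measured.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getsize(path) > max_size:
                rel = os.path.relpath(path, root)
                messages.append(f"{rel} is over max-size")
    return messages
```

A file exactly at the limit is not reported, matching rsync's rule
that --max-size skips only files larger than the given size.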


3. For information: our backups are controlled from the destination of
the backups (pull rather than push as Kevin Korb recently advised).
We use the rsync daemon capability.

The destinations of our backups are file systems subject to HSM
(Hierarchical Storage Management), using SGI's Data Migration
Facility (DMF).

A typical command we use is the following (but I have shortened the
paths and addresses).


rsync --password-file=not_for_your_eyes --numeric-ids -a --stats \
    --one-file-system --max-size=8.0GB --warn --whole-file \
    --link-dest=previous --delete \
    root@source_host::backups/source_dir current

--password-file=not_for_your_eyes
   	. the password file used to authenticate to the rsync daemon
--numeric-ids
   	. since the userids on the source are not always available on the destination
-a
   	. archive mode
--stats
   	. statistics
--one-file-system
   	. do not cross file-system boundaries: stops rsync from backing
   	  up every mounted file system when backing up /
--max-size=8.0GB
   	. to skip large files
--warn
   	. NEW parameter - warn of skipped files because of --max-size
--whole-file
   	. essential when the destination is subject to HSM: otherwise,
   	  migrated files will be recalled so that the rsync comparison
   	  algorithm can read them
--link-dest=previous
   	. pointer to previous backup: to provide a source of files for
   	  hard-linking
--delete
   	. essential when the destination is a recycled directory, to
   	  ensure superseded files are deleted
root@source_host::backups/source_dir
   	. the source specification: username@source_host, module
   	  specification, and source directory
current
   	. the destination directory.
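
When the command is driven from a script, it can help to assemble it
as an argument list instead of one long string.  A minimal sketch: the
function name build_backup_command and its parameters are hypothetical,
and --warn is accepted only by an rsync built with the patch described
here.

```python
def build_backup_command(source, dest, link_dest, password_file,
                         max_size="8.0GB"):
    """Return the rsync argument list for one pull-style backup run."""
    return [
        "rsync",
        f"--password-file={password_file}",
        "--numeric-ids",
        "-a",
        "--stats",
        "--one-file-system",
        f"--max-size={max_size}",
        "--warn",        # NEW option: needs the patched rsync
        "--whole-file",  # avoid recalling migrated files on HSM storage
        f"--link-dest={link_dest}",
        "--delete",      # needed because the destination is recycled
        source,
        dest,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True).
Note that rsync interprets a relative --link-dest directory relative
to the destination directory.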

We use an extended Tower of Hanoi scheme to manage the retention of
backups.  We highly recommend it for its ability to keep a sensible
set of backups matched to the likelihood of restores, and because it
avoids messy management using dates and times.
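
For readers unfamiliar with the idea, the classic Tower of Hanoi
rotation can be sketched as below.  This shows only the textbook
scheme: the extensions we use are not described here, and the function
name hanoi_level and its parameters are hypothetical.  On each run the
returned backup set is the one to recycle as the new destination.

```python
def hanoi_level(day, levels):
    """Return the backup set to overwrite on `day` (counted from 1).

    Set 0 is reused every 2nd day, set 1 every 4th day, set 2 every
    8th day, and so on; the last set takes all remaining days.
    """
    if day < 1 or levels < 1:
        raise ValueError("day and levels must both be at least 1")
    level = 0
    # Count how many times 2 divides the day number, capped at the last set.
    while day % 2 == 0 and level < levels - 1:
        day //= 2
        level += 1
    return level
```

With four sets, days 1 to 8 use sets 0, 1, 0, 2, 0, 1, 0, 3, so recent
backups are dense and older ones progressively sparser, with no date
arithmetic needed.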



Regards
Rob. Bell              e-mail: Robert.Bell at csiro.au
--
Dr Robert C. Bell, BSc (Hons) PhD
Technical Services Manager
Advanced Scientific Computing
CSIRO IM&T

Phone: +61 3 9669 8102 | Mobile: +61 428 108 333 | CSIRO 93 3810
Robert.Bell at csiro.au | http://www.csiro.au/ | http://www.hpsc.csiro.au/
Addresses:
Street: CSIRO ASC Level 11, 700 Collins Street, Docklands Vic 3008, Australia
Postal: CSIRO ASC Level 11, GPO Box 1289, Melbourne Vic 3001, Australia
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rsync_link_dest_max_size_CSIRO-ASC.patch
Type: text/x-diff
Size: 4060 bytes
Desc: rsync_link_dest_max_size_CSIRO-ASC.patch
URL: <http://lists.samba.org/pipermail/rsync/attachments/20120824/8af71eed/attachment.patch>

