Backup scripts - recycling old backup directories (Kevin Korb)

Robert Bell Robert.Bell at csiro.au
Mon Sep 15 00:24:11 MDT 2014


Kevin,

Thanks for the reply and interest in this topic.

Comments below.

Regards

Rob.

> I did consider that but rejected it for 2 reasons...
> 
> 1. Backup run time.  We have a 4 hour window to run backups at night.
>  Using recycled directories significantly extended the backup run
> time.  The deletion time is eliminated but frankly, we have the other
> 20 hours of the day to do deletions.  We had to give up using
> --link-dest when the deletions started to actually take that long even
> though the backups still ran in under 4 hours.

For us, recycling old directories significantly shortened the time to do
backups, since typically 95% of the files/directories in a recycled
backup are already correct (with daily backups and a Tower of Hanoi
rotation, half of our recycled backups are only 5 to 6 days old).

I've just done some tests with a fairly pathological case, all on one
host.

I set up a source tree 's' with 11111 sub-directories and 10000 files,
and then two destinations:
   cp -a s d1
   cp -afl d1 d2
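
The exact layout of 's' is not given above.  As a minimal sketch, one way
to build a tree with these counts is a 10-way fan-out four levels deep
(1 + 10 + 100 + 1000 + 10000 = 11111 directories, counting 's' itself)
with one empty file in each leaf directory.  The fan-out, the helper names
leaf_dirs and build_tree, and the use of a scratch directory are all
assumptions for illustration; GNU cp, stat and mktemp are assumed too.

```shell
#!/bin/sh
# Sketch only: the 10-way layout is an assumption, not the original setup.
set -eu

digits="0 1 2 3 4 5 6 7 8 9"

# Print the path of every leaf directory under $1, one per line.
leaf_dirs() {
    for a in $digits; do for b in $digits; do
    for c in $digits; do for d in $digits; do
        printf '%s/%s/%s/%s/%s\n' "$1" "$a" "$b" "$c" "$d"
    done; done; done; done
}

# Create the tree rooted at $1.  The path must not contain whitespace,
# because the list of names is passed through xargs.
build_tree() {
    leaf_dirs "$1" | xargs mkdir -p                 # also creates the parents
    leaf_dirs "$1" | sed 's|$|/f|' | xargs touch    # one empty file per leaf
}

work=$(mktemp -d)                   # scratch area; remove it when finished
build_tree "$work/s"
cp -a   "$work/s"  "$work/d1"       # full copy of the source
cp -afl "$work/d1" "$work/d2"       # hard-linked copy of d1

ndirs=$(( $(find "$work/s" -type d | wc -l) ))
nfiles=$(( $(find "$work/s" -type f | wc -l) ))
echo "work=$work dirs=$ndirs files=$nfiles"
```

After this, every file in d1 and d2 shares an inode with its counterpart,
which is the state both tests start from.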

I then did the first test:
   # rsync to a new directory, followed by a remove of an old directory
   time rsync -a --link-dest=../d2 s/ d3
   time /bin/rm -rf d1


I then scrubbed the lot, set it up again, and did the second test:
   mv d1 d3
   # rsync to a recycled directory
   time rsync -a --link-dest=../d2 --delete s/ d3

I hope I got this right!  I've made no effort to circumvent caching.

Anyway, here is a table of the average times (seconds) over 5 runs of each test.

          Real      User      Sys       (User+Sys)
test 1    2.454s    0.150s    2.196s    2.346s
test 2    0.392s    0.100s    0.572s    0.672s
ratio     6.3       1.5       3.8       3.5

(The User+Sys time is pretty much invariant, even though in earlier tests
the real time suffered major blowouts owing to contention.)

So, the big difference is that in test 1, the 11111 sub-directories and
10000 files were created in the destination d3, and then the same
numbers were deleted from the old directory d1.  In test 2, rsync does
none of that, and only has to check for differences.  About 40,000
metadata operations on the filesystem are avoided in this case.
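
As a quick check of the arithmetic, the ratio row and the operation count
can be recomputed from the figures quoted above.  This is only a sketch;
it counts one operation per directory or file entry created or removed,
which is an assumed simplification of what the filesystem actually does.

```shell
#!/bin/sh
# Recompute the ratio row and the metadata-operation estimate.
set -eu

ratios=$(awk 'BEGIN {
    # Real, User, Sys, User+Sys for test 1 and test 2 (seconds)
    split("2.454 0.150 2.196 2.346", t1, " ")
    split("0.392 0.100 0.572 0.672", t2, " ")
    for (i = 1; i <= 4; i++)
        printf "%s%.1f", (i > 1 ? " " : ""), t1[i] / t2[i]
    printf "\n"
}')
echo "ratios: $ratios"

# Test 1 creates every directory and file entry in d3 and then removes
# the same number of entries from d1; test 2 avoids both.
entries=$((11111 + 10000))
ops=$((2 * entries))
echo "metadata operations avoided: $ops"
```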


> 
> 2. Metadata history.  If there is an existing file in the target dir
> that differs only by metadata (permissions, ownership, timestamp) then
> rsync will simply change that metadata.  That change affects all
> instances of that file.  Of course this is better for storage space as
> the alternative is storing another copy of the file with the different
> metadata but we decided it was better to have that information saved.

Yes.


I would love to see someone make a patched version of rsync to allow
callers to select a different behaviour in this case!

That is, if a file has identical content on source and destination but
different metadata, --link-dest is in use, and the link count on the
destination is > 1, then rsync would take a new copy from the source
rather than just updating the metadata.  (The file could instead be
copied on the destination, the copy updated with the new metadata, and
the old version removed, but this would not be essential - just perhaps
an efficiency gain.)
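
Until something like that exists, the destination-side variant could be
approximated outside rsync.  The following is a rough sketch of a pre-pass
helper, not an rsync feature: meta and break_link_if_stale are
hypothetical names, GNU stat and mktemp are assumed, and only mode, owner,
group and mtime are compared.

```shell
#!/bin/sh
# Hypothetical helper, not part of rsync.
set -eu

# Mode, owner, group and mtime of a file as one comparable string.
meta() { stat -c '%a %u %g %Y' "$1"; }

# If DST is shared with other backups (link count > 1) and its metadata
# differs from SRC, replace DST with a private copy, so that a later
# metadata update touches this backup only.
break_link_if_stale() {
    src=$1
    dst=$2
    [ "$(stat -c %h "$dst")" -gt 1 ] || return 0            # not shared
    [ "$(meta "$src")" != "$(meta "$dst")" ] || return 0    # nothing to do
    tmp=$(mktemp "$dst.XXXXXX")
    cp -p "$dst" "$tmp"      # copy content, mode, ownership, timestamps
    mv -f "$tmp" "$dst"      # the rename replaces this one name only
}

# Demonstration in a scratch directory.
demo=$(mktemp -d)
echo data > "$demo/src"; chmod 600 "$demo/src"
echo data > "$demo/old"; chmod 644 "$demo/old"
ln "$demo/old" "$demo/new"                 # two names, one inode
break_link_if_stale "$demo/src" "$demo/new"
chmod 600 "$demo/new"                      # stands in for rsync's update
old_state=$(stat -c '%h %a' "$demo/old")
new_state=$(stat -c '%h %a' "$demo/new")
echo "old: $old_state  new: $new_state"
```

In the demonstration the older backup keeps its original permissions,
at the cost of storing a second copy of the content.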

Thanks in anticipation!


Dr Robert C. Bell
HPC National Partnerships | Scientific Computing
Information Management and Technology
CSIRO
T +61 3 9669 8102 Alt +61 3 8601 3810 Mob +61 428 108 333
Robert.Bell at csiro.au | www.csiro.au | wiki.csiro.au/display/ASC/
Street: CSIRO ASC Level 11, 700 Collins Street, Docklands Vic 3008, Australia
Postal: CSIRO ASC Level 11, GPO Box 1289, Melbourne Vic 3001, Australia
