Backup scripts - recycling old backup directories (Kevin Korb)
kmk at sanitarium.net
Mon Sep 15 09:03:34 MDT 2014
-----BEGIN PGP SIGNED MESSAGE-----
I would never operate in a manner that only has 5-6 days of old
backups. The backups that I am deleting are more than a year old.
On 09/15/2014 02:24 AM, Robert Bell wrote:
> Thanks for the reply and interest in this topic.
> Comments below.
>> I did consider that but rejected it for 2 reasons...
>> 1. Backup run time. We have a 4 hour window to run backups at
>> night. Using recycled directories significantly extended the
>> backup run time. The deletion time is eliminated but frankly, we
>> have the other 20 hours of the day to do deletions. We had to
>> give up using --link-dest when the deletions started to
>> actually take that long even though the backups still ran in
>> under 4 hours.
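Kevin's point is that deletions can run during the other 20 hours of the day instead of inside the 4-hour backup window. Below is a minimal sketch of such a separate pruning job. It assumes snapshot directories named YYYY-MM-DD directly under one root, and it assumes GNU date. The name prune_snapshots and the 365-day threshold are illustrative only; Kevin says just that the backups he deletes are more than a year old.

```bash
#!/usr/bin/env bash
# Sketch (assumptions: snapshots named YYYY-MM-DD under ROOT, GNU date).
# prune_snapshots ROOT KEEP_DAYS removes snapshots older than KEEP_DAYS.
# Intended to run outside the backup window, e.g. from a daytime cron job.
prune_snapshots() {
  local root=$1 keep_days=$2 cutoff dir stamp
  cutoff=$(date -d "$keep_days days ago" +%Y%m%d)   # GNU date syntax
  for dir in "$root"/????-??-??; do
    [ -d "$dir" ] || continue            # the glob stays literal if nothing matches
    stamp=$(basename "$dir" | sed 's/-//g')   # 2013-09-15 -> 20130915
    if [ "$stamp" -lt "$cutoff" ]; then
      rm -rf -- "$dir"
    fi
  done
}
```

For example, `prune_snapshots /backups/host1 365` would run from cron during the day. The path is hypothetical.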
> For us, the recycling of old directories significantly shortened
> the time to do backups, since the recycled backups have typically
> 95% of the files/directories correct (with daily backups and Tower
> of Hanoi, half of our recycled backups are only 5 to 6 days old).
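The age of a recycled directory depends on the rotation scheme. Below is a sketch of the common textbook Tower of Hanoi scheme. It assumes k backup sets numbered from 0 and days numbered from 1. Each set s < k-1 is reused every 2^(s+1) days, and the last set takes every day divisible by 2^(k-1). Robert does not describe CSIRO's exact variant, so both the schedule and the name hanoi_set are assumptions.

```bash
#!/usr/bin/env bash
# hanoi_set DAY K: which of K backup sets to recycle on DAY (DAY >= 1).
# The set index is the number of trailing zero bits of DAY, capped at K-1.
hanoi_set() {
  local n=$1 k=$2 s=0
  while [ $((n % 2)) -eq 0 ] && [ "$s" -lt $((k - 1)) ]; do
    n=$((n / 2))
    s=$((s + 1))
  done
  echo "$s"
}
```

For example, `for d in 1 2 3 4 5 6 7 8; do printf '%s ' "$(hanoi_set "$d" 4)"; done` prints 0 1 0 2 0 1 0 3. Under this scheme, the directory recycled for set s was last written 2^(s+1) days earlier, or 2^(k-1) days earlier for the last set.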
> I've just done some tests with a fairly pathological case, all on
> one host.
> I set up a source tree 's' with 11111 sub-directories and 10000
> files, and then two destinations:
>     cp -a s d1
>     cp -afl d1 d2
> I then did the first test:
>     # rsync to a new directory, followed by a remove of an old directory.
>     time rsync -a --link-dest=../d2 s/ d3
>     time /bin/rm -rf d1
> I then scrubbed the lot, set it up again, and did the second test:
>     mv d1 d3
>     # rsync to a recycled directory
>     time rsync -a --link-dest=../d2 --delete s/ d3
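Robert's two tests can be collected into one script. The message does not say how the source tree 's' was populated, so the flat layout built by the hypothetical make_tree function is an assumption, and timings on a differently shaped tree will not match his table. A relative --link-dest path is resolved relative to the destination directory, so ../d2 seen from d3 refers to d2. The script assumes bash (for the `time` keyword), GNU cp (for -l), and rsync.

```bash
#!/usr/bin/env bash
# Sketch of the two tests. The layout built by make_tree is an assumption;
# the message does not say how the source tree 's' was made.

# make_tree ROOT NDIRS NFILES: ROOT/d0..d(NDIRS-1), files spread round-robin.
make_tree() {
  local root=$1 ndirs=$2 nfiles=$3 i=0
  mkdir -p "$root"
  while [ "$i" -lt "$ndirs" ]; do
    mkdir "$root/d$i"
    i=$((i + 1))
  done
  i=0
  while [ "$i" -lt "$nfiles" ]; do
    echo "$i" > "$root/d$((i % ndirs))/f$i"
    i=$((i + 1))
  done
}

# Common setup. Run it in an empty scratch directory: it deletes s, d1, d2, d3.
setup() {
  rm -rf s d1 d2 d3
  make_tree s 11111 10000
  cp -a s d1        # full copy
  cp -afl d1 d2     # hard-linked copy (GNU cp -l)
}

# Test 1: rsync into a new directory, then remove an old one.
test1() {
  setup
  time rsync -a --link-dest=../d2 s/ d3
  time /bin/rm -rf d1
}

# Test 2: recycle the old directory d1 as the destination d3.
test2() {
  setup
  mv d1 d3
  time rsync -a --link-dest=../d2 --delete s/ d3
}
```

To average over several runs, as Robert did, call `test1` or `test2` in a loop, for example `for run in 1 2 3 4 5; do test2; done`.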
> I hope I got this right! I've made no effort to circumvent
> Anyway, here is a table of the average times (seconds) over 5 runs
> of each test.
>          Real     User     Sys      (User+Sys)
> test 1   2.454s   0.150s   2.196s   2.346s
> test 2   0.392s   0.100s   0.572s   0.672s
> ratio    6.3      1.5      3.8      3.5
> (The User+Sys time is pretty much invariant, even though in earlier
> tests the real time suffered major blowouts owing to contention.)
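Each entry in the ratio row is the test 1 time divided by the test 2 time in the same column, and the User+Sys column is the sum of User and Sys. The quick awk check below recomputes the ratio row from values copied out of the table. The function name is illustrative.

```bash
#!/usr/bin/env bash
# Recompute the "ratio" row: (test 1 time) / (test 2 time), per column.
ratios() {
  awk 'BEGIN {
    split("Real User Sys User+Sys", name, " ")
    split("2.454 0.150 2.196 2.346", t1, " ")   # test 1, seconds
    split("0.392 0.100 0.572 0.672", t2, " ")   # test 2, seconds
    for (i = 1; i <= 4; i++) printf "%s %.1f\n", name[i], t1[i] / t2[i]
  }'
}
```

Calling `ratios` prints four lines: Real 6.3, User 1.5, Sys 3.8, and User+Sys 3.5. These match the table.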
> So, the big difference is that in test 1, the 11111 sub-directories
> and 10000 files were created in the destination d3, and then the
> same numbers were deleted from the old directory d1. In test 2,
> rsync does none of that, but only has to check for differences.
> ~40,000 metadata operations avoided on the filesystem in this
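The "~40,000" figure is consistent with counting one namespace operation per entry in each direction. Test 1 creates 11111 directories and 10000 file links in d3, then removes the same number of entries from d1. The count below is rough: it ignores the stat and attribute-setting calls that both tests make. The name ops_avoided is illustrative.

```bash
#!/usr/bin/env bash
# Rough count of entry creations (mkdir/link in d3) plus removals
# (rmdir/unlink in d1) that test 1 performs and test 2 avoids.
ops_avoided() {
  local dirs=$1 files=$2
  echo $(( 2 * (dirs + files) ))
}
```

`ops_avoided 11111 10000` prints 42222.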
>> 2. Metadata history. If there is an existing file in the target
>> dir that differs only by metadata (permissions, ownership,
>> timestamp) then rsync will simply change that metadata. That
>> change affects all instances of that file. Of course this is
>> better for storage space as the alternative is storing another
>> copy of the file with the different metadata but we decided it
>> was better to have that information saved.
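A metadata-only change "affects all instances" because hard-linked snapshots share a single inode, and permissions, ownership and timestamps are stored in the inode rather than in the directory entry. The small demonstration below assumes GNU coreutils stat (-c); BSD and macOS stat use -f instead. The directory and file names are made up.

```bash
#!/usr/bin/env bash
# Hard links share metadata: chmod through one name changes what every
# other name (i.e. every older snapshot) reports. Assumes GNU stat -c.
shared_mode_demo() {
  local dir
  dir=$(mktemp -d)
  mkdir "$dir/snap1" "$dir/snap2"
  echo data > "$dir/snap1/file"
  chmod 644 "$dir/snap1/file"
  ln "$dir/snap1/file" "$dir/snap2/file"  # like an unchanged file under --link-dest
  chmod 600 "$dir/snap2/file"             # change the mode in the "new" snapshot...
  stat -c '%a %h' "$dir/snap1/file"       # ...and the old snapshot reports it too
  rm -rf "$dir"
}
```

The function prints `600 2`. The older snapshot's name now reports the new mode, and both names still share one inode.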
> I would love to see someone make a patched version of rsync to
> allow callers to select a different behaviour in this case!
> So, if a file has identical content on source and destination but
> different metadata, then if --link-dest is in use and the link
> count on the destination is > 1, then take a new copy from source
> rather than just updating the metadata (the file could be copied on
> the destination and then the copy updated with the new metadata and
> the old version removed, but this would not be essential - just
> perhaps an efficiency gain.)
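The proposed rule can be sketched outside rsync for a single attribute, the file mode. This is not a patch to rsync and says nothing about how rsync sets attributes internally. The helper set_mode_unshared is hypothetical, and GNU stat is assumed. The sketch follows the order given in the parenthetical above. First it copies the file within the destination, then it gives the copy the new metadata, and finally it replaces the old name. Replacing the name drops this snapshot's link to the shared inode, so older snapshots keep their original metadata.

```bash
#!/usr/bin/env bash
# set_mode_unshared FILE MODE: if FILE has other hard links, give this name
# a private copy before changing its mode. Assumes GNU stat -c.
set_mode_unshared() {
  local file=$1 mode=$2 tmp
  if [ "$(stat -c %h "$file")" -gt 1 ]; then
    tmp=$(mktemp "$file.XXXXXX")   # same directory, so mv below is a rename
    cp -p "$file" "$tmp"           # copy data and the old metadata
    chmod "$mode" "$tmp"           # apply the new metadata to the copy
    mv -f "$tmp" "$file"           # replace this name only; other links keep the old inode
  else
    chmod "$mode" "$file"          # not shared: update in place as rsync does now
  fi
}
```

Ownership and timestamps would be handled in the same way, with chown or touch applied to the copy before the rename.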
> Thanks in anticipation!
> Dr Robert C. Bell HPC National Partnerships | Scientific Computing
> Information Management and Technology CSIRO T +61 3 9669 8102 Alt
> +61 3 8601 3810 Mob +61 428 108 333
> Robert.Bell at csiro.au | www.csiro.au |
> wiki.csiro.au/display/ASC/ Street: CSIRO ASC Level 11, 700 Collins
> Street, Docklands Vic 3008, Australia Postal: CSIRO ASC Level 11,
> GPO Box 1289, Melbourne Vic 3001, Australia
Kevin Korb Phone: (407) 252-6853
Systems Administrator Internet:
FutureQuest, Inc. Kevin at FutureQuest.net (work)
Orlando, Florida kmk at sanitarium.net (personal)
Web page: http://www.sanitarium.net/
PGP public key available on web site.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
-----END PGP SIGNATURE-----