Backup scripts - recycling old backup directories (Kevin Korb)

Kevin Korb kmk at sanitarium.net
Mon Sep 15 09:03:34 MDT 2014


I would never operate in a manner that only has 5-6 days of old
backups.  The backups that I am deleting are more than a year old.

On 09/15/2014 02:24 AM, Robert Bell wrote:
> Kevin,
> 
> Thanks for the reply and interest in this topic.
> 
> Comments below.
> 
> Regards
> 
> Rob.
> 
>> I did consider that but rejected it for 2 reasons...
>> 
>> 1. Backup run time.  We have a 4 hour window to run backups at
>> night. Using recycled directories significantly extended the
>> backup run time.  The deletion time is eliminated but frankly, we
>> have the other 20 hours of the day to do deletions.  We had to
>> give up using --link-dest when the deletions started to
>> actually take that long even though the backups still ran in
>> under 4 hours.
> 
> For us, the recycling of old directories significantly shortened
> the time to do backups, since the recycled backups have typically
> 95% of the files/directories correct (with daily backups and Tower
> of Hanoi, half of our recycled backups are only 5 to 6 days old).
> 
> I've just done some tests with a fairly pathological case, all on
> one host.
> 
> I set up a source tree 's' with 11111 sub-directories and 10000
> files, and then two destinations:
> 
>   cp -a s d1
>   cp -afl d1 d2
> 
> I then did the first test:
> 
>   # rsync to a new directory, followed by a remove of an old directory.
>   time rsync -a --link-dest=../d2 s/ d3
>   time /bin/rm -rf d1
> 
> 
> I then scrubbed the lot, set it up again, and did the second test:
> 
>   mv d1 d3
>   # rsync to a recycled directory
>   time rsync -a --link-dest=../d2 --delete s/ d3
> 
> I hope I got this right!  I've made no effort to circumvent
> caching.
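
For anyone wanting to repeat the comparison, the two procedures described above can be sketched as a single script. This is a sketch under stated assumptions, not the script that was actually run: the tree layout (10 directories of 10 files), the names t1, t2, dir0 and file0, and the use of GNU cp and stat are all made up for illustration. Timing is left out; wrap the rsync and rm steps in "time" as in the original tests.

```shell
#!/bin/sh
# Sketch of the two procedures on a small tree; layout and sizes are assumptions.
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

make_tree() {
    # Hypothetical layout: 10 directories with 10 files each.
    mkdir s
    for d in 0 1 2 3 4 5 6 7 8 9; do
        mkdir "s/dir$d"
        for f in 0 1 2 3 4 5 6 7 8 9; do
            echo "$d$f" > "s/dir$d/file$f"
        done
    done
    cp -a s d1        # oldest backup
    cp -afl d1 d2     # newer backup, hard-linked to d1
}

work=$(mktemp -d)

# Test 1: rsync into a new directory, then remove the old one.
mkdir "$work/t1"; cd "$work/t1"; make_tree
rsync -a --link-dest=../d2 s/ d3
rm -rf d1

# Test 2: recycle the old directory as the new destination.
mkdir "$work/t2"; cd "$work/t2"; make_tree
mv d1 d3
rsync -a --link-dest=../d2 --delete s/ d3

# Both procedures should leave d3 with the same content as the source,
# and each file in d3 sharing its inode with the copy in d2.
result=different
if diff -r "$work/t1/s" "$work/t1/d3" >/dev/null &&
   diff -r "$work/t2/s" "$work/t2/d3" >/dev/null; then
    result=same
fi
links_t1=$(stat -c %h "$work/t1/d3/dir0/file0")
links_t2=$(stat -c %h "$work/t2/d3/dir0/file0")
echo "result=$result links_t1=$links_t1 links_t2=$links_t2"
```

The end state is the same either way; the difference being measured is only how much filesystem work it takes to get there.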
> 
> Anyway, here is a table of the average times (seconds) over 5 runs
> of each test.
> 
>            Real      User      Sys       (User+Sys)
> test 1     2.454s    0.150s    2.196s    2.346s
> test 2     0.392s    0.100s    0.572s    0.672s
> ratio      6.3       1.5       3.8       3.5
> 
> (The User+Sys time is pretty much invariant, even though in earlier
> tests the real time suffered major blowouts owing to contention.)
> 
> So, the big difference is that in test 1, the 11111 sub-directories
> and 10000 files were created in the destination d3, and then the
> same numbers were deleted from the old directory d1.  In test 2,
> rsync does none of that and only has to check for differences, so
> roughly 40,000 filesystem metadata operations are avoided in this
> case.
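
One way to arrive at the ~40,000 figure, assuming each directory and file costs one create (mkdir or hard link) in d3 and one remove (rmdir or unlink) from d1. The variable names are only for illustration:

```shell
#!/bin/sh
# Rough count of the metadata operations test 1 performs and test 2 skips.
dirs=11111
files=10000
created=$((dirs + files))   # entries made in the new directory d3
removed=$((dirs + files))   # entries deleted from the old directory d1
avoided=$((created + removed))
echo "created=$created removed=$removed avoided=$avoided"
```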
> 
> 
>> 
>> 2. Metadata history.  If there is an existing file in the target
>> dir that differs only by metadata (permissions, ownership,
>> timestamp) then rsync will simply change that metadata.  That
>> change affects all instances of that file.  Of course this is
>> better for storage space as the alternative is storing another
>> copy of the file with the different metadata but we decided it
>> was better to have that information saved.
> Yes.
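
The effect described in point 2 can be seen without rsync at all. A minimal sketch, assuming GNU coreutils (cp -l, stat -c) and made-up directory names day1 and day2; the chmod here only stands in for an in-place metadata update on a recycled directory:

```shell
#!/bin/sh
# Sketch: two "backups" sharing one inode via hard links (GNU coreutils assumed).
set -eu
work=$(mktemp -d)
cd "$work"
mkdir day1
echo data > day1/file
chmod 644 day1/file
cp -al day1 day2            # day2/file is a hard link to day1/file
chmod 600 day2/file         # metadata change made through the newer backup
mode_day1=$(stat -c %a day1/file)
mode_day2=$(stat -c %a day2/file)
links=$(stat -c %h day1/file)
echo "day1=$mode_day1 day2=$mode_day2 links=$links"
```

Both names report the new mode because permissions, ownership and timestamps live in the inode, which the two directory entries share. The older backup no longer records that the file was once mode 644.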
> 
> 
> I would love to see someone make a patched version of rsync to
> allow callers to select a different behaviour in this case!
> 
> So, if a file has identical content on source and destination but
> different metadata, and --link-dest is in use and the link count on
> the destination is > 1, then take a new copy from the source rather
> than just updating the metadata. (The file could instead be copied
> on the destination, the copy updated with the new metadata, and the
> old version removed, but this would not be essential, just perhaps
> an efficiency gain.)
> 
> Thanks in anticipation!
> 
> 
> Dr Robert C. Bell
> HPC National Partnerships | Scientific Computing
> Information Management and Technology
> CSIRO
> T +61 3 9669 8102  Alt +61 3 8601 3810  Mob +61 428 108 333
> Robert.Bell at csiro.au | www.csiro.au | wiki.csiro.au/display/ASC/
> Street: CSIRO ASC Level 11, 700 Collins Street, Docklands Vic 3008, Australia
> Postal: CSIRO ASC Level 11, GPO Box 1289, Melbourne Vic 3001, Australia

-- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
	Kevin Korb			Phone:    (407) 252-6853
	Systems Administrator		Internet:
	FutureQuest, Inc.		Kevin at FutureQuest.net  (work)
	Orlando, Florida		kmk at sanitarium.net (personal)
	Web page:			http://www.sanitarium.net/
	PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~