cut-off time for rsync ?

Ken Chase rsync-list-m829 at sizone.org
Tue Jun 30 09:31:23 MDT 2015


If your goal is to reduce storage, and scanning inodes doesn't matter, use
--link-dest for the targets. Note that this keeps a separate backup tree for
every run, each one link-dested against yesterday's copy.

You end up with one backup tree dir per day, with unchanged files hardlinked
across all the other backup dirs. My solution (and that of many others here) is to

mv $ancientbackup $today; rsync --del --link-dest=$yest source:$dirs $today 

creating gaps in the ancient sequence of daily backups - so I end up keeping
(very roughly) backups that are 1,2,3,4,7,10,15,21,30,45,60,90,120,180 days old.
(Of course this isn't exactly how it works; there's some binary counting going
on in there, so the elimination isn't exactly like that - every day each of
those gets a day older. There are some Tower of Hanoi-like solutions to this
for automated backups.)
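
For concreteness, a minimal sketch of one day's rotation (hypothetical paths;
recycling the plain oldest dir is a stand-in here - picking *which* ancient
dir to reuse is where the binary-counting/Hanoi logic would go):

    #!/bin/sh
    # Hypothetical layout: dated dirs like /backups/20150630, source host "source".
    BACKUPS=/backups
    TODAY=$BACKUPS/$(date +%Y%m%d)
    YEST=$BACKUPS/$(date -d yesterday +%Y%m%d)   # GNU date

    # Reuse an old backup dir as today's target.  Taking the oldest is a
    # simplification; the real scheme picks whichever dir's removal keeps
    # the thinned 1,2,3,4,7,10,... spacing.
    ANCIENT=$(ls -d "$BACKUPS"/2* | head -n 1)
    mv "$ANCIENT" "$TODAY"

    # Refresh the recycled dir: files unchanged since yesterday end up
    # hardlinked to yesterday's copy, changed files are transferred, and
    # files gone from the source are deleted.
    rsync -a --del --link-dest="$YEST" source:/data/ "$TODAY"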

This means a time range twice as old holds roughly half as many backups (from
the list above, the most recent 15 days hold seven backups while the equally
long span from day 30 to day 45 holds two), so I keep roughly the same
frequency*age value for each backup time range into the past.

The result is a set of dirs dated (in my case) 20150630 for example, each of
which looks exactly like the source tree I backed up, but only takes up space
for the files changed since yesterday. (Caveat: it's hardlinked against all
the other backups, thus using no more space on disk; HOWEVER, some server
software like postfix doesn't like hardlinked files in its spool due to
security concerns - so if you boot/use the backup itself without making a
plain copy first (making a copy is recommended), 1) postfix et al will yell,
and 2) any modification hits the one inode shared by the whole set of dirs
that point to it.)
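
If you ever do need to boot or serve from a backup, a quick way to see the
sharing and to break it first (paths are hypothetical):

    # A link count (%h) greater than 1 means this inode is shared with
    # other dated backup dirs.
    stat -c '%h %i %n' /backups/20150630/etc/passwd

    # cp creates fresh inodes, so the copy can be modified (or handed to
    # postfix) without touching every other backup dir.
    cp -a /backups/20150630 /restore/20150630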

My solution avoids scanning the source twice (which, in my case of backing up
5 servers with 10M files each daily, is a huge cost), important because the
scan time takes longer than the backup/transfer time (over a gigE network, a
mere 20,000 changed files per 10M seems average per box of the 5). Also it's
production gear - thrashing the box (and its poor metadata cache) for as
little time as possible is important for performance, so getting the backups
done during the night lull is required. I don't have the time (nor the disk
RMA-cycle patience) to delete 10M files on the receiving side just to spend
5 hours recreating them; touching 20,000 seems better to me.

You could also use --backup and --backup-dir, but I don't do it that way.
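
For comparison, a rough sketch of that style (hypothetical paths): the
destination holds a single current mirror, and each run shunts whatever it
would overwrite or delete into a dated side directory.

    DAY=$(date +%Y%m%d)
    # Anything that changed or vanished since the last run is moved into
    # /backups/changed-$DAY instead of being silently replaced or removed.
    rsync -a --delete --backup --backup-dir=/backups/changed-$DAY \
          source:/data/ /backups/current/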

/kc


On Tue, Jun 30, 2015 at 10:32:31AM +0200, Dirk van Deun said:
  >Hi,
  >
  >I used to rsync a /home with thousands of home directories every
  >night, although only a hundred or so would be used on a typical day,
  >and many of them have not been used for ages.  This became too large a
  >burden on the poor old destination server, so I switched to a script
  >that uses "find -ctime -7" on the source to select recently used homes
  >first, and then rsyncs only those.  (A week being a more than good
  >enough safety margin in case something goes wrong occasionally.)
  >
  >Is there a smarter way to do this, using rsync only ?  I would like to
  >use rsync with a cut-off time, saying "if a file is older than this,
  >don't even bother checking it on the destination server (and the same
  >for directories -- but without ending a recursive traversal)".  Now
  >I am traversing some directories twice on the source server to lighten
  >the burden on the destination server (first find, then rsync).
  >
  >Best,
  >
  >Dirk van Deun
  >-- 
  >Ceterum censeo Redmond delendum
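
(For reference, the selection Dirk describes can be fed into a single rsync
run via --files-from - a sketch, with a hypothetical host and paths:

    # List everything whose ctime changed in the last week, reduce each hit
    # to its top-level home dir, and sync only those homes.  -r is given
    # explicitly because --files-from turns off the recursion implied by -a.
    find /home -mindepth 1 -ctime -7 -printf '%P\n' |
      cut -d/ -f1 | sort -u |
      rsync -a -r --files-from=- /home/ backuphost:/backup/home/

This still walks the source once with find, but only one rsync runs and only
the recently touched homes get checked on the destination.)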

-- 
Ken Chase - ken at heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.

