rsync --link-dest won't link even if existing file is out of date

Ken Chase rsync-list-m829 at sizone.org
Wed Apr 15 02:10:47 MDT 2015


80 million calls isn't 'that bad' since it completes in 5 hours, yes? I suppose
I don't mind. I should throw more RAM in the box and figure out how to tune
metadata caching so it's preferred over file data. Then it'd be quicker.
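Something like this sysctl is what I have in mind for that tuning on a Linux
backup box; the knob is real, but the value is just a starting guess to
experiment with:

  # Lower vm.vfs_cache_pressure so the kernel reclaims dentry/inode
  # (metadata) caches less aggressively relative to file-data pages.
  # Default is 100; smaller values favour keeping metadata cached.
  sysctl -w vm.vfs_cache_pressure=50

  # Persist the setting across reboots:
  echo 'vm.vfs_cache_pressure = 50' > /etc/sysctl.d/90-backup-meta.conf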

Either way, it's working for me now, and in fact, if the backup server is
'slow', then thrashing the production servers isn't as bad. I want them
thrashed just the right amount so things complete in a total of 5 hours,
about 1 hr per server. (--bwlimit doesn't do it, because walking 15M+
files/server generates plenty of metadata-cache disk IO no matter how little
data actually moves over the wire.)
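If you want to throttle the disk side rather than the wire, one trick is to
run the remote rsync at idle I/O priority; a sketch only, assuming the
production hosts are Linux with ionice available, and with placeholder paths
and flags:

  # --bwlimit only caps network throughput; this instead caps how hard the
  # scan hits the production server's disks by running its rsync under the
  # idle I/O-scheduling class.
  rsync -aH --link-dest=/backups/host/yesterday \
      --rsync-path="ionice -c3 rsync" \
      host:/dir/ /backups/host/today/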

More servers? More rack units and more power in my $expensive facility? Things
are working for me now because I am not recreating 80 million links, which seems
to cost much, much more than 80 million stats (reads vs. reads + writes + head
thrash between them?).

My secret is doing the servers sequentially, so I don't pile extra head-thrash
onto the backup server; otherwise things really start slowing down.
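The driver is nothing fancier than a loop; a sketch with made-up hostnames,
paths, and flags:

  #!/bin/sh
  # One production server at a time, so the backup box only ever sees one
  # metadata-heavy scan (and one set of head seeks) at once.
  for host in prod1 prod2 prod3 prod4 prod5; do
      rsync -aH --delete --link-dest=/backups/$host/yesterday \
          "$host:/dir/" "/backups/$host/today/"
  done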

Though I'm interested in ZFS dedupe, this is the wrong list for that :)

I'm more curious about how my system is actually NOT working properly for me,
other than using up more disk than I wanted, and that's fixed now in 3.1
apparently (it hasn't made it into Debian stable yet, among other distros...).

I'm just pax -rwl'ing my old backups manually until they're all using the same
inodes; then I can continue with mv $olddate $today; rsync --link-dest=$yest
host:/dir $today and things will be great. I've already saved 1TB of 12 over the
last 2 weeks by paxing (and in the last week with the 3.1 rsync), and I expect
that to drop by another 2-3TB over the next month.
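Spelled out with made-up dates and paths, the rotation and the paxing look
roughly like this; the exact flags and the rsync-after-pax step are my reading
of it, not the literal commands:

  # Nightly rotation: recycle the oldest snapshot's directory as today's
  # target, then pull changes with yesterday's snapshot as the hard-link
  # reference.
  olddate=/backups/host/2015-03-15    # oldest snapshot, being recycled
  yest=/backups/host/2015-04-14
  today=/backups/host/2015-04-15
  mv "$olddate" "$today"
  rsync -aH --delete --link-dest="$yest" host:/dir/ "$today"/

  # Retrofitting hard links onto old, fully-duplicated snapshots: make a
  # hard-linked copy of the previous day with pax -rwl, then let rsync
  # replace only the files that actually differ, so identical files end
  # up sharing inodes. Verify before removing the original.
  mkdir /backups/host/2015-03-16.new
  (cd /backups/host/2015-03-15 && pax -rwl . /backups/host/2015-03-16.new)
  rsync -a --delete /backups/host/2015-03-16/ /backups/host/2015-03-16.new/
  mv /backups/host/2015-03-16 /backups/host/2015-03-16.old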

/kc


On Wed, Apr 15, 2015 at 02:45:17AM -0400, Kevin Korb said:
  >On 04/14/2015 11:35 PM, Henri Shustak wrote:
  >>> I'll take a look but I imagine I can't back up the 80 million files
  >>> I need to in under the 5 hours I have for nightly
  >>> maintenance/backups. Currently it's possible by recycling
  >>> directories...
  >
  >I would expect that recycling directories actually makes this worse.
  >With an empty target directory you don't even need the overhead of
  >--delete (not as bad as it used to be thanks to --delete-during but it
  >is still overhead).  If your backup window is only 5 hours then that
  >leaves you with 19 hours a day to do other things on your backup
  >server(s) such as deleting off old backups.  Get all those unlink()
  >calls out of your backup window.  Bad enough you need to do 80 million
  >calls to stat().
  >
  >> 
  >> To cover that many files in that much time you will require a high
  >> speed system. Just another thought. Perhaps splitting the backup
  >> onto multiple backup servers / storage systems would reduce the
  >> backup time so that it fits into your window?
  >
  >Agreed completely here.  It is much easier to make more backup servers
  >than it is to make one big one that can handle the entire load.  We
  >divide our backup load by server.  IOW, each backup server has a list
  >of production servers that it backs up.
  >
  >> Also, I strongly agree with the previous posts relating to file
  >> system snapshots. ZFS is just one file system which supports this
  >> kind of system.
  >
  >I have also attempted to use btrfs in Linux for this.  I even wrote up
  >a presentation for my local LUG about it:
  >https://sanitarium.net/golug/rsync+btrfs_backups_2011.html
  >Unfortunately there was nothing but grief.  The btrfs just wasn't
  >stable enough and the btrfs-cleaner kernel thread drove performance
  >into the ground.   We eventually had to abandon it in favor of ZFS on
  >TrueOS.
  >
  >As far as "fast box" goes we decided on 8GB of RAM for most of the
  >backup servers and essentially whatever CPU can handle that much RAM.
  > Most of them are older AMD Athlon 64 X2 desktops.  We do have one
  >with a quad core CPU and 16GB of RAM.  That is the only one running
  >ZFS de-duplication as that is the big RAM hog.
  >
  >-- 
  >~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
  >	Kevin Korb			Phone:    (407) 252-6853
  >	Systems Administrator		Internet:
  >	FutureQuest, Inc.		Kevin at FutureQuest.net  (work)
  >	Orlando, Florida		kmk at sanitarium.net (personal)
  >	Web page:			http://www.sanitarium.net/
  >	PGP public key available on web site.
  >~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~

-- 
Ken Chase - ken at heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.

