rsync --link-dest won't link even if existing file is out of date

Kevin Korb kmk at sanitarium.net
Mon Apr 6 10:31:07 MDT 2015



ZFS does have big RAM requirements.  8GB of RAM is pretty much the
minimum.  As for CPU, anything new enough to sit on a motherboard
with 8GB of RAM should be fine.

On 04/06/2015 12:25 PM, Clint Olsen wrote:
> Not to mention the fact that ZFS requires considerable hardware 
> resources (CPU & memory) to perform well. It also requires you to
> learn a whole new terminology to wrap your head around it.
> 
> It's certainly not a trivial swap to say the least...
> 
> Thanks,
> 
> -Clint
> 
> On Mon, Apr 6, 2015 at 9:12 AM, Ken Chase
> <rsync-list-m829 at sizone.org> wrote:
> 
> This has been a consideration. But it pains me that a tiny 
> change/addition to the rsync option set would save much time and
> space for other legit use cases.
> 
> We know rsync very well, we don't know ZFS very well (licensing kept
> the tech out of our Linux-centric operations). We've been using it
> but we're not experts yet.
> 
> Thanks for the suggestion.
> 
> /kc
> 
> On Mon, Apr 06, 2015 at 12:07:05PM -0400, Kevin Korb said:
> 
> Since you are in an environment with millions of files I highly
> recommend that you move to ZFS storage and use ZFS's filesystem
> snapshots instead of --link-dest.  It is much more space efficient,
> rsync run time efficient, and the old backups can be deleted in
> seconds.  Rsync doesn't have to understand anything about ZFS.  You
> just rsync to the same directory every time and have ZFS do a
> snapshot on that directory between runs.
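
The snapshot workflow described above can be sketched roughly as
follows.  This is only an illustration: the source "server1:/", the
dataset "tank/backups/server1" and its mount point are made-up names,
and the function only builds the command lines (it does not run
them).  "rsync -a --delete", "zfs snapshot" and "zfs destroy" are the
only real commands assumed here.

```python
import shlex
from datetime import date


def snapshot_backup_commands(source, dataset, mountpoint, day):
    """Return the commands for one backup run as argv lists.

    Nothing is executed here; the caller decides how to run them.
    All names passed in are hypothetical examples.
    """
    snapshot = f"{dataset}@{day.isoformat()}"
    return [
        # Sync into the same directory on every run.
        ["rsync", "-a", "--delete", source, mountpoint],
        # Then freeze the state of that directory as a dated snapshot.
        ["zfs", "snapshot", snapshot],
    ]


def expire_command(dataset, day):
    """Return the command that drops the snapshot taken on the given day."""
    return ["zfs", "destroy", f"{dataset}@{day.isoformat()}"]


if __name__ == "__main__":
    run = snapshot_backup_commands(
        "server1:/", "tank/backups/server1",
        "/tank/backups/server1/", date(2015, 4, 6),
    )
    for argv in run:
        print(shlex.join(argv))
```

Expiring an old backup is then a single snapshot destroy rather than
a walk over millions of hard links, which is where the "deleted in
seconds" claim comes from.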
> 
> On 04/06/2015 01:51 AM, Ken Chase wrote:
>> Feature request: allow --link-dest dir to be linked to even if
>> file exists in target.
> 
>> This statement from the man page is adhered to too strongly
>> IMHO:
> 
>> "This option works best when copying into an empty destination 
>> hierarchy, as rsync treats existing files as definitive (so it 
>> never looks in the link-dest dirs when a destination file
>> already exists)".
> 
>> I was surprised by this behaviour as generally the scheme is to
>> be efficient/save space with rsync.
> 
>> When the destination file is out of date but a current copy exists
>> in the --link-dest dir, it would be great if the stale file could
>> be removed and replaced with a link. If an option was supplied to
>> request this behaviour, I'd actually throw some money at making it
>> happen.  (And a further option to retain a copy if inode
>> permissions/ownership would otherwise be changed.)
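
For a single file, the behaviour being requested might look something
like the sketch below.  It is not how rsync works today and not a
proposed patch; relink_if_stale is a hypothetical helper, and it
compares file contents byte for byte as a simplification (rsync's
default quick check uses size and modification time).  Permissions
and ownership are ignored entirely.

```python
import filecmp
import os
import tempfile


def relink_if_stale(src, dest, link_dest):
    """Replace a stale dest with a hard link to link_dest.

    Only acts when dest exists, differs from src, and the link_dest
    copy is identical to src.  Returns True if dest was relinked.
    """
    if not os.path.exists(dest) or not os.path.exists(link_dest):
        return False
    if filecmp.cmp(src, dest, shallow=False):
        return False  # dest is already current
    if not filecmp.cmp(src, link_dest, shallow=False):
        return False  # link_dest copy is stale too; a transfer is needed
    tmp = dest + ".relink.tmp"
    os.link(link_dest, tmp)  # new hard link next to dest
    os.replace(tmp, dest)    # swap it over the stale copy
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, dest, prev = (os.path.join(d, n) for n in ("src", "dest", "prev"))
        for path, body in ((src, "new contents"), (prev, "new contents"), (dest, "old")):
            with open(path, "w") as f:
                f.write(body)
        print(relink_if_stale(src, dest, prev), os.stat(prev).st_nlink)
```

The link is created under a temporary name and renamed into place so
that dest is never missing, even if the run is interrupted.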
> 
>> Reasoning:
> 
>> I back up many servers with --link-dest that have filesystems of
>> 10+M files on them.  I do not delete old backups, since that takes
>> 60min per tree or more just so rsync can recreate them all in an
>> empty target dir when <1% of files change per day (takes 3-5 hrs
>> per backup!).
> 
>> Instead, I cycle them in with mv $olddate $today then rsync
>> --del --link-dest over them - takes 30-60 min depending. (Yes,
>> some risk of permissions being altered there, mostly interested
>> in contents tho).  Problem is, if a file exists AT ALL, even out
>> of date, a new copy is written over it per the above man page
>> decree.
> 
>> Thus much more disk space is used. Running this scheme, where old
>> backups are moved into place and written over, accumulates many
>> copies of the exact same file over time.  Running pax -rpl over
>> the copies before rsyncing to them works (and saves much space!),
>> but takes a very long time as it traverses and compares 2 large
>> backup trees thrashing the same device (in the order of 3-5x the
>> rsync's time, 3-5 hrs for pax - hardlink(1) is far worse, I
>> suspect some non-linear algorithm therein - it ran 3-5x slower
>> than pax again).
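
A dedupe pass of the kind described above, walking two backup trees
and hard-linking files that turn out to be identical, could be
sketched like this.  It is not a reimplementation of pax or
hardlink(1); hardlink_identical is a hypothetical helper that only
compares files at the same relative path, reads both copies in full,
and ignores ownership and permissions.

```python
import filecmp
import os
import tempfile


def hardlink_identical(old_tree, new_tree):
    """Link files in new_tree to identical files in old_tree.

    Only regular files at the same relative path are considered.
    Returns the number of files that were replaced by hard links.
    """
    linked = 0
    for root, _dirs, files in os.walk(new_tree):
        rel = os.path.relpath(root, new_tree)
        for name in files:
            new_path = os.path.join(root, name)
            old_path = os.path.join(old_tree, rel, name)
            if os.path.islink(new_path) or os.path.islink(old_path):
                continue
            if not (os.path.isfile(new_path) and os.path.isfile(old_path)):
                continue
            if os.path.samefile(old_path, new_path):
                continue  # already share an inode
            if not filecmp.cmp(old_path, new_path, shallow=False):
                continue  # contents differ; keep both copies
            tmp = new_path + ".dedupe.tmp"
            os.link(old_path, tmp)
            os.replace(tmp, new_path)
            linked += 1
    return linked


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        old, new = os.path.join(d, "old"), os.path.join(d, "new")
        for tree in (old, new):
            os.makedirs(tree)
            with open(os.path.join(tree, "hosts"), "w") as f:
                f.write("127.0.0.1 localhost\n")
        print(hardlink_identical(old, new))
```

Every candidate pair is read from disk in full, so on trees with
millions of files the pass has the same cost problem described above:
two large trees are traversed and compared on the same device.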
> 
>> I have detailed an example of this scenario at
> 
> 
>> http://unix.stackexchange.com/questions/193308/rsyncs-link-dest-option-does-not-link-identical-files-if-an-old-file-exists
>
>>  which also indicates --delete-before and --whole-file do not
>> help at all.
> 
>> /kc
> 
> 
>> -- Please use reply-all for most replies to avoid omitting the
>> mailing list.
>> To unsubscribe or change options:
>> https://lists.samba.org/mailman/listinfo/rsync
>> Before posting, read:
>> http://www.catb.org/~esr/faqs/smart-questions.html
> 
> -- Ken Chase - ken att heavycomputing.ca <http://heavycomputing.ca>
> Toronto Canada
> Heavy Computing - Clued bandwidth, colocation and managed linux VPS
> @151 Front St. W.
> 
> 
> 
> 

- -- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
	Kevin Korb			Phone:    (407) 252-6853
	Systems Administrator		Internet:
	FutureQuest, Inc.		Kevin at FutureQuest.net  (work)
	Orlando, Florida		kmk at sanitarium.net (personal)
	Web page:			http://www.sanitarium.net/
	PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~

