Use rsync's checksums to deduplicate across backups

Cameron Simpson cs at zip.com.au
Sun Nov 6 13:47:37 MST 2011


On 04Nov2011 10:27, Chris Dunlop <chris at onthe.net.au> wrote:
| On Thu, Nov 03, 2011 at 09:34:53AM -0500, Alex Waite wrote:
| >> Not a direct answer, but this may do what you want:
| >>
| >>  http://gitweb.samba.org/?p=rsync-patches.git;a=blob;f=link-by-hash.diff
| >>
| >>  This patch adds the --link-by-hash=DIR option, which hard links
| >>  received files in a link farm arranged by MD4 file hash.  The
| >>  result is that the system will only store one copy of the unique
| >>  contents of each file, regardless of the file's name.
| >
| > This does look like what I was describing, though it seems it was
| > never included into rsync.  Is that correct?
| 
| Yes, rsync-patches is stuff that is deemed to be not yet ready
| (i.e. it may go in after it's been polished), or not at all
| suitable (e.g. it's too esoteric for general usage), for rsync
| proper.

Regarding "esoteric": I also have this kind of backup scheme. I would welcome
that functionality, probably.
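
To make the link-farm idea in the quoted patch description concrete, here
is a rough Python sketch. It is not the patch's code: the function names
and the farm layout (a two-character bucket directory) are made up for
illustration, and SHA-256 stands in for the MD4 hash the patch mentions,
since MD4 is not reliably available in Python's hashlib. It assumes the
file and the farm live on the same filesystem.

```python
import hashlib
import os


def file_digest(path, bufsize=1 << 20):
    """Return the hex SHA-256 of a file's contents (stand-in for MD4)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()


def link_by_hash(path, farm_dir):
    """Hard-link `path` against a farm keyed by content hash.

    Returns True if `path` was replaced by a link to an existing copy,
    False if it was the first file seen with this content (or was
    already linked to the farm entry).
    """
    digest = file_digest(path)
    bucket = os.path.join(farm_dir, digest[:2])  # hypothetical layout
    os.makedirs(bucket, exist_ok=True)
    farm_path = os.path.join(bucket, digest)

    if not os.path.exists(farm_path):
        # First time this content is seen: register it in the farm.
        os.link(path, farm_path)
        return False
    if os.path.samefile(path, farm_path):
        return False  # already deduplicated

    # Swap in a link to the farm copy; the rename is atomic on POSIX.
    tmp = path + ".linktmp"
    os.link(farm_path, tmp)
    os.replace(tmp, path)
    return True
```

Because hard links share one inode, every name for a given content also
shares its owner, permissions and mtime, so a scheme like this only suits
backups where that is acceptable.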

BTW, how far does the --link-dest option go in this direction? I use it
a fair bit (backing up multiple hosts with the same dataset on them,
using --link-dest to refer to the parallel snapshots).
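
For comparison, a --link-dest run of the kind described above might be
assembled as in the sketch below. The helper name and the example paths
are hypothetical; only the rsync options -a and --link-dest=DIR are real.
Note that --link-dest matches an unchanged file at the same relative path
in each reference directory, so unlike a hash-keyed link farm it will not
catch content that has been renamed or moved.

```python
import os


def rsync_snapshot_cmd(source, dest, link_dests):
    """Build an rsync argv that hard-links unchanged files against
    one or more existing snapshots (e.g. parallel hosts' backups)."""
    cmd = ["rsync", "-a"]
    for ref in link_dests:
        # A relative --link-dest is resolved against the destination
        # directory, so pass absolute paths to avoid surprises.
        cmd.append("--link-dest=" + os.path.abspath(ref))
    # Trailing slash: copy the contents of source, not the directory.
    cmd += [source.rstrip("/") + "/", dest]
    return cmd
```

For example, rsync_snapshot_cmd("hostB:/data", "/backups/hostB/today",
["/backups/hostA/today"]) yields a command that can be handed to
subprocess.run() to back up hostB while linking against hostA's snapshot.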

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Being on a Beemer and not having a wave returned by a Sportster is like
having a clipper ship's hailing not returned by an orphaned New Jersey
solid waste barge.   - OTL

