Is there any way to restore/create hardlinks lost in incremental backups?

Chris Green cl at isbd.net
Fri Dec 11 15:19:26 UTC 2020


Guillaume Outters via rsync <rsync at lists.samba.org> wrote:
> On 2020-12-11 12:53, Chris Green wrote:
> 
> > […] wrote a trivial[ish] script that copied
> > all the backups to a new destination sequentially (using --link-dest)
> > and then removed the original tree, having checked the new backups
> > were OK of course.
> 
> With the same cause as yours, I once worked out exactly the same 
> solution.
> 
> But then, having to automate it, I worked a bit more on it, and ended 
> up having a shell script that:
> - recursively listed files as "file size - inode - path"
> - with sort and awk, output the list of "every size that has different 
> inodes"
> - for each output size, cksumed one file for each inode
> - if two different inodes (with the same file size) had their cksum 
> match, then it replaced every file for the last inode, with a link to 
> the first inode
> 
> If you have to run it frequently, you may want to implement something 
> similar.
> Although it ignores mtime info (and thus strips it when lning),
> it has the great benefit of finding every duplicate, even one that was 
> renamed and moved to another dir
> (as in 
> ./her.2020-12-01/Library/Mail/…/Sent.mbox/…/Attachments/…/PhotoDeFamille.JPG 
> versus ./his.2020-11-26/perso/photos/100_9999.JPG).
> 
> (and by the way I reimplemented it in C, "just for fun" and for speed 
> too: https://github.com/outtersg/dude/ . Hmm, in C but in French)
> 
The program jdupes will do it for you as well.  

The disadvantage (for me) of jdupes is that, given 40 or so incremental
backups (which is what I had when I saw the problem), each with many
tens of thousands of files in it, it will take a *very* long time to
do its job.

Like your solution it's general, files can have different names and be
in totally different places in the directory hierarchy and it will
find the duplicates.
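
For reference, the general approach quoted above (group files by size,
checksum one file per inode, relink inodes whose checksums match) might
look roughly like the Python sketch below.  It is neither Guillaume's
script nor what jdupes does internally: the function names are made up,
SHA-256 stands in for cksum, empty files are skipped, and it assumes
everything under the root lives on a single filesystem.

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def relink_duplicates(root):
    """Hard-link identical regular files under root.

    Hypothetical helper; returns the number of paths that were relinked.
    Assumes a single filesystem, so an inode number identifies a file.
    """
    # size -> inode -> list of paths that already share that inode
    by_size = defaultdict(lambda: defaultdict(list))
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_size > 0:
                by_size[st.st_size][st.st_ino].append(path)

    relinked = 0
    for inodes in by_size.values():
        if len(inodes) < 2:
            continue  # only one inode has this size, nothing to merge
        first_by_digest = {}  # digest -> path on the first inode seen with it
        for ino in sorted(inodes):
            paths = inodes[ino]
            digest = file_digest(paths[0])  # one checksum per inode
            keeper = first_by_digest.setdefault(digest, paths[0])
            if keeper == paths[0]:
                continue  # this inode is the one being kept
            for path in paths:
                # Link under a temporary name, then rename over the
                # duplicate, so the path never disappears part-way.
                tmp = path + ".relink-tmp"
                os.link(keeper, tmp)
                os.replace(tmp, path)
                relinked += 1
    return relinked
```

As with the quoted script, the relinked paths take on the mtime, owner
and permissions of whichever inode is kept, and every candidate still
has to be read in full once, which is where the time goes.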

In my case the files which should be duplicates (and thus be hard
linked) are always ones with the same name in the same place in the
hierarchy.  It feels as if there should be a better/faster way of
addressing this particular case but I don't know what it is.
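
One possibility, offered only as an untested-in-anger sketch: walk each
backup and compare every file solely with the file at the same relative
path in the previous backup, linking the two when they are identical.
The function name link_to_previous and the size/mtime pre-check are my
own assumptions, not something rsync or jdupes provides.

```python
import filecmp
import os
import stat


def link_to_previous(prev_root, cur_root):
    """Hard-link files in cur_root to identical files at the same
    relative path in prev_root.

    Hypothetical helper; returns the number of files that were linked.
    """
    linked = 0
    for dirpath, _dirs, names in os.walk(cur_root):
        rel = os.path.relpath(dirpath, cur_root)
        for name in names:
            cur = os.path.join(dirpath, name)
            prev = os.path.normpath(os.path.join(prev_root, rel, name))
            try:
                st_cur = os.lstat(cur)
                st_prev = os.lstat(prev)
            except (FileNotFoundError, NotADirectoryError):
                continue  # no counterpart in the previous backup
            if not (stat.S_ISREG(st_cur.st_mode)
                    and stat.S_ISREG(st_prev.st_mode)):
                continue  # only regular files are candidates
            if st_cur.st_dev != st_prev.st_dev:
                continue  # hard links cannot cross filesystems
            if st_cur.st_ino == st_prev.st_ino:
                continue  # already the same file
            # Cheap pre-check (an assumption of this sketch): skip the
            # full read unless size and whole-second mtime agree.
            if (st_cur.st_size != st_prev.st_size
                    or int(st_cur.st_mtime) != int(st_prev.st_mtime)):
                continue
            if not filecmp.cmp(prev, cur, shallow=False):
                continue  # contents differ byte for byte
            # Link under a temporary name, then rename over the copy.
            tmp = cur + ".relink-tmp"
            os.link(prev, tmp)
            os.replace(tmp, cur)
            linked += 1
    return linked
```

Run over consecutive pairs of backups, oldest first, this does one
lookup per file rather than a search across all 40 trees, and files
whose size or mtime differ are never read at all.  It will, by design,
miss anything that was renamed or moved between backups.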

-- 
Chris Green