--fuzzy search over to-be-deleted files to catch moved files and directories

Matt McCutchen matt at mattmccutchen.net
Sat Nov 21 07:08:05 MST 2009


On Fri, 2009-11-13 at 18:58 +0100, H. Langos wrote:
> Ahh, ok, so here size+mtime or checksum select the base file. 
> 
> And if that selection fails then "--fuzzy" search is applied but looks 
> only in the /dst directory for a suitable candidate.
> 
> (Or is the temporal order reversed?)

Yes, that's about right.

> > > > [--detect-renamed] doesn't calculate name similarity like
> > > > --fuzzy because that would be prohibitively expensive in the
> > > > current implementation.
> > > Only files of the same size should be
> > > candidates to start with, right?
> > 
> > No, the name similarity calculation I'm talking about is the fallback to
> > select a similar basis file when no available destination file passes
> > the quick check, so it does not require a size match.
> 
> Hmm, ok so fuzzy also finds files that are slightly different and have their
> name slightly changed.

There's no "slightly" on "different" there.  Assuming --fuzzy doesn't
find a quick-check match (and it probably won't because --detect-renamed
has already searched the whole destination with the same criteria), the
choice of basis files is based exclusively on name similarity.
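
To make the selection order concrete, here is a minimal Python sketch of
the idea, not rsync's actual code.  The FileInfo type and the
pick_basis() helper are hypothetical names, the quick check is assumed
to be the default size+mtime comparison, and difflib's SequenceMatcher
is only a stand-in for rsync's own name-distance scoring.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple, Optional, Tuple


class FileInfo(NamedTuple):
    # Hypothetical record for illustration; rsync's file list differs.
    name: str
    size: int
    mtime: int


def quick_check(src: FileInfo, dst: FileInfo) -> bool:
    # Assumed default quick check: same size and same mtime.
    return src.size == dst.size and src.mtime == dst.mtime


def pick_basis(src: FileInfo,
               candidates: List[FileInfo]) -> Tuple[Optional[FileInfo], str]:
    # First preference: any candidate that passes the quick check.
    for cand in candidates:
        if quick_check(src, cand):
            return cand, "quick-check"
    if not candidates:
        return None, "none"
    # Fallback: name similarity only.  Size and content are ignored,
    # so the chosen basis may differ arbitrarily from the source file.
    best = max(
        candidates,
        key=lambda c: SequenceMatcher(None, src.name, c.name).ratio(),
    )
    return best, "name-similarity"
```

Note that the fallback branch never looks at the candidate's size or
data, which is the point above: only the name has to be close.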

> It sounds like it would be a good idea to (have the option to) include 
> the delete-candidates directory .~tmp~ (or whatever else "--detect-renamed" 
> uses) in the --fuzzy search.

I'm not clear on what you're proposing here.  Could you provide an
example?

> In fact I do just those things with a script when
> importing pictures from any of my cameras into the photo archive. I 
> rename them as shown above and then I move them to a directory structure
> made of <year>/<month>/<day>/ . I don't change the EXIF tags yet, though 
> I want to add that in the future. 
> But that would make the  size+mtime/checksum test fail. Using "--fuzzy" 
> would help, but only if I'd do an rsync between the moving operation 
> and the tag changing operation.
> 
> It doesn't matter which operation I'd do first, but doing both together 
> would mean a completely new transfer to my backup location. :-/

Right.  Note that if you did an rsync between the moving and the tag
changing, you wouldn't need --fuzzy on the second rsync because the
files would already be in the right places.

Efficiently handling simultaneous renames and data changes is very hard
for a stateless tool like rsync.  If I understand correctly that you're
moving files without changing their basenames, it would work in this
case to extend --detect-renamed to look for an exact basename match if
there is no quick-check match.  That would overlap even more with the
current --fuzzy functionality.  There may be a better way to factor
things.
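
As a rough illustration of that extension (a sketch under my own
assumptions, not a patch), the lookup could try the quick check first
and fall back to an exact basename match.  Entry and
find_rename_basis() are hypothetical names, and the quick check is
again assumed to be size+mtime.

```python
import posixpath
from typing import Dict, List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    # Hypothetical record for a file on either side of the transfer.
    path: str
    size: int
    mtime: int


def find_rename_basis(new_file: Entry,
                      to_delete: List[Entry]) -> Optional[Entry]:
    # Index the to-be-deleted destination files two ways so each
    # lookup is a dictionary hit rather than a scan.
    by_quick: Dict[Tuple[int, int], Entry] = {}
    by_base: Dict[str, Entry] = {}
    for entry in to_delete:
        by_quick.setdefault((entry.size, entry.mtime), entry)
        by_base.setdefault(posixpath.basename(entry.path), entry)

    # 1. Quick-check match: the file was moved without changes.
    hit = by_quick.get((new_file.size, new_file.mtime))
    if hit is not None:
        return hit
    # 2. Proposed fallback: same basename, possibly changed data
    #    (for example, a moved photo whose tags were edited).
    return by_base.get(posixpath.basename(new_file.path))
```

With a to-be-deleted incoming/IMG_0001.jpg and a new
2009/11/13/IMG_0001.jpg whose size and mtime have changed, step 2
returns the old file as the basis, so only the differences would need
to be sent.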

-- 
Matt


