--fuzzy search over to-be-deleted files to catch moved files and directories
Matt McCutchen
matt at mattmccutchen.net
Sat Nov 21 07:08:05 MST 2009
On Fri, 2009-11-13 at 18:58 +0100, H. Langos wrote:
> Ahh, ok, so here size+mtime or checksum select the base file.
>
> And if that selection fails then "--fuzzy" search is applied but looks
> only in the /dst directory for a suitable candidate.
>
> (Or is the temporal order reversed?)
Yes, that's about right.
> > > > [--detect-renamed] doesn't calculate name similarity like --fuzzy because that would
> > > > be prohibitively expensive in the current implementation.
> > > Only files of the same size should be
> > > candidates to start with, right?
> >
> > No, the name similarity calculation I'm talking about is the fallback to
> > select a similar basis file when no available destination file passes
> > the quick check, so it does not require a size match.
>
> Hmm, ok so fuzzy also finds files that are slightly different and have their
> name slightly changed.
There's no "slightly" on "different" there. Assuming --fuzzy doesn't
find a quick-check match (and it probably won't because --detect-renamed
has already searched the whole destination with the same criteria), the
choice of basis files is based exclusively on name similarity.
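
To make that two-step choice concrete, here is a minimal sketch in Python. It is not rsync's code: `Entry` and `pick_basis` are hypothetical names, and `difflib.SequenceMatcher` is only a stand-in for whatever name-similarity scoring --fuzzy really uses. It shows that once no candidate passes the quick check, size and mtime play no part in the choice.

```python
from difflib import SequenceMatcher
from typing import NamedTuple, Optional, Sequence


class Entry(NamedTuple):
    """Hypothetical record for one file in the directory being searched."""
    name: str   # basename only
    size: int   # bytes
    mtime: int  # seconds since the epoch


def pick_basis(src: Entry, candidates: Sequence[Entry]) -> Optional[Entry]:
    """Pick a basis file for src: quick-check match first, else closest name."""
    # Step 1: a candidate with identical size and mtime passes the quick check.
    for cand in candidates:
        if (cand.size, cand.mtime) == (src.size, src.mtime):
            return cand
    if not candidates:
        return None
    # Step 2: rank by name similarity only; size and mtime are ignored here.
    # ASSUMPTION: SequenceMatcher.ratio() stands in for rsync's own scoring.
    return max(
        candidates,
        key=lambda c: SequenceMatcher(None, src.name, c.name).ratio(),
    )
```

With this sketch, a source file `IMG_0001.jpg` would pick a much larger `IMG_0001_v2.jpg` over an unrelated `notes.txt` as its basis, as long as neither matches on size and mtime.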
> This sounds like it would be a good idea to (have the option to) include
> the delete candidates directory .~tmp~ (or whatever else "--detect-renamed"
> uses) in the --fuzzy search.
I'm not clear on what you're proposing here. Could you provide an
example?
> In fact I do just those things with a script when
> importing pictures from any of my cameras into the photo archive. I
> rename them as shown above and then I move them to a directory structure
> made of <year>/<month>/<day>/ . I don't change the EXIF tags yet, but
> I want to add that in the future.
> But that would make the size+mtime/checksum test fail. Using "--fuzzy"
> would help, but only if I'd do an rsync between the moving operation
> and the tag changing operation.
>
> It doesn't matter which operation I do first, but doing both together
> would mean a completely new transfer to my backup location. :-/
Right. Note that if you did an rsync between the moving and the tag
changing, you wouldn't need --fuzzy on the second rsync because the
files would already be in the right places.

Efficiently handling simultaneous renames and data changes is very hard
for a stateless tool like rsync. If I understand correctly that you're
moving files without changing their basenames, it would work in this
case to extend --detect-renamed to look for an exact basename match if
there is no quick-check match. That would overlap even more with the
current --fuzzy functionality. There may be a better way to factor
things.
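
A rough sketch of what that extension could look like, in Python rather than rsync's C. Nothing here is existing rsync code: `FileInfo`, `DestIndex` and `find_basis` are hypothetical names, and the lookup is only the idea described above, namely to index the whole destination once and try a quick-check match before falling back to an exact basename match.

```python
import posixpath
from collections import defaultdict
from typing import Iterable, NamedTuple, Optional, Tuple


class FileInfo(NamedTuple):
    """Hypothetical record for one file, with its path relative to the root."""
    path: str
    size: int
    mtime: int


class DestIndex:
    """Index of all destination files, built once before the transfer."""

    def __init__(self, files: Iterable[FileInfo]) -> None:
        self.by_quick = defaultdict(list)  # (size, mtime) -> files
        self.by_name = defaultdict(list)   # basename -> files
        for f in files:
            self.by_quick[(f.size, f.mtime)].append(f)
            self.by_name[posixpath.basename(f.path)].append(f)

    def find_basis(self, src: FileInfo) -> Tuple[Optional[FileInfo], str]:
        """Return (basis file, reason); the basis is None if nothing matched."""
        # First choice: same size and mtime anywhere in the destination.
        quick = self.by_quick.get((src.size, src.mtime))
        if quick:
            return quick[0], "quick-check"
        # ASSUMED fallback: same basename, even though the data has changed.
        named = self.by_name.get(posixpath.basename(src.path))
        if named:
            return named[0], "basename"
        return None, "none"
```

In this sketch a photo that was moved to 2009/11/13/ and also had its tags rewritten would fail the quick check but would still find its old copy by basename, so only the changed data would need to be sent.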
--
Matt