Discussion about the detect-renamed patch

Matt McCutchen matt at mattmccutchen.net
Wed Nov 7 04:09:30 GMT 2007


On Tue, 2007-11-06 at 22:10 -0500, Charles Perreault wrote:
> now input the following to test moving into a new folder :
> 
> $ mkdir src/dir1
> $ mv src/file2 src/dir1/
> $ rsync --detect-renamed -avz --delete src/ dest/
> building file list ... done
> deleting file2
> ./
> dir1/
> dir1/file2
> 
> sent 24031 bytes  received 54 bytes  48170.00 bytes/sec
> total size is 35653  speedup is 1.48

If you pass three -v options, you'll see that rsync does in fact detect
the rename.  However, it does not accept the destination file as
identical to the source file and skip the transfer, because that would
be risky (use the --detect-renamed-lax option provided by
detect-renamed-lax.diff if you really want this behavior).  Instead,
rsync just uses the destination file as a basis for the source file.  On
a remote copy, the delta-transfer algorithm uses the basis to decrease
the amount of network traffic.  On a local copy such as yours, the
delta-transfer algorithm is off by default (since decreasing
interprocess traffic is not an objective), so detecting a rename has
no effect on the number of bytes sent.

> 4- on receiver, find a match in the tree if possible for each item of
> the new files list, and use the match as a base for the
> synchronization (a fuzzy / bayesian approach might be used later to
> find an approximal good match, but that ain't my goal right now)

The current patch does the matching the other way around: for each
extraneous destination file D, it looks for a matching source file S and
uses D as a basis for S if S is new.  Your approach is more
straightforward, can detect copies, offers more flexibility in choosing
the best basis for each new source file, and would make it natural to
combine --detect-renamed and --fuzzy.  On the other hand, it requires
holding a complete listing of the destination in memory, while the
current one holds a listing of the source.  I would be interested to see
an implementation of your approach.
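
To make the difference between the two matching directions concrete,
here is a rough Python sketch.  It is not code from the patch.  The
function names are made up, each listing is assumed to be a dict from
relative path to a (size, mtime) pair, and using that pair as the match
key is an assumption for illustration only.  For simplicity both
listings are held in memory here, which neither real approach requires.

```python
from collections import defaultdict


def match_from_destination(src, dest):
    """Current direction: index new source files, then scan extraneous
    destination files.  Each destination file serves at most one source."""
    new_by_key = defaultdict(list)
    for s_path, key in src.items():
        if s_path not in dest:          # S is new
            new_by_key[key].append(s_path)

    basis = {}
    for d_path, key in dest.items():
        if d_path in src:               # D is not extraneous
            continue
        candidates = new_by_key.get(key)
        if candidates:
            basis[candidates.pop()] = d_path
    return basis


def match_from_source(src, dest):
    """Proposed direction: index every destination file, then look up a
    basis for each new source file.  One destination file may serve
    several sources, so copies are found as well as renames."""
    dest_by_key = defaultdict(list)
    for d_path, key in dest.items():
        dest_by_key[key].append(d_path)

    basis = {}
    for s_path, key in src.items():
        if s_path in dest:              # S is not new
            continue
        candidates = dest_by_key.get(key)
        if candidates:
            # Hypothetical policy: take the first candidate.  A smarter
            # choice (e.g. something like --fuzzy) could go here.
            basis[s_path] = candidates[0]
    return basis
```

With a source that contains both a renamed file and a copy of an
unchanged file, the first function finds only the rename, while the
second also finds a basis for the copy.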

> 5- on receiver, locally copy the matched files where they should be if
> they existed
> 6- compute the checksums for modified files and for all the matched
> files copied at step 5 (mark them dirty/not up to date)

"Copying" the matched destination files and "mark[ing] them dirty"
corresponds to rsync's current behavior of hard-linking them into the
partial dirs for the matching source files.  It's not clear to me why
you are computing checksums.  Or are you just referring to the block
checksums computed by the generator for the delta-transfer algorithm?
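
As a rough illustration of the hard-linking idea (again, not code from
the patch), the following Python sketch links a matched destination
file into a partial dir next to where the new file will land.  The
function name, its parameters, and the ".partial" directory name are
all hypothetical.

```python
import os


def stage_basis(dest_root, matched_dest, new_source_rel, partial_name=".partial"):
    """Hard-link dest_root/matched_dest into the partial dir that sits
    beside the future location of new_source_rel, and return the link path.
    The partial dir name is a placeholder, not the one rsync uses."""
    target_dir = os.path.join(dest_root, os.path.dirname(new_source_rel), partial_name)
    os.makedirs(target_dir, exist_ok=True)
    staged = os.path.join(target_dir, os.path.basename(new_source_rel))
    # A hard link costs no extra data blocks, and the original file
    # stays in place until the deletion pass removes it.
    os.link(os.path.join(dest_root, matched_dest), staged)
    return staged
```

For the example earlier in this thread, stage_basis(dest, "file2",
"dir1/file2") would create dest/dir1/.partial/file2 as a second name
for dest/file2, ready to be used as a basis.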

> No information about current hierarchy of the files neither on sender
> or receiver is needed to build the list and the tree.

What do you mean?  Steps 1 and 2 scan the entire hierarchy on both
sides.

Matt


