Discussion about the detect-renamed patch

Matt McCutchen matt at mattmccutchen.net
Wed Nov 7 22:12:34 GMT 2007


On Wed, 2007-11-07 at 09:13 -0500, Charles Perreault wrote:
> From what I read about that lax patch, using it is risky.  In fact
> that's not what I want at all: I want the content to be checked using
> block checksums and the delta-transfer algorithm if the detection made
> a mistake, as the detect-renamed patch does.  But I want, if possible,
> to find a match for every new file that the sender has and the receiver
> lacks, in order to decrease the amount of network traffic, and there
> the current patch seems to be weak.

I know you want to be able to detect copies as well as renames.  Are you
saying anything more than that?

> But I also did the same test over the network.  Here's a log with -vvv.
> This log shows --delete-after because I thought maybe something was
> wrong with the default behaviour (I tested --delete too), but that
> didn't change anything:
> 
> $ mkdir src/dir1
> $ mv src/file2 src/dir1/
> $ rsync -avvv --delete-after --detect-renamed src nas1:/home/user

> sent 25316 bytes  received 60 bytes  50752.00 bytes/sec
> total size is 35653  speedup is 1.40
> _exit_cleanup(code=0, file=main.c, line=977): about to call exit(0)
> 
> As you can see, the whole file was transferred again.

I cannot reproduce this.  For me, rsync 2.6.9 + its detect-renamed.diff
and the latest development rsync + its detect-renamed.diff both detect
the rename correctly.

> As to memory usage, it depends on which option you choose
> (detect-renamed or detect-copied); my patch could do both.  The former
> only searches for a match among extraneous files on the receiver

Good point: if turnover is low and only renames are being detected, your
approach (in combination with incremental recursion) may use much less
memory than the current one.
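
To make the rename-only case concrete, here is a minimal sketch, not
taken from either patch.  It assumes "extraneous" means a path that
exists on the receiver but not on the sender; the function name and
the sample paths (including file1) are hypothetical.

```python
def extraneous(receiver_paths, sender_paths):
    """Return receiver paths that have no counterpart on the sender.

    In a rename-only scheme, these are the only files that would need
    to be remembered as possible rename sources.
    """
    return set(receiver_paths) - set(sender_paths)


# Layout after "mv src/file2 src/dir1/" on the sender (file1 is made up).
sender = ["file1", "dir1/file2"]
receiver = ["file1", "file2"]

candidates = extraneous(receiver, sender)
print(candidates)
```

When turnover is low, this candidate set stays small compared with a
full listing of either side.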

> and the latter, indeed, needs to hold a complete listing of the
> destination in memory.  If the current patch already needs a listing
> of the source, then my method would use about the same amount of
> memory on average

Yes, a complete listing of one side.

> (on the order of O(n)).

Please don't describe the memory usage as O(n).  The constant factor in
front of the number of files is really important as it can be the
difference between a smooth run and death by swapping.
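
As a rough illustration, assume memory grows linearly with the number
of files.  The per-entry sizes and the file count below are
hypothetical figures chosen for the arithmetic, not measurements of
rsync or of either patch.

```python
def listing_memory_bytes(num_files, bytes_per_entry):
    """Estimate memory for a file listing under a simple linear model."""
    return num_files * bytes_per_entry


num_files = 5_000_000  # hypothetical tree size

# Two schemes that are both O(n) but differ in the constant factor.
lean = listing_memory_bytes(num_files, 100)   # assumed 100 bytes/entry
heavy = listing_memory_bytes(num_files, 400)  # assumed 400 bytes/entry

print(f"lean:  {lean / 10**6:.0f} MB")
print(f"heavy: {heavy / 10**6:.0f} MB")
```

Under these assumptions the two schemes need 500 MB and 2000 MB
respectively, which on a machine with 1 GB of RAM is exactly the
difference described above.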

> > > No information about the current hierarchy of the files on either
> > > the sender or the receiver is needed to build the list and the tree.
> > 
> > What do you mean?  Steps 1 and 2 scan the entire hierarchy on both
> > sides.
> Yes, it is scanned, but not used in the matching process.  Filenames
> and paths matter only when more than one match (size + mod time) is
> found and the best one has to be picked; otherwise they are not needed.

OK, so what's your point?  The filenames and paths have to be held in
memory so that the matching destination files can actually be accessed
when it comes time to use their data for delta transfers.
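
A minimal sketch of why the paths cannot be dropped, assuming matching
by (size, mod time) with the basename as tie-breaker as described in
the quoted text.  The function names and sample data are hypothetical,
and this is not the code of either patch.

```python
import os
from collections import defaultdict


def build_index(dest_files):
    """Index destination files by (size, mtime).

    dest_files is an iterable of (path, size, mtime) tuples.  The path
    is stored in the index because it is the only way to open the file
    later as a basis for the delta transfer.
    """
    index = defaultdict(list)
    for path, size, mtime in dest_files:
        index[(size, mtime)].append(path)
    return index


def find_basis(index, path, size, mtime):
    """Return the path of a candidate basis file, or None."""
    candidates = index.get((size, mtime), [])
    if not candidates:
        return None
    # Several matches: prefer one with the same basename.
    name = os.path.basename(path)
    for candidate in candidates:
        if os.path.basename(candidate) == name:
            return candidate
    return candidates[0]


dest = [("file2", 2048, 1194400000), ("notes.txt", 512, 1194300000)]
index = build_index(dest)
print(find_basis(index, "dir1/file2", 2048, 1194400000))
```

Even with a single match, the value returned is a stored path, so the
index costs at least one path per destination file.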

Matt


