New feature: detect and avoid transferring renamed files

Wayne Davison wayned at samba.org
Tue Sep 9 14:49:06 GMT 2008


Sorry for the slow reply -- I marked your message for more in-depth
study, and failed to get back to it until now.

On Sun, Jun 22, 2008 at 08:01:16PM -0400, Phil Vandry wrote:
> I have long wished for a feature in rsync to detect files that have
> been renamed on the sender side since the last time a sync was
> performed, and avoid transferring those files to the destination the
> same way it avoids transferring files that haven't changed.

The detect-renamed patch in the patches directory has one possible
implementation of this, but it fails to handle things like the /var/log
rotation where files get renamed over the top of other files in the
transfer.
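
To make the rotation hazard concrete, here is a small Python sketch. It is
not how the detect-renamed patch works; the function names and the
".staged-N" temporary names are made up for illustration. It only shows why
renames whose targets are also sources cannot safely be applied one at a
time in file-list order, and how staging every source first avoids the
clobbering.

```python
import os
import tempfile


def apply_renames_naive(root, renames):
    """Apply (old, new) renames in the given order; later sources may be clobbered."""
    for old, new in renames:
        os.replace(os.path.join(root, old), os.path.join(root, new))


def apply_renames_staged(root, renames):
    """Move every source to a temporary name first, then move all into place."""
    staged = []
    for i, (old, new) in enumerate(renames):
        tmp = os.path.join(root, f".staged-{i}")  # hypothetical temp name
        os.replace(os.path.join(root, old), tmp)
        staged.append((tmp, os.path.join(root, new)))
    for tmp, dest in staged:
        os.replace(tmp, dest)


def make_logs(root):
    """Create a tiny stand-in for a rotated log directory."""
    for name, text in (("log", "newest"), ("log.1", "older"), ("log.2", "oldest")):
        with open(os.path.join(root, name), "w") as f:
            f.write(text)


def read(root, name):
    with open(os.path.join(root, name)) as f:
        return f.read()


def demo():
    """Run both strategies on the same rotation and return the resulting files."""
    # The sender rotated: log -> log.1 and log.1 -> log.2.
    renames = [("log", "log.1"), ("log.1", "log.2")]
    results = {}
    for label, apply in (("naive", apply_renames_naive),
                         ("staged", apply_renames_staged)):
        with tempfile.TemporaryDirectory() as root:
            make_logs(root)
            apply(root, renames)
            results[label] = {n: read(root, n) for n in sorted(os.listdir(root))}
    return results
```

With the naive order, the old log.1 is overwritten before it can be moved,
so its contents are lost; the staged variant keeps both files intact.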

Your solution is quite an interesting one, but it does have some minor
drawbacks:

 - It creates a single (potentially really big) directory of files on
   the receiver for the byinode/* files.
 - The file list increases in size significantly (around double).
 - The transfer must remain identical to prior transfers, or the
   synthesized directory will not match (and could be truncated with
   --delete).
 - It disables incremental recursion (as does the detect-renamed patch,
   but it would be nice to avoid this).
 - While it avoids doing an extra scan of the destination files (unlike
   the detect-renamed patch), the processing of all the files in the
   synthesized directory is akin to an extra scan pass.

However, as long as those trade-offs are acceptable, it does do a great
job of finding renamed files.  I'd like something a little more flexible
for a future rsync, though.

I had been thinking of extending the db patch to add the ability to
track files by checksum in a database.  This would allow a run that used
the DB to be an efficient checksum run (reading the checksums from the
DB, not slowly generating them) and look up matching checksums in the DB
on the receiving side to facilitate renaming and/or efficient
copying.  Using a simple DB for the data (such as SQLite) would be easy
to support, would work regardless of how much of a hierarchy was being
copied, and would not require an extra hierarchy scan for each transfer
(though it would require that the DB info be double-checked and ignored
if not accurate, and it would be most efficient if the receiving side
was not prone to being reorganized without updating the DB).  To
facilitate the typical log-dir rotation idiom, I was thinking of applying
delayed updates a directory at a time (unless the user asked for a
whole-transfer delayed update).
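
A rough sketch of what the receiving-side lookup might look like, using
Python's sqlite3 module. Everything here is an assumption made for
illustration: the table layout, the function names, the use of SHA-256, and
the choice of size plus mtime as the "double-check" are not taken from the
db patch. The point is only that a checksum match from the DB is trusted
when the file on disk still agrees with the recorded metadata, and ignored
otherwise.

```python
import hashlib
import os
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a hypothetical checksum DB with an index on the checksum."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, checksum TEXT)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS files_by_sum ON files (checksum)")
    return db


def file_checksum(path):
    """Checksum a file in chunks (SHA-256 is an arbitrary choice for this sketch)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def record(db, path):
    """Store the file's current size, mtime, and checksum under its path."""
    st = os.stat(path)
    db.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
        (path, st.st_size, st.st_mtime_ns, file_checksum(path)),
    )


def find_by_checksum(db, checksum):
    """Return a path with this checksum whose DB entry still matches the disk."""
    rows = db.execute(
        "SELECT path, size, mtime_ns FROM files WHERE checksum = ?", (checksum,)
    ).fetchall()
    for path, size, mtime_ns in rows:
        try:
            st = os.stat(path)
        except OSError:
            continue  # file was moved or deleted behind the DB's back
        if st.st_size == size and st.st_mtime_ns == mtime_ns:
            return path  # candidate for a local rename or copy
    return None  # no trustworthy match; fall back to a normal transfer
```

If the receiving side is reorganized without updating the DB, the lookup
simply stops finding matches, so the cost of a stale DB is lost efficiency
rather than incorrect data.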

That idea is my current favorite for adding rename support.  What do
folks think?

..wayne..

