New feature: detect and avoid transferring renamed files

Phil Vandry vandry at TZoNE.ORG
Mon Jun 23 00:01:16 GMT 2008


Hello rsyncers,

I have long wished for a feature in rsync to detect files that have
been renamed on the sender side since the last sync was performed, and
to avoid transferring those files to the destination, the same way it
avoids transferring files that haven't changed.

Example 1: a log directory (like /var/log) is backed up every day. Most
of the time, rsync transfers very little data, but once a week it
basically performs a full copy without finding any basis files on the
destination copy. This is because the logs have been rotated and what
was /var/log/example.1.gz has been renamed to /var/log/example.2.gz,
/var/log/example.2.gz has become /var/log/example.3.gz, and so on.
rsync has no way to know this.

Example 2: a home directory is similarly backed up every day. One day,
the user decides to clean house and move lots of files around, creating
new directories, and moving hundreds of files around among different
directories. Once again, rsync is going to have to retransfer each of
those files.

My solution is to add a stable, unchanging name for each file in the
transfer. As long as the --hard-links (-H) option is used, this stable
name will provide a name that already exists on the receiver side, and
the receiver can create a hard link to it even when the file appears
under a completely new name. These stable names do not actually exist
on the sender side; they are synthesized by rsync from the unchanging
attributes that give the file its identity: its device and inode
numbers.
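
To illustrate the idea of a name derived from device and inode, here is
a minimal Python sketch. It is not taken from the patch: the function
name stable_name, the "byinode" prefix as used here, and the
"<dev>-<inode>" hex format are assumptions made for illustration only.
It shows that the derived name stays the same when a file is renamed
within one filesystem.

```python
import os
import tempfile


def stable_name(path, prefix="byinode"):
    """Build a hypothetical stable name from a file's device and inode.

    The "<prefix>/<dev>-<ino>" layout is an assumption for this sketch,
    not the format used by the actual patch.
    """
    st = os.lstat(path)
    return f"{prefix}/{st.st_dev:x}-{st.st_ino:x}"


with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "example.1.gz")
    new = os.path.join(d, "example.2.gz")
    with open(old, "w") as f:
        f.write("log data\n")

    before = stable_name(old)
    os.rename(old, new)          # simulate log rotation
    after = stable_name(new)

    # A rename within one filesystem keeps the same device and inode.
    same_after_rename = before == after
    print(before, after, same_after_rename)
```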

I have attached a patch for a proof of concept of this feature. With it,
I can start with this directory structure on the sender side:

testfiles/
testfiles/one
testfiles/two

...and rsync it to the destination. On the destination there is an extra
directory called "byinode" which contains hard links to both regular
files. Then I rename testfiles/one to testfiles/oneone. When I rsync
again, instead of deleting the file called "one" and transferring the
full contents of an apparently new file called "oneone", the file "one"
is deleted and "oneone" is created by hard-linking to the stable
filename.
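
The walkthrough above can be imitated with a toy receiver written in
Python. This is only a sketch of the behaviour, under several
assumptions: the sync function, the flat directory layout, and the
"<dev>-<inode>" naming are all hypothetical, files already present on
the destination are skipped without any size or time comparison, and
nothing here uses rsync or its protocol.

```python
import os
import tempfile


def stable_name(path):
    """Hypothetical stable name: sender-side device and inode in hex."""
    st = os.lstat(path)
    return f"{st.st_dev:x}-{st.st_ino:x}"


def sync(src, dst):
    """Toy one-level sync; returns (names copied, names hard-linked)."""
    byinode = os.path.join(dst, "byinode")
    os.makedirs(byinode, exist_ok=True)
    copied, linked = [], []
    wanted, wanted_stable = set(), set()

    for name in sorted(os.listdir(src)):
        s = os.path.join(src, name)
        if not os.path.isfile(s):
            continue
        sname = stable_name(s)
        wanted.add(name)
        wanted_stable.add(sname)
        target = os.path.join(dst, name)
        stable = os.path.join(byinode, sname)
        if os.path.exists(target):
            continue  # a real tool would compare size and mtime here
        if os.path.exists(stable):
            # Data already on the receiver under its stable name.
            os.link(stable, target)
            linked.append(name)
        else:
            # Genuinely new file: copy the data, then link it.
            with open(s, "rb") as fin, open(stable, "wb") as fout:
                fout.write(fin.read())
            os.link(stable, target)
            copied.append(name)

    # Emulate --delete for both the real names and the stable names.
    for name in os.listdir(dst):
        if name != "byinode" and name not in wanted:
            os.remove(os.path.join(dst, name))
    for sname in os.listdir(byinode):
        if sname not in wanted_stable:
            os.remove(os.path.join(byinode, sname))
    return copied, linked


with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "testfiles")
    dst = os.path.join(root, "dest")
    os.makedirs(src)
    os.makedirs(dst)
    for name in ("one", "two"):
        with open(os.path.join(src, name), "w") as f:
            f.write(name + "\n")

    first = sync(src, dst)
    os.rename(os.path.join(src, "one"), os.path.join(src, "oneone"))
    second = sync(src, dst)
    final_names = sorted(n for n in os.listdir(dst) if n != "byinode")
    print(first, second, final_names)
```

In this toy run the first pass copies both files, and the second pass
creates "oneone" purely by hard link while "one" is deleted.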

I would like to know the following:

- Are people interested in a feature like this?

- Is there a better way to do it?

- It only works with protocol version 30 at the moment. Would there be
any interest in making it work with older protocol versions? (it has to
be done very differently in older versions)

- Could this patch be committed once I take it beyond the
proof-of-concept stage?

Notes, if you want to test it:

- It is hardcoded to always enable the feature: it synthesizes a
directory called "byinode" and adds it to the file list. The final
version will make this a command line option, of course.
- It only works with protocol version 30.
- Use with at least --delete --no-i-r -r
- Only the sender side requires the patch
- The patch is against rsync-3.0.3pre2

Thank you for your feedback

-Phil
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rsync-3.0.3pre2.byinode.patch
Type: text/x-diff
Size: 6138 bytes
Desc: not available
Url : http://lists.samba.org/archive/rsync/attachments/20080622/50552978/rsync-3.0.3pre2.byinode.bin

