"intelligent" rsync scripts?

Chris Shoemaker c.shoemaker at cox.net
Wed Oct 26 18:04:34 GMT 2005


On Wed, Oct 26, 2005 at 03:02:51PM +0200, Tomasz Chmielewski wrote:
> I use rsync for backing up user data, profiles, important network shares 
> etc. (from several locations over WAN).
> 
> Overall it works flawlessly, as it transfers only changes, but sometimes 
> there are some serious hiccups.
> 
> Suppose this scenario, with 1 GB of files:
> 
> user shares:
> 
> /home/joe/data/file1
>               /file2
>               /...
>               /file1000
> 
> Now the user _moves_ that data to some other folder:
> 
> /home/joe/WAN_goes_crazy/file1
>                           /file2
>                           /...
>                           /file1000
> 
> ...and we start a backup process.
> 
> rsync will first transfer data from "/home/joe/WAN_goes_crazy/file...", 
> and then delete "/home/joe/data/data...".
> 
> Basically, this is how rsync works, but in the end, we transfer 1 GB of 
> files over WAN that we already have locally - the only thing that 
> changed was the folder where that data is.
> 
> Is there some workaround for this (some intelligent script etc.)?

ISTM it would be quite useful to make rsync "rename-aware".  Caveat: I
haven't hacked on rsync for quite a while, so my understanding may be
wrong or outdated.  But, I think this could be implemented thusly:

You'd want to make this optional, say --detect-renames, because it
does incur an extra processing cost.  That option should imply at
least --checksum, and --delete-after if --delete is used at all
(otherwise the old copies could be deleted before they are matched).
Then you just need the generator to be slightly more clever.  For each
file on the sender which is *missing* from the receiver, it needs to
search the checksums of all of the receiver's existing files for a
checksum match.  If it finds a match, it can simply use that matched
file and either copy or move it to the new filename.  Then that file
just gets skipped.
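
To make the idea concrete, here is a rough sketch in Python rather
than rsync's C.  It is not how rsync is structured: both trees are
local directories here, SHA-256 stands in for whatever --checksum
computes, and the names file_checksum, relative_files and
detect_renames are made up for illustration.  --detect-renames itself
is only the option proposed above, not an existing flag.

```python
import hashlib
import os
import shutil


def file_checksum(path, chunk_size=1 << 20):
    """Whole-file digest; SHA-256 is a stand-in, not rsync's checksum."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def detect_renames(src_root, dst_root):
    """Fill in files missing at dst_root from identical files already there.

    Returns (copied, still_missing): copied maps each new relative path
    to the existing receiver file it was copied from; still_missing
    lists the paths that would still need a normal transfer.
    """
    src_files = relative_files(src_root)
    dst_files = relative_files(dst_root)
    # Checksums of everything the receiver already has.
    dst_sums = [(file_checksum(os.path.join(dst_root, p)), p)
                for p in sorted(dst_files)]
    copied = {}
    still_missing = []
    for rel in sorted(src_files - dst_files):
        wanted = file_checksum(os.path.join(src_root, rel))
        # Linear search through the receiver's checksums, as described.
        match = next((p for s, p in dst_sums if s == wanted), None)
        if match is None:
            still_missing.append(rel)
            continue
        target = os.path.join(dst_root, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # Copy rather than move; a later --delete-after style pass
        # would remove the old name.
        shutil.copy2(os.path.join(dst_root, match), target)
        copied[rel] = match
    return copied, still_missing
```

The sketch copies instead of moving so that one existing file can
satisfy several new names; the old copy is left for the delete pass.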

I don't think this would require any changes to sender, receiver or
protocol.  What I described would only handle
rename-without-modification, but its cost is not very high.  I think
it's O(N*M), N=# of files on sender that are missing on receiver, M=#
of files on receiver.  That's the cost over and above whatever
--checksum costs.
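
A small sketch of where the N*M comes from, again in Python with
made-up names (match_by_scan, match_by_index) and plain dicts of
filename -> checksum standing in for rsync's file lists.  The second
function is an assumption on my part, not something rsync does: if
the receiver's checksums are first put in a hash table, each missing
file needs one lookup instead of a scan.

```python
def match_by_scan(missing_sums, receiver_sums):
    """Scan all receiver checksums for each missing file.

    Returns (matches, comparisons); comparisons is at most N*M.
    """
    comparisons = 0
    matches = {}
    for name, wanted in missing_sums.items():
        for other, have in receiver_sums.items():
            comparisons += 1
            if have == wanted:
                matches[name] = other
                break
    return matches, comparisons


def match_by_index(missing_sums, receiver_sums):
    """Same result using a checksum -> filename table built once."""
    index = {}
    for other, have in receiver_sums.items():
        # Keep the first file seen for each checksum, like the scan does.
        index.setdefault(have, other)
    return {name: index[wanted]
            for name, wanted in missing_sums.items()
            if wanted in index}


missing = {"new/a": "s1", "new/b": "s9"}
receiver = {"old/a": "s1", "old/c": "s2", "old/d": "s3"}
scan_matches, comparisons = match_by_scan(missing, receiver)
index_matches = match_by_index(missing, receiver)
```

Either way the match itself is cheap next to reading every file to
checksum it, which is the part --checksum already pays for.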

I don't see how rename-with-modification could be handled efficiently,
though.  Better not to go there.

If nobody says I'm way off base here, I might be inspired to try to
implement this.  Unless someone else has the time and inclination...

-chris

