"intelligent" rsync scripts?

Eberhard Moenkeberg emoenke at gwdg.de
Wed Oct 26 18:12:30 GMT 2005


Hi,

On Wed, 26 Oct 2005, Chris Shoemaker wrote:
> On Wed, Oct 26, 2005 at 03:02:51PM +0200, Tomasz Chmielewski wrote:

>> I use rsync for backing up user data, profiles, important network shares
>> etc. (from several locations over WAN).
>>
>> Overall it works flawlessly, as it transfers only changes, but sometimes
>> there are some serious hiccups.
>>
>> Consider this scenario, with 1 GB of files:
>>
>> user shares:
>>
>> /home/joe/data/file1
>>               /file2
>>               /...
>>               /file1000
>>
>> Now the user _moves_ that data to some other folder:
>>
>> /home/joe/WAN_goes_crazy/file1
>>                           /file2
>>                           /...
>>                           /file1000
>>
>> ...and we start a backup process.
>>
>> rsync will first transfer data from "/home/joe/WAN_goes_crazy/file...",
>> and then delete "/home/joe/data/file...".
>>
>> Basically, this is how rsync works, but in the end, we transfer 1 GB of
>> files over WAN that we already have locally - the only thing that
>> changed was the folder where that data is.
>>
>> Is there some workaround for this (some intelligent script etc.)?
>
> ISTM it would be quite useful to make rsync "rename-aware".  Caveat: I
> haven't hacked on rsync for quite a while, so my understanding may be
> wrong or outdated.  But, I think this could be implemented thusly:
>
> You'd want to make this optional, say --detect-renames, because it
> does incur an extra processing cost.  That option should imply at
> least --checksum, plus --delete-after if --delete is used at all.  Then you
> just need the generator to be slightly more clever.  For each file on
> the sender which is *missing* from the receiver, it needs to search
> the checksums of all of receiver's existing files for a checksum
> match.  If it finds a match, it can simply use that matched file and
> either copy or move it to the new filename.  Then that file just gets
> skipped.
>
> I don't think this would require any changes to sender, receiver or
> protocol.  What I described would only handle
> rename-without-modification, but its cost is not very high.  I think
> it's O(N*M), N=# of files on sender that are missing on receiver, M=#
> of files on receiver.  That's the cost over and above whatever
> --checksum costs.
>
> I don't see how rename-with-modification could be handled efficiently,
> though.  Better not to go there.
>
> If nobody says I'm way off base here, I might be inspired to try to
> implement this.  Unless someone else has the time and inclination...
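
The generator-side matching described in the quoted proposal can be sketched in Python as below. This is a minimal illustration under stated assumptions, not rsync code: `FileEntry`, `entry` and `detect_renames` are hypothetical names, each file is modeled only by its path and a whole-file MD5 digest standing in for the --checksum value, and the function only reports matches instead of copying or moving anything.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """One file as seen by one side of the transfer (hypothetical model)."""
    name: str
    checksum: str  # whole-file digest, standing in for rsync's --checksum value


def entry(name: str, data: bytes) -> FileEntry:
    # MD5 is used here only as an example whole-file digest.
    return FileEntry(name, hashlib.md5(data).hexdigest())


def detect_renames(sender: list[FileEntry], receiver: list[FileEntry]) -> dict[str, str]:
    """Map each sender file that is missing on the receiver to an existing
    receiver file with the same checksum, so the data could be copied or moved
    locally instead of transferred. Linear scan: O(N*M) checksum comparisons."""
    receiver_names = {r.name for r in receiver}
    matches: dict[str, str] = {}
    for s in sender:
        if s.name in receiver_names:
            continue  # same name exists on receiver: normal rsync handling applies
        for r in receiver:  # search all of the receiver's existing files
            if r.checksum == s.checksum:
                matches[s.name] = r.name
                break  # first match is enough to reuse the local data
    return matches


receiver = [entry("data/file1", b"alpha"), entry("data/file2", b"beta")]
sender = [
    entry("WAN_goes_crazy/file1", b"alpha"),
    entry("WAN_goes_crazy/file2", b"beta"),
    entry("WAN_goes_crazy/file3", b"new"),
]
print(detect_renames(sender, receiver))
# {'WAN_goes_crazy/file1': 'data/file1', 'WAN_goes_crazy/file2': 'data/file2'}
```

For the 1,000-file move in the original question, this linear scan does at most 1,000 × 1,000 = 1,000,000 checksum comparisons, on top of computing the checksums themselves; file3 has no match and would be transferred normally.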

The first pass of "rename-without-modification" detection could be even
easier: just check that size and timestamp match.
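
A rough sketch of that cheaper first pass, assuming each file is described only by its path, size in bytes and mtime in Unix seconds (`Stat` and `detect_renames_by_stat` are hypothetical names, not rsync internals). It indexes the receiver's files in a dictionary keyed by (size, mtime) instead of scanning them for every missing file; that indexing is an illustrative choice, not something proposed in the thread. No file contents are read, so a size+mtime match is only a candidate that a stricter tool might still confirm with a checksum.

```python
from typing import NamedTuple


class Stat(NamedTuple):
    """Path, size in bytes, and mtime (Unix seconds) of one file (hypothetical model)."""
    name: str
    size: int
    mtime: int


def detect_renames_by_stat(sender: list[Stat], receiver: list[Stat]) -> dict[str, str]:
    """Pair each sender file missing on the receiver with a receiver file of
    identical size and mtime, without reading any file contents."""
    # Index receiver files by (size, mtime) so each lookup is a dict hit.
    by_stat: dict[tuple[int, int], list[str]] = {}
    for r in receiver:
        by_stat.setdefault((r.size, r.mtime), []).append(r.name)

    receiver_names = {r.name for r in receiver}
    matches: dict[str, str] = {}
    for s in sender:
        if s.name in receiver_names:
            continue  # same name exists on receiver: normal rsync handling applies
        candidates = by_stat.get((s.size, s.mtime))
        if candidates:
            # Heuristic only: distinct files can share size and mtime.
            matches[s.name] = candidates[0]
    return matches


receiver = [Stat("data/file1", 5, 1130335000), Stat("data/file2", 4, 1130335000)]
sender = [
    Stat("WAN_goes_crazy/file1", 5, 1130335000),
    Stat("WAN_goes_crazy/file2", 4, 1130335000),
    Stat("WAN_goes_crazy/file3", 3, 1130339000),
]
print(detect_renames_by_stat(sender, receiver))
# {'WAN_goes_crazy/file1': 'data/file1', 'WAN_goes_crazy/file2': 'data/file2'}
```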

Cheers -e
-- 
Eberhard Moenkeberg (emoenke at gwdg.de, em at kki.org)

