"intelligent" rsync scripts?
Eberhard Moenkeberg
emoenke at gwdg.de
Wed Oct 26 18:12:30 GMT 2005
Hi,
On Wed, 26 Oct 2005, Chris Shoemaker wrote:
> On Wed, Oct 26, 2005 at 03:02:51PM +0200, Tomasz Chmielewski wrote:
>> I use rsync for backing up user data, profiles, important network shares
>> etc. (from several locations over WAN).
>>
>> Overall it works flawlessly, as it transfers only changes, but sometimes
>> there are some serious hiccups.
>>
>> Consider this scenario, with 1 GB of files:
>>
>> user shares:
>>
>> /home/joe/data/file1
>>               /file2
>>               /...
>>               /file1000
>>
>> Now the user _moves_ that data to some other folder:
>>
>> /home/joe/WAN_goes_crazy/file1
>>                         /file2
>>                         /...
>>                         /file1000
>>
>> ...and we start a backup process.
>>
>> rsync will first transfer the data in "/home/joe/WAN_goes_crazy/file...",
>> and then delete "/home/joe/data/file...".
>>
>> That is basically how rsync works, but in the end we transfer 1 GB of
>> files over the WAN that we already have locally; the only thing that
>> changed was the folder the data is in.
>>
>> Is there some workaround for this (some intelligent script etc.)?
>
> ISTM it would be quite useful to make rsync "rename-aware". Caveat: I
> haven't hacked on rsync for quite a while, so my understanding may be
> wrong or outdated. But I think this could be implemented as follows:
>
> You'd want to make this optional, say --detect-renames, because it
> does incur an extra processing cost. That option should imply at
> least --checksum, plus --delete-after if --delete is used at all.
> Then you just need the generator to be slightly more clever. For each
> file on the sender that is *missing* from the receiver, it needs to
> search the checksums of all of the receiver's existing files for a
> match. If it finds one, it can simply copy or move the matched file
> to the new filename, and that file no longer needs to be
> transferred.
>
> I don't think this would require any changes to the sender, receiver
> or protocol. What I described would only handle
> rename-without-modification, but its cost is not very high. I think
> it's O(N*M), where N = # of files on the sender that are missing on
> the receiver and M = # of files on the receiver. That's the cost over
> and above whatever --checksum costs.
>
> I don't see how rename-with-modification could be handled efficiently,
> though. Better not to go there.
>
> If nobody says I'm way off base here, I might be inspired to try to
> implement this. Unless someone else has the time and inclination...
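A minimal Python sketch of the checksum matching proposed above, for illustration only. It assumes both sides' file lists are already available as dictionaries mapping a relative path to a whole-file checksum (roughly the information --checksum works with); `plan_renames` is a hypothetical name, not part of rsync, and --detect-renames is only the option proposed in the thread. One deliberate swap: the sketch indexes the receiver's checksums in a dict instead of searching all of them once per missing file, which keeps matching roughly linear in the number of files rather than O(N*M).

```python
def plan_renames(sender: dict[str, str], receiver: dict[str, str]) -> dict[str, str]:
    """Hypothetical rename detection for a --detect-renames style option.

    sender, receiver: relative path -> whole-file checksum (assumed precomputed).
    Returns {new sender path: existing receiver path with the same checksum}
    for every sender path the receiver lacks but already holds under another name.
    """
    # Index the receiver's existing files by checksum. The proposal searches
    # all receiver checksums for each missing file; a dict makes each search
    # a single lookup instead.
    by_checksum: dict[str, str] = {}
    for path, digest in receiver.items():
        # For duplicate contents keep the first path seen; any copy will do.
        by_checksum.setdefault(digest, path)

    plan: dict[str, str] = {}
    for path, digest in sender.items():
        if path in receiver:
            continue  # not missing: rsync's normal comparison handles it
        source = by_checksum.get(digest)
        if source is not None:
            # Copy `source` to `path` locally instead of sending it over the
            # wire. Copying is the safe default because several new paths may
            # map to one source. The old name must still exist at this point,
            # which is why the proposal implies --delete-after.
            plan[path] = source
    return plan
```

For the moved-directory case in the original question, `WAN_goes_crazy/file1` would map to `data/file1` (and likewise for the other files) as long as the contents are unchanged, so only the file list and its checksums would need to cross the WAN.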
A first pass at "rename-without-modification" could be much simpler
still: just require that size and timestamp match.
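The cheaper size-and-timestamp pass can be sketched the same way, keyed on file metadata instead of content. This is again a hypothetical Python illustration (the names `FileMeta`, `meta_of` and `candidate_renames` are made up for it, and comparing mtimes in whole seconds is an assumption). Because unrelated files can share both size and mtime, it returns candidates to confirm rather than definite renames.

```python
import os
from typing import NamedTuple


class FileMeta(NamedTuple):
    size: int   # bytes
    mtime: int  # modification time in whole seconds (an assumption of this sketch)


def meta_of(path: str) -> FileMeta:
    """Read size and mtime for one local file."""
    st = os.stat(path)
    return FileMeta(st.st_size, int(st.st_mtime))


def candidate_renames(sender: dict[str, FileMeta],
                      receiver: dict[str, FileMeta]) -> dict[str, list[str]]:
    """For each sender path missing on the receiver, list the receiver's
    existing paths with the same size and mtime.

    These are only likely renames: unrelated files can share both values,
    so a checksum comparison should confirm them before they are trusted.
    """
    # Group the receiver's files by (size, mtime).
    by_meta: dict[FileMeta, list[str]] = {}
    for path, meta in receiver.items():
        by_meta.setdefault(meta, []).append(path)

    candidates: dict[str, list[str]] = {}
    for path, meta in sender.items():
        if path not in receiver and meta in by_meta:
            # Copy the list so callers cannot alter the shared index.
            candidates[path] = list(by_meta[meta])
    return candidates
```

A plain rename within one filesystem leaves a file's size and modification time untouched, so in the moved-directory case each new path would usually get a single candidate, and only those pairs would then need a checksum to confirm the match.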
Cheers -e
--
Eberhard Moenkeberg (emoenke at gwdg.de, em at kki.org)