"intelligent" rsync scripts?
emoenke at gwdg.de
Wed Oct 26 18:12:30 GMT 2005
On Wed, 26 Oct 2005, Chris Shoemaker wrote:
> On Wed, Oct 26, 2005 at 03:02:51PM +0200, Tomasz Chmielewski wrote:
>> I use rsync for backing up user data, profiles, important network shares
>> etc. (from several locations over WAN).
>> Overall it works flawlessly, as it transfers only changes, but sometimes
>> there are some serious hiccups.
>> Consider this scenario, with 1 GB of files:
>> user shares:
>> Now the user _moves_ that data to some other folder:
>> ...and we start a backup process.
>> rsync will first transfer data from "/home/joe/WAN_goes_crazy/file...",
>> and then delete "/home/joe/data/data...".
>> Basically, this is how rsync works, but in the end, we transfer 1 GB of
>> files over WAN that we already have locally - the only thing that
>> changed was the folder where that data is.
>> Is there some workaround for this (some intelligent script etc.)?
> ISTM it would be quite useful to make rsync "rename-aware". Caveat: I
> haven't hacked on rsync for quite a while, so my understanding may be
> wrong or outdated. But, I think this could be implemented thusly:
> You'd want to make this optional, say --detect-renames, because it
> does incur an extra processing cost. That option should imply at
> least --checksum, and --delete-after if --delete is used at all. Then you
> just need the generator to be slightly more clever. For each file on
> the sender which is *missing* from the receiver, it needs to search
> the checksums of all of receiver's existing files for a checksum
> match. If it finds a match, it can simply use that matched file and
> either copy or move it to the new filename. Then that file just gets
> I don't think this would require any changes to sender, receiver or
> protocol. What I described would only handle
> rename-without-modification, but its cost is not very high. I think
> it's O(N*M), N=# of files on sender that are missing on receiver, M=#
> of files on receiver. That's the cost over and above whatever
> --checksum costs.
> I don't see how rename-with-modification could be handled efficiently,
> though. Better not to go there.
> If nobody says I'm way off base here, I might be inspired to try to
> implement this. Unless someone else has the time and inclination...
The first pass of "rename-without-modification" could even be much easier:
size and timestamp should match.
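
As a minimal sketch of that first pass (not something rsync itself does):
the Python below assumes both trees are reachable as local paths and that
the backup was made with modification times preserved (rsync -t or -a).
The names scan, file_digest, find_renames and apply_renames are
hypothetical helpers invented for this illustration. Destination files that
no longer exist on the source are indexed by (size, mtime), so each missing
source file costs one dictionary lookup instead of a scan over all M files;
verify=True adds a checksum comparison in the spirit of the --checksum idea
quoted above.

```python
import hashlib
import os
from collections import defaultdict


def scan(root):
    """Map relative path -> (size, mtime in whole seconds) for regular files under root."""
    entries = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            st = os.stat(full)
            entries[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return entries


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's content, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_renames(src_root, dst_root, verify=False):
    """Return (old_rel, new_rel) pairs: files that appear to have only moved."""
    src = scan(src_root)
    dst = scan(dst_root)

    # Only destination files that vanished from the source are candidates;
    # they are the ones a --delete run would otherwise remove.
    by_key = defaultdict(list)
    for rel, key in dst.items():
        if rel not in src:
            by_key[key].append(rel)

    renames = []
    for rel in sorted(src):
        if rel in dst:
            continue  # already present on the destination under the same name
        candidates = by_key.get(src[rel], [])
        for i, old in enumerate(candidates):
            if verify and (file_digest(os.path.join(src_root, rel))
                           != file_digest(os.path.join(dst_root, old))):
                continue  # same size and mtime, but different content
            renames.append((old, rel))
            del candidates[i]  # each destination file is used at most once
            break
    return renames


def apply_renames(dst_root, renames):
    """Move the matched files inside the destination tree."""
    for old, new in renames:
        target = os.path.join(dst_root, new)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.replace(os.path.join(dst_root, old), target)
```

Running apply_renames(dst, find_renames(src, dst)) before the regular rsync
run would leave the moved files already in place on the backup side, so
only real changes get transferred. Over a WAN each scan would have to run
on its own host, with only the metadata lists exchanged; that part is left
out here.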
Eberhard Moenkeberg (emoenke at gwdg.de, em at kki.org)