"intelligent" rsync scripts?
Chris Shoemaker
c.shoemaker at cox.net
Wed Oct 26 18:04:34 GMT 2005
On Wed, Oct 26, 2005 at 03:02:51PM +0200, Tomasz Chmielewski wrote:
> I use rsync for backing up user data, profiles, important network shares
> etc. (from several locations over WAN).
>
> Overall it works flawlessly, as it transfers only changes, but sometimes
> there are some serious hiccups.
>
> Suppose this scenario, suppose it's 1 GB of files:
>
> user shares:
>
> /home/joe/data/file1
> /file2
> /...
> /file1000
>
> Now the user _moves_ that data to some other folder:
>
> /home/joe/WAN_goes_crazy/file1
> /file2
> /...
> /file1000
>
> ...and we start a backup process.
>
> rsync will first transfer data from "/home/joe/WAN_goes_crazy/file...",
> and then deletes "/home/joe/data/data...".
>
> Basically, this is how rsync works, but in the end, we transfer 1 GB of
> files over WAN that we already have locally - the only thing that
> changed was the folder where that data is.
>
> Is there some workaround for this (some intelligent script etc.)?
ISTM it would be quite useful to make rsync "rename-aware". Caveat: I
haven't hacked on rsync for quite a while, so my understanding may be
wrong or outdated. But, I think this could be implemented thusly:
You'd want to make this optional, say --detect-renames, because it
does incur an extra processing cost. That option should imply at
least --checksum, and also --delete-after if --delete is used at all,
so that the old copies still exist when the search runs. Then you
just need the generator to be slightly more clever. For each file on
the sender which is *missing* from the receiver, it needs to search
the checksums of all of the receiver's existing files for a checksum
match. If it finds a match, it can simply use that matched file and
either copy or move it to the new filename. Then that file just gets
skipped.
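To make the idea above concrete, here is a rough standalone sketch in Python. It is not rsync code and does not touch the generator: it only imitates the search on two local directory trees. The function names (file_checksum, list_files, detect_renames, apply_renames) are made up for illustration, and SHA-256 is just a stand-in for whatever checksum --checksum would actually use.

```python
import hashlib
import os
import shutil


def file_checksum(path, chunk_size=1 << 16):
    """Hex digest of a file's contents (SHA-256 is an assumption, not rsync's choice)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_files(root):
    """Set of regular-file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def detect_renames(sender_root, receiver_root):
    """Map each sender path missing on the receiver to a receiver path
    whose contents have the same checksum (rename-without-modification)."""
    sender_files = list_files(sender_root)
    receiver_files = list_files(receiver_root)
    receiver_sums = {
        rel: file_checksum(os.path.join(receiver_root, rel))
        for rel in receiver_files
    }
    renames = {}
    for rel in sorted(sender_files - receiver_files):
        wanted = file_checksum(os.path.join(sender_root, rel))
        # Plain linear scan over the receiver's checksums, as described above.
        for candidate in sorted(receiver_sums):
            if receiver_sums[candidate] == wanted:
                renames[rel] = candidate
                break
    return renames


def apply_renames(receiver_root, renames):
    """Copy each matched file to its new name on the receiver side.
    Copying (not moving) leaves the old path for a later delete pass."""
    for new_rel, old_rel in renames.items():
        target = os.path.join(receiver_root, new_rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(receiver_root, old_rel), target)
```

Any sender file that ends up in the returned mapping would not need to cross the wire; everything else would go through the normal transfer. The sketch copies rather than moves, which matches the --delete-after ordering: the stale path is cleaned up only after the new one is in place.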
I don't think this would require any changes to sender, receiver or
protocol. What I described would only handle
rename-without-modification, but its cost is not very high. I think
it's O(N*M), N=# of files on sender that are missing on receiver, M=#
of files on receiver (the set being searched). That's the cost over
and above whatever
--checksum costs.
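As a rough illustration of that estimate, the sketch below counts comparisons for a plain scan, where N is the number of missing files and M is the number of candidate files searched. The second function is a variation that is not part of the proposal above: it indexes the candidate checksums in a dict first, so each missing file costs one lookup. Both function names and the toy checksum strings in the usage note are hypothetical.

```python
def match_by_scan(missing_sums, candidate_sums):
    """Linear scan per missing file: at most N*M checksum comparisons.
    Both arguments map a path to its checksum string."""
    comparisons = 0
    matches = {}
    for new_path, wanted in missing_sums.items():
        for old_path, have in candidate_sums.items():
            comparisons += 1
            if have == wanted:
                matches[new_path] = old_path
                break
    return matches, comparisons


def match_by_index(missing_sums, candidate_sums):
    """Same result via a checksum -> path index: roughly N + M dict operations."""
    index = {}
    for old_path, have in candidate_sums.items():
        # Keep the first path seen for each checksum, like the scan does.
        index.setdefault(have, old_path)
    return {
        new_path: index[wanted]
        for new_path, wanted in missing_sums.items()
        if wanted in index
    }
```

With two missing files that match nothing and three candidates, match_by_scan performs 2 * 3 = 6 comparisons, which is the worst case the O(N*M) figure describes.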
I don't see how rename-with-modification could be handled efficiently,
though. Better not to go there.
If nobody says I'm way off base here, I might be inspired to try to
implement this. Unless someone else has the time and inclination...
-chris