great feature idea (well, hopefully)

Matthias Schniedermeyer ms at citd.de
Thu Feb 12 02:10:00 MST 2015


On 11.02.2015 14:03, QUBE RUBBIK wrote:
> Hello
> 
> I was just thinking about a killer feature for rsync: the ability to detect file name changes or moves within the source and destination.
> At this time rsync has to re-transfer a file if it has been renamed or moved into a subfolder, with a heavy waste of resources and bandwidth.
> 
> It could be smarter :
> with a --smart switch, rsync could take a hash of every file within the source and destination BEFORE TRANSFERRING,
> then for existing (matching hash) files, it only needs to alter metadata (name, location, chmod etc...), saving plenty of bandwidth

Imagine doing that for a couple of GB of data. The hashing might take
longer than the time saved by not copying it.
This would only work well with a persistence layer that remembers the
hashes of unchanged files. This has been a topic in the past, although I
don't remember the details. (And I'm too lazy to google for it.)
Otherwise the only time it really saves time is when you have really
asymmetric bandwidths:
Fast local access on both sides (to create the hashes), terrible
bandwidth on the link in between (for the copying of new/changed files).
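
To get a feel for the cost, here is a rough local sketch of what such a
hash comparison involves. It is not how rsync works and not a proposed
implementation. It assumes GNU sha256sum is available, the function names
hash_tree and find_moves are made up for illustration, and it ignores
file names containing newlines or backslashes. Note that every file in
both trees is read completely, which is exactly the expensive part.

```shell
# hash_tree DIR: print "<sha256>  ./relative/path" for each regular file.
hash_tree() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort )
}

# find_moves SRC DST: print "old -> new" for content that exists in DST
# under a different path than in SRC (a rename/move candidate).
find_moves() {
    src_list=$(mktemp) && dst_list=$(mktemp) || return 1
    hash_tree "$1" > "$src_list"
    hash_tree "$2" > "$dst_list"
    # The hash is the first 64 characters; the path starts at column 67.
    awk '
        NR == FNR {                      # first file: destination hashes
            h = substr($0, 1, 64); p = substr($0, 67)
            dst[h] = p; have[h SUBSEP p] = 1
            next
        }
        {                                # second file: source hashes
            h = substr($0, 1, 64); p = substr($0, 67)
            if ((h in dst) && !((h SUBSEP p) in have))
                print dst[h] " -> " p
        }
    ' "$dst_list" "$src_list"
    rm -f "$src_list" "$dst_list"
}
```

Called as "find_moves /path/to/source /path/to/copy", it lists the files
whose content is already present on the other side under another name.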

> Okay destination has to handle this, I expect the rsync daemon has to handle server side file hashing.
> 
> We would have a clever tool to replicate data that has only been reorganised, with no changes to the files themselves.
> No need to resync the whole structure if you added a dir in the path, or someone renamed this particular heavy file
> 
> this may save big data on automatic backups, ftp mirrors etc...
> 
> 
> What do you think about it?

The 'workaround' I personally use is hardlinks. Just hardlink all files
into a directory that sorts alphabetically before everything else;
personally, I use a '.z'-directory in the root of each directory tree I
treat that way.
The reason is that rsync has to work through that directory first,
otherwise it wouldn't work as intended.
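
Populating such a directory could look roughly like the sketch below.
Naming each link after the file's inode number is purely an assumption
made here to avoid name collisions; any unique naming scheme would do.
The function name populate_z is hypothetical, and file names containing
newlines are not handled.

```shell
# populate_z ROOT: hardlink every regular file below ROOT into ROOT/.z.
# Pass ROOT without a trailing slash.
populate_z() {
    root=${1:-.}
    mkdir -p "$root/.z" || return 1
    # Prune .z itself so that existing links are not linked again.
    find "$root" -path "$root/.z" -prune -o -type f -print |
    while IFS= read -r f; do
        # ls -i prints "<inode> <name>"; keep only the inode number.
        inode=$(ls -i -- "$f" | awk '{ print $1; exit }')
        # An existing entry means this inode already has a link in .z.
        [ -e "$root/.z/$inode" ] || ln -- "$f" "$root/.z/$inode"
    done
}
```

Running it again later only adds links for files that are not in .z yet,
so it can also cover the "adding new files" case.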

After that you can move the files around, and when you run:
rsync ... -H --delete ... ...
rsync just deletes and re-hardlinks the moved file(s).

If you remove a file:
find .z -type f -links 1 -delete
removes the 'dangling' file(s) with only 1 link remaining.
(And in the meantime you have a backup, in case you accidentally deleted 
a file.)
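
Since the cleanup is destructive, it may be worth listing the candidates
first. A small sketch with two hypothetical helper names, list_dangling
and prune_dangling, wrapping the same find expression:

```shell
# ROOT is the tree that contains the .z directory (default: current dir).
list_dangling() {
    # Dry run: print the files in .z whose only remaining link is in .z.
    find "${1:-.}/.z" -type f -links 1 -print
}

prune_dangling() {
    # Same selection, but remove the files.
    # -delete is a GNU/BSD find extension, not plain POSIX.
    find "${1:-.}/.z" -type f -links 1 -delete
}
```

Anything list_dangling prints can still be hardlinked back out of .z
before prune_dangling is run.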

You would also need to plan for maintaining the .z-directory:
initial creation, adding new files, whether files can change, ...

The solution has some caveats, like maintaining the .z-directory, but it 
works fine for me.




-- 

Matthias

