Renaming a directory results in an expensive retransmission
N.J. van der Horn (Nico)
nico at vanderhorn.nl
Fri Oct 5 21:35:57 GMT 2007
We have been using rsync for several years, and for the past couple of
months we have used it to back up remote servers, some with more than
200GB capacity. Windows users especially sometimes have the (bad) habit
of renaming a directory that holds huge amounts of data.
We see the same nasty results as you are talking about:
* rsync "thinks" that the old directory name has disappeared, and deletes
the directory on the target machine, throwing away the expensive
* the new directory name initiates a fresh / full (re)transmission,
sometimes taking days.... while the "real work" would be done in
* the servers we back up have between 20GB and 200GB capacity.
* all rsyncs are run in parallel; the average sync time is 1.5 hours for 900GB.
* when a "user" behaves as described, it takes days to a week to resync.
It is a tricky problem to deal with, I think. It is tempting to keep a
checksummed file/directory list on both sides with information like:
* a fingerprint/signature/checksum to identify each file or directory
* inode number
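
As a rough illustration of such a list, here is a minimal Python sketch
that walks a tree and records a fingerprint, inode number and size per
file. The function names (fingerprint, build_manifest) and the choice of
SHA-256 are my own assumptions for the example, not anything rsync itself
provides.

```python
import hashlib
import os


def fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so very large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file's path (relative to root) to its metadata."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            info = os.stat(full)
            manifest[os.path.relpath(full, root)] = {
                "fingerprint": fingerprint(full),
                "inode": info.st_ino,
                "size": info.st_size,
            }
    return manifest
```

Running build_manifest on both sides gives two lists that can be compared
by fingerprint rather than by path. Hashing every file is itself costly
on large trees, so a real tool would probably cache results and rehash
only files whose size or modification time changed.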
In case a file appears to be deleted, because the name/path is changed,
it could possibly be identified by its fingerprint and used to sync
This is with the thought of expanding --fuzzy, giving it more functionality
For some time I have been experimenting with a solution to this problem
by means of a "preprocessor" that tries to identify renamed files in the
described way, creating hardlinks (ln) to let rsync think the files are
already in the new location.
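
To make the idea concrete, here is a minimal Python sketch of the
hardlinking step, run on the backup side before rsync. It is not my
actual preprocessor. The function name link_renamed_files and the two
path-to-fingerprint dictionaries are hypothetical, and it assumes the
fingerprints were already collected on both sides.

```python
import os


def link_renamed_files(source_index, dest_index, dest_root):
    """Hardlink existing destination files to the paths the source now uses.

    Both indexes map a relative path to a content fingerprint.
    Returns the (old_path, new_path) pairs that were linked.
    """
    # Reverse lookup: fingerprint -> one destination path with that content.
    by_fingerprint = {}
    for path, digest in dest_index.items():
        by_fingerprint.setdefault(digest, path)

    linked = []
    for new_path, digest in source_index.items():
        if new_path in dest_index:
            continue  # this path already exists on the destination
        old_path = by_fingerprint.get(digest)
        if old_path is None:
            continue  # genuinely new content; rsync has to transfer it
        target = os.path.join(dest_root, new_path)
        if os.path.lexists(target):
            continue  # never overwrite something that is already there
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # Same inode, second name: no data is copied.
        os.link(os.path.join(dest_root, old_path), target)
        linked.append((old_path, new_path))
    return linked
```

Because a hardlink shares size and modification time with the original,
rsync's default quick check should then treat the file at the new path
as up to date, provided the rename did not change the file's
modification time on the source. The later --delete pass removes only
the old name, not the data.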
I am traversing the directory trees on both sides (remote and local),
a file with the information described above, but it is still work in
The cost of keeping a database in this scenario would be truly justified
That rsync deletes the files in the old location is then no problem for
But.... I am just a user with needs... looking for a solution to a
hoping this can be solved by the clever developers ;-)
Maybe there is already a solution available, and we are chasing shadows?
Frank Thomas wrote:
> Good day,
> I’ve got a question regarding the usage of rsync that I just cannot
> figure out. I’ve done a fair hunt for the answer, but I’m stumped.
> Here is the situation.
> I have two PCs running Linux and using rsync to perform a backup from
> server1 to server2. For example: rsync -avzr -e 'ssh
> -i/root/.ssh/id_rsa' --delete /home/samba/admin/software
> Let’s say I have a directory within rsync’s scope to sync called
> Rsync is run and directory1 is sync’ed from server1 to server2. Also,
> a file named File1 is sync’ed because it is in the directory being
> server1          server2
> Directory1       Directory1
>   File1            File1
> Now, let’s say a user comes and changes the name of the Directory1 on
> server1 to DirectoryNew, rsync performs the following actions:
> 1. rsync recognizes that Directory1 is not on
> server1, but it is on server2, so it flags it and its contents for
> deletion on server2.
> 2. rsync recognizes that DirectoryNew is on server1,
> but not on server2, so it flags it and its contents for copying to
> 3. rsync performs these actions to make the two
> directories the same.
> This action is the simplest method of performing an rsync, but it
> would be nice for rsync to be intelligent enough to recognize a
> name change but not an inode change on the source. So the action
> performed would be,
> 1. rsync recognizes that Directory1 is not on
> server1, but its inode still is. Rsync reads the new directory name
> and flags the name change from Directory1 to DirectoryNew on server1.
> 2. Rsync reads server2 and sees that Directory1
> exists, and flags a pending name change on server2 from Directory1 to
> 3. Name is changed on server2. No files or
> directories are deleted and re-transferred from source to destination
> as the structure under the directory has not changed.
> Why go through all this work? I’ve had personnel change a directory
> name that has several gigabytes of data in it without notifying me and
> at night, rsync tries to perform the directory and file dance and
> fails simply because the volume is so great. It would be nice to
> either, one, recognize a large discrepancy between the source and
> destination before anything occurs, by giving a message with the
> number of bytes that would potentially be transferred (this doesn’t
> work with the dry-run option), or, two, do the fancy dance by
> recognizing a name change instead of a deletion of a directory.
> *Frank Thomas*
Handled by: N.J. van der Horn (Nico)
ICT Support Vanderhorn IT-works, www.vanderhorn.nl,
Voorstraat 55, 3135 HW Vlaardingen, The Netherlands,
Tel +31 10 2486060, Fax +31 10 2486061