Renaming a directory results in an expensive retransmission

N.J. van der Horn (Nico) nico at vanderhorn.nl
Sun Oct 7 13:51:50 GMT 2007


Thanks Matt, it seems this subject has been getting a lot of attention
over the last few days.

My "pre-processor" approach helped a lot, but checksumming is very
CPU-intensive.
For that reason I first sorted on timestamp to determine which files
would normally be deleted,
thus minimizing the number of files that have to be checksummed for this
purpose.
This helped, but it still takes a lot of resources on the central backup
server.
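
To make the idea concrete, here is a rough sketch of such a pre-processor
in Python. It is not my actual script and not how rsync works internally.
It assumes both trees are reachable as local paths, that a renamed file
keeps its size and modification time, and that SHA-1 is good enough as a
fingerprint; the function names (find_renames, link_renames) are made up
for this example. Only files that share size and timestamp with a file
that would otherwise be deleted get checksummed.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def relative_files(root):
    """Map each regular file under root (relative path) to (size, mtime)."""
    result = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                st = os.stat(full)
                rel = os.path.relpath(full, root)
                result[rel] = (st.st_size, int(st.st_mtime))
    return result


def find_renames(src_root, dst_root):
    """Return (old_path, new_path) pairs, both relative to dst_root."""
    src = relative_files(src_root)
    dst = relative_files(dst_root)

    # Files present only on the destination would normally be deleted.
    # Index them by the cheap key (size, mtime).
    orphans = defaultdict(list)
    for rel, key in dst.items():
        if rel not in src:
            orphans[key].append(rel)

    renames = []
    for rel, key in sorted(src.items()):
        # Skip files that already exist or have no cheap-key candidate.
        if rel in dst or key not in orphans:
            continue
        # Checksum only now, to confirm a candidate is really the same file.
        wanted = file_digest(os.path.join(src_root, rel))
        for old in orphans[key]:
            if file_digest(os.path.join(dst_root, old)) == wanted:
                renames.append((old, rel))
                break
    return renames


def link_renames(dst_root, renames):
    """Hardlink each old file to its new path so rsync finds it in place."""
    for old, new in renames:
        target = os.path.join(dst_root, new)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.link(os.path.join(dst_root, old), target)
```

Running link_renames(dst, find_renames(src, dst)) before the real rsync
run leaves the old names in place, so a normal run with --delete still
cleans them up afterwards.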

As the pre-processor is not part of rsync itself, a lot of work is done
twice, once by each.
I took this route just to see if my theory was right before making a
proposal.

I will now start testing the new --detect-renamed option provided by the
patch
"patches/detect-renamed.diff" and trust this will be smarter than my
workaround.

I hope to report my conclusions before my vacation on the 17th of October...

Nico

Matt McCutchen wrote:
> On 10/5/07, N.J. van der Horn (Nico) <nico at vanderhorn.nl> wrote:
>   
>> It is a tricky problem to deal with, I think. It is tempting to keep a
>> checksummed file/directory list on both sides with information like:
>>
>> * a fingerprint/signature/checksum to identify each file or directory
>> * inode number
>> * timestamp
>> * filesize
>>
>> In case a file appears to be deleted because its name/path has changed,
>> it could possibly be identified by its fingerprint and used to sync
>> cleverly ;-)
>> The thought here is to expand --fuzzy, giving it more functionality
>> (hint).
>>
>> For some time I have been experimenting with a solution to this problem:
>> some sort
>> of "preprocessor" that tries to identify files in the described way, creating
>> hardlinks (ln) to let rsync think the files are already in the new location.
>>     
>
> The --detect-renamed option provided by the patch
> "patches/detect-renamed.diff" in the rsync source package does
> essentially this.
>
>   
>> The cost of keeping a database in this scenario would be truly justified
>> for me.
>>     
>
> Wayne is considering adding support for a file database, which would
> be used to make --detect-renamed work somewhat better:
>
> http://lists.samba.org/archive/rsync/2007-October/018780.html
>
> Matt
>
>
>   

-- 
Handled by: N.J. van der Horn (Nico)
---
ICT Support Vanderhorn IT-works, www.vanderhorn.nl,
Voorstraat 55, 3135 HW Vlaardingen, The Netherlands,
Tel +31 10 2486060, Fax +31 10 2486061



