rsync algorithm for large files

Shachar Shemesh shachar at shemesh.biz
Fri Sep 4 23:55:54 MDT 2009


eharvey at lyricsemiconductors.com wrote:
>
> I thought rsync, would calculate checksums of large files that have 
> changed timestamps or filesizes, and send only the chunks which 
> changed.  Is this not correct?  My goal is to come up with a 
> reasonable (fast and efficient) way for me to daily incrementally 
> backup my Parallels virtual machine (a directory structure containing 
> mostly small files, and one 20G file)
>
>  
>
> I’m on OSX 10.5, using rsync 2.6.9, and the destination machine has 
> the same versions.  I configured ssh keys, and this is my result:
>
Upgrade to rsync 3 at least.

Rsync keeps a hash table of the sliding (rolling) checksums of the
file's blocks. In older versions of rsync, that table had a fixed size.
This meant that files over 3GB in size had a high chance of hash
collisions. For a 20G file, the collisions alone might be the cause of
your trouble.
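
For concreteness, here is a minimal sketch (in C) of an rsync-style
weak rolling checksum feeding a fixed 2^16-entry lookup table. The
block size, bucket function and table layout are illustrative
assumptions, not rsync's actual internals:

#include <stdint.h>
#include <stddef.h>

#define BLOCK_LEN   2048   /* assumed block size */
#define TABLE_BITS  16     /* fixed 2^16-entry table, as in older rsyncs */
#define TABLE_SIZE  (1u << TABLE_BITS)

/* Weak checksum over one block: low 16 bits are the byte sum,
   high 16 bits the position-weighted sum (both mod 2^16). */
static uint32_t weak_sum(const unsigned char *buf, size_t len)
{
    uint32_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) {
        a += buf[i];
        b += (uint32_t)(len - i) * buf[i];
    }
    return (a & 0xffff) | ((b & 0xffff) << 16);
}

/* Slide the window one byte: drop `out`, add `in`. */
static uint32_t roll(uint32_t sum, unsigned char out, unsigned char in,
                     size_t len)
{
    uint32_t a = sum & 0xffff;
    uint32_t b = sum >> 16;
    a = (a - out + in) & 0xffff;
    b = (b - (uint32_t)len * out + a) & 0xffff;
    return a | (b << 16);
}

/* Bucket index into the fixed table. With a 20G file and 2K blocks
   you get roughly 10 million blocks but only 65536 buckets, i.e.
   about 150 entries per bucket -- the collision problem described
   above. */
static unsigned bucket(uint32_t sum)
{
    return (sum ^ (sum >> 16)) & (TABLE_SIZE - 1);
}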

Newer rsyncs detect when the number of blocks outgrows the table and
increase the hash size accordingly, thus avoiding the collisions.
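
The fix, roughly, is to size the table from the block count instead of
fixing it. Something along these lines (the growth rule here is an
assumption, not rsync 3's exact heuristic):

#include <stdint.h>

static uint32_t table_size_for(uint32_t block_count)
{
    uint32_t size = 1u << 16;           /* old fixed size as the floor */
    while (size < block_count && size < (1u << 28))
        size <<= 1;                     /* roughly one bucket per block */
    return size;
}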

In other words - upgrade both sides (but specifically the sender).

Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com
