rsync algorithm for large files

eharvey at lyricsemiconductors.com
Sat Sep 5 13:53:54 MDT 2009


Yup, by using --inplace I got down from 30 minutes to 24 minutes, so that's
slightly better than resending the whole file again.
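For reference, the kind of invocation being timed here (the host and paths
are made up) would be along the lines of:

    rsync -av --inplace /data/bigfile.img backuphost:/data/

Over a remote connection rsync's delta-transfer algorithm is on by default;
--inplace just rewrites the changed regions of the destination file in place
instead of building a temporary copy first.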



However, this doesn't really do what I was hoping for.  Perhaps it can't be
done, or perhaps somebody can recommend some other product that is better
suited to my purposes?



To describe exactly what I'm ideally trying to do:

- During the initial send, calculate checksums on the fly at some block size
(perhaps 1 MB), and store the checksums for later use.

- On subsequent sends, just read the source, compare checksums against the
previously saved values, and send only the blocks that changed.  In the
worst case all blocks have changed, and the time to send is very nearly
equal to the initial send.  (A rough sketch of this scheme follows the
list.)

- The runtime of subsequent runs should never significantly exceed the
runtime of the initial run, because the goal is to gain something over
brainless delete-and-overwrite.

- The runtime of subsequent runs should be on the same order of magnitude as
whichever is greater:

  o  calculating the checksums of the source, or

  o  sending the changed blocks.
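Something like the following rough sketch is what I have in mind.  It is not
rsync itself, just the caching scheme described above; the file names and
the send_block() hook are hypothetical placeholders for the real transfer:

    import hashlib
    import os
    import pickle

    BLOCK_SIZE = 1024 * 1024  # 1 MB blocks, as suggested above

    def send_changed_blocks(path, cache_path, send_block):
        """Hash each block of `path`, compare against the digests cached
        from the previous run, and hand only the changed blocks to
        `send_block(offset, data)`.  The new digests replace the cache.
        (Truncating a file that shrank is left out of this sketch.)"""
        old = []
        if os.path.exists(cache_path):
            with open(cache_path, "rb") as f:
                old = pickle.load(f)
        new = []
        with open(path, "rb") as f:
            index = 0
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                digest = hashlib.md5(block).digest()
                new.append(digest)
                # Send a block only if it is new or its digest changed.
                if index >= len(old) or old[index] != digest:
                    send_block(index * BLOCK_SIZE, block)
                index += 1
        with open(cache_path, "wb") as f:
            pickle.dump(new, f)

On the very first run the cache is empty, so every block goes out, which
matches the worst case above; on later runs the cost is one sequential read
plus hashing, plus whatever actually changed.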



In my specific situation, the initial send of 20 GB across a 100 Mbit LAN
takes 33 minutes, so a subsequent run should take approximately 11 minutes,
because that's how long it takes me to md5 the whole tree.
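As a sanity check on those figures (taking 20 GB as 20e9 bytes and the LAN
as 100e6 bits per second):

    size_bytes = 20e9
    wire_floor_s = size_bytes * 8 / 100e6  # ~1600 s, ~27 min; so the 33 min
                                           # initial send is near wire speed
    md5_rate = size_bytes / (11 * 60)      # ~30 MB/s implied hashing rate

So a checksum-only pass really should run about three times faster than a
full resend on this link.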



Thanks again for any assistance…