rsync speedup - how?

devzero at web.de
Fri Aug 7 10:44:48 MDT 2009


> devzero at web.de wrote:
> > so, instead of 500M i would transfer 100GB over the network.
> > that`s no option.
> 
> I don't see how you came up with such numbers.
> If files change completely then I don't see why
> you would transfer more (or less) over the network.
> The difference that I'm thinking of is that
> by not using the rsync algorithm then you're
> substantially reducing the number of disk I/Os.

let me explain: all files are HUGE datafiles of constant size.
they are harddisk images, and their contents change in place, i.e.
specific blocks inside the files are accessed and rewritten.

so, the question is:
is rsync's rolling-checksum algorithm the best (i.e. fastest) algorithm to match
changed blocks at fixed locations between source and destination files?
i'm not sure, because i have no in-depth knowledge of the mathematical background
of the rsync algorithm. i assume: no - but it's only a guess...
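
to make the question concrete, here is a minimal sketch (Python, with a
made-up 64 KiB block size; the file names are hypothetical) of the
fixed-offset comparison i have in mind. in a real transfer only the
per-block digests would cross the network, not the blocks themselves:

    import hashlib

    BLOCK_SIZE = 64 * 1024  # hypothetical block size, not rsync's default

    def changed_blocks(path_a, path_b):
        # Compare two equal-sized files block by block and yield the
        # offsets of blocks whose strong checksums differ. Over a
        # network, one side would send only its digest list, and the
        # other side would send back only the non-matching blocks.
        with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
            offset = 0
            while True:
                a = fa.read(BLOCK_SIZE)
                b = fb.read(BLOCK_SIZE)
                if not a and not b:
                    break
                if hashlib.md5(a).digest() != hashlib.md5(b).digest():
                    yield offset
                offset += BLOCK_SIZE

no rolling, no searching: every block is only ever compared against the
block at the same offset.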

> The reason I say this, and I could be wrong since
> I'm no rsync algorithm expert, is because when the
> local version of a file and the remote version of
> a file are completely different, and the rsync
> algorithm is being used, the amount of I/O
> that must be done consists of the I/Os that
> compare the two files, plus the actual transfer
> of the bits from the source file to the destination
> file. (That's a very long sentence, isn't it.)
> Please correct this thinking if it's wrong.

yes, that's correct. but what i'm unsure about is whether
rsync isn't doing too much work when detecting the
differences. it doesn't need to "look back and forth" (as i read
somewhere it would), it just needs to check whether block 1 in file A differs
from block 1 in file B. a sort of stupid comparison, without need for complex
math or any real "intelligence" to detect relocation of data.
see this post: http://www.mail-archive.com/backuppc-users@lists.sourceforge.net/msg08998.html
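
for contrast, here is roughly the kind of work the rolling (weak)
checksum adds - a simplified sketch in the spirit of rsync's
Adler-32-style checksum, not rsync's actual code. the O(1) roll()
update is what lets rsync find matching blocks at arbitrary offsets in
the source file; for disk images where blocks only ever change in
place, that generality buys nothing:

    def weak_checksum(block):
        # Simplified rsync-style weak checksum over one block.
        a = sum(block) % 65536
        b = sum((len(block) - i) * byte
                for i, byte in enumerate(block)) % 65536
        return (b << 16) | a

    def roll(a, b, out_byte, in_byte, block_len):
        # Slide the window one byte forward in O(1): drop out_byte
        # from both running sums, add in_byte.
        a = (a - out_byte + in_byte) % 65536
        b = (b - block_len * out_byte + a) % 65536
        return a, b

as i understand it, rsync evaluates this at every byte offset of the
source file and looks each value up in a table of the destination's
block checksums; the plain block1-vs-block1 comparison computes one
checksum per block and does no lookups at all.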

> > besides that, for transferring complete files i know faster methods than rsync.
> 
> Maybe so (I'd like to hear what you're referring to) but one reason
> I like to use rsync is that using the '-avzW' flags
> results in a perfect mirror on the destination, which is
> my goal. Do your faster methods have a way of doing that?

no, i have no faster replacement that is as good at perfect mirroring as
rsync, but there are faster methods for transferring whole files.
here is an example: http://communities.vmware.com/thread/29721

> > one more question: 
> > how safe is transferring a 100gb file, i.e. as rsync
> > is using checksums internally to compare the contents
> > of two files, how can i calculate the risk of 2 files
> > being NOT perfectly in sync after rsync run ?
> 
> Assuming the rsync algorithm works correctly, I don't
> see any difference between the end result of copying
> a 100gb file with the rsync algorithm or without it.
> The only difference is the amount of disk and network
> I/O that must occur.

the rsync algorithm uses checksumming to find differences.
checksums are a sort of "data reduction" which creates a hash from
a larger amount of data. i just want to understand what makes
sure that there are no hash collisions which break the algorithm.
mind that rsync has existed for some time, and in that time the file
sizes transferred with rsync may have grown by a factor of 100 or
even 1000.
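
to put a rough number on it, here is a back-of-the-envelope
birthday-bound estimate (my own calculation, assuming an ideal b-bit
hash over n blocks; the exact figures depend on rsync's real checksum
widths, and rsync additionally verifies the whole file with a strong
checksum at the end):

    def collision_probability(n_blocks, hash_bits):
        # Birthday bound: p <= n^2 / 2^(bits + 1) for n uniformly
        # random hash values.
        return n_blocks ** 2 / 2 ** (hash_bits + 1)

    # Hypothetical example: a 100 GB file in 128 KiB blocks
    n = (100 * 2**30) // (128 * 2**10)    # 819200 blocks
    print(collision_probability(n, 128))  # ~1e-27 for a 128-bit checksum

so even with files 1000 times larger than they used to be, a 128-bit
strong checksum keeps the collision risk astronomically small; the
weak 32-bit rolling checksum only pre-filters candidate matches and
is, as far as i know, always confirmed by the strong one.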

regards
roland
 


