Rsyncing really large files

Lars Karlslund lak at pharmanord.com
Mon Feb 28 13:58:49 GMT 2005


On Mon, 2005-02-28 at 12:23 +0200, Shachar Shemesh wrote:

> > Also as far as I could read, the default block size is 700 bytes? What 
> > kind of application would default to moving data around 700 bytes at a 
> > time internally in a file? I'm not criticizing rsync, merely 
> > questioning the functionality of this feature.
> I believe you may have missed the point there. 700 bytes is not the 


Maybe I didn't express myself thoroughly enough  :-)

> amount the application is expected to have changed. 700 bytes is merely 
> the unit of data examined as one block. This means that if you take a 
> file and change it by one byte, 700 consecutive bytes (sometimes), or 
> two bytes 698 bytes away from one another, rsync will treat it as a 
> single changed block, and will resynchronize the data there. This number 
> is a trade off.
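
To make the granularity point concrete, here is a minimal sketch in Python. It only maps changed byte offsets onto fixed-size blocks; it is not rsync code, and the function name changed_blocks is made up for this illustration. The 700-byte figure is the default mentioned in this thread.

```python
BLOCK_SIZE = 700  # default block size as quoted in this thread (assumption)


def changed_blocks(offsets, block_size=BLOCK_SIZE):
    """Return the sorted indices of the fixed-size blocks that contain
    at least one changed byte offset (hypothetical helper)."""
    return sorted({offset // block_size for offset in offsets})


# One changed byte dirties exactly one block.
print(changed_blocks([10]))                # [0]
# Two bytes 698 bytes apart can still fall inside the same block.
print(changed_blocks([0, 698]))            # [0]
# 700 consecutive bytes fit in one block only if they start on a boundary.
print(changed_blocks(range(700, 1400)))    # [1]
print(changed_blocks(range(350, 1050)))    # [0, 1]
```

The last two calls show why the quoted text says "sometimes": the same 700 changed bytes touch one block or two depending on alignment.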


Yes, a block is the smallest unit of data that is considered for
transfer.

But it's a fact that the rsync algorithm as it is now checks whether a
block has moved. And in that case, the 700-byte default is very
much worth considering. If no blocks at all move in 700-byte increments
(i.e. 700 bytes get inserted somewhere - optimally at a 700-byte
boundary in the file), then all you get is higher memory and CPU usage
and no bandwidth reduction (which is what rsync is really about).


> As for granularity - suppose you added one byte to the file. This means 
> that the sender has a file which has a one byte offset from the 
> receiver. The sender will have one block for which there is no 
> counterpart at the receiver, and all other blocks will have a one byte 
> offset (which rsync will detect, and save the traffic). In short, we see 
> that the 700 number has almost nothing to do with the application that 
> the file belongs to.
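
The behaviour described in the quoted paragraph can be sketched roughly as follows. This is a simplified illustration, not rsync's implementation: it hashes every candidate window with MD5 instead of pairing a cheap rolling checksum with a stronger hash, and the names block_table, delta and rebuild are invented for this example.

```python
import hashlib


def block_table(old, block_size):
    """Map the hash of each fixed-size block of the receiver's file to its index."""
    table = {}
    for start in range(0, len(old), block_size):
        digest = hashlib.md5(old[start:start + block_size]).digest()
        table.setdefault(digest, start // block_size)
    return table


def delta(old, new, block_size=700):
    """Describe `new` as copies of blocks from `old` plus literal bytes.
    The window slides one byte at a time, so blocks match at any offset."""
    table = block_table(old, block_size)
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + block_size]
        index = table.get(hashlib.md5(window).digest())
        if index is None:
            literal.append(new[pos])   # no match here: send this byte as-is
            pos += 1
            continue
        if literal:
            ops.append(("literal", bytes(literal)))
            literal = bytearray()
        ops.append(("copy", index))    # receiver already has this block
        pos += len(window)
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def rebuild(old, ops, block_size=700):
    """Apply the operations on the receiving side."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)


# 2800 bytes of non-repeating data: four distinct 700-byte blocks.
old = b"".join(i.to_bytes(4, "big") for i in range(700))
new = b"X" + old                       # one byte inserted at the very start
ops = delta(old, new)
print(ops)  # [('literal', b'X'), ('copy', 0), ('copy', 1), ('copy', 2), ('copy', 3)]
```

In this toy model only the single inserted byte travels as literal data; every other block is found one byte later than before and is referenced by index.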


True, and in that scenario it makes no difference what block size
you choose: if the one byte is inserted at the beginning, the entire
file will be transferred. Rsync is not diff, and does not "patch" the
file dynamically if the file has random insertions/removals.

You make no comment on my calculations on the block-moving algorithm in
my real-world scenario, which was the basis for this discussion anyway.


Best regards,


-- 
Lars Karlslund <lak at pharmanord.com>