Rsyncing really large files

Shachar Shemesh rsync at shemesh.biz
Mon Feb 28 10:23:50 GMT 2005


Lars Karlslund wrote:

> Also as far as I could read, the default block size is 700 bytes? What 
> kind of application would default to moving data around 700 bytes at a 
> time internally in a file? I'm not criticizing rsync, merely 
> questioning the functionality of this feature.

I believe you may have missed the point there. 700 bytes is not the 
amount of data the application is expected to have changed. It is merely 
the unit of data examined as one block. This means that if you take a 
file and change one byte, 700 consecutive bytes, or two bytes 698 bytes 
away from one another, rsync will treat it as a single changed block and 
resynchronize the data there, provided the change happens to fall 
entirely within one block. This number is a trade-off.
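
To make the "provided" part concrete, here is a minimal sketch that maps 
changed byte offsets to the fixed-size blocks containing them. The 
function name and the sample offsets are made up for illustration; this 
is not rsync code, just the arithmetic behind the paragraph above.

```python
BLOCK_SIZE = 700  # the default block size discussed in this thread


def touched_blocks(changed_offsets, block_size=BLOCK_SIZE):
    """Return the sorted indices of the fixed-size blocks that contain
    at least one of the changed byte offsets (hypothetical helper)."""
    return sorted({offset // block_size for offset in changed_offsets})


# One changed byte: a single block is affected.
print(touched_blocks([100]))
# Two changed bytes 698 apart, both inside block 0.
print(touched_blocks([0, 698]))
# The same distance, but this time the pair straddles a block boundary.
print(touched_blocks([699, 1397]))
```

The last call is the case where the same small change costs two blocks 
instead of one.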

The larger the number, the more bytes need to be synched if a single 
byte changes (more network traffic). Also, the larger the number, the 
higher the cost if a small change crosses a block boundary, but the 
lower the chances of that happening.

The smaller the number, the more checksums have to be calculated and 
transferred (more network traffic). Also, the smaller the number, the 
more blocks in a file, and the higher the chances of checksum collisions 
that do not stem from a truly identical block, resulting in the need to 
calculate a stronger hash for the block and transfer it (more I/O, CPU, 
network load and latency).
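
A rough way to see the two pressures pulling against each other is to 
estimate both costs for a few block sizes. The sketch below is only a 
back-of-the-envelope model: the 20 bytes of checksum data per block is 
an assumed figure chosen for illustration, not a number taken from 
rsync, and the model ignores collisions, compression and protocol 
overhead.

```python
import math


def estimated_traffic(file_size, block_size, changed_blocks=1,
                      checksum_bytes=20):
    """Return (checksum_traffic, resent_data) in bytes.

    checksum_bytes is an ASSUMED per-block cost, used only to show the
    shape of the trade-off.
    """
    blocks = math.ceil(file_size / block_size)
    checksum_traffic = blocks * checksum_bytes
    resent_data = min(changed_blocks, blocks) * block_size
    return checksum_traffic, resent_data


ONE_GIB = 2 ** 30
for size in (700, 7000, 70000):
    checksums, resent = estimated_traffic(ONE_GIB, size)
    print(f"block={size:>6}  checksums={checksums:>10}  resent={resent:>6}")
```

Under these assumptions, growing the block size shrinks the checksum 
traffic and grows the amount resent for a single changed byte, which is 
the trade-off described above.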

I'm too new to this project to know what benchmarks were done to bring 
the block size to default to 700, but it seems like a nice number. If 
your characteristics vary, you may wish to play around with it.

As for granularity - suppose you added one byte to the file. This means 
that the sender's file is offset by one byte relative to the receiver's 
copy. The sender will have one block for which there is no counterpart 
at the receiver, and all other blocks will be shifted by one byte (which 
rsync will detect, saving the traffic). In short, we see that the 700 
number has almost nothing to do with the application that the file 
belongs to.
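
The one byte offset case can be sketched in a few lines. This is a 
simplified illustration and not rsync's implementation: it hashes every 
window with MD5 for clarity, where rsync pairs a cheap rolling checksum 
with a stronger hash so that sliding one byte at a time stays 
affordable. The function names and the test data are made up for the 
example.

```python
import hashlib
import random


def block_signatures(old, block_size):
    """Map the hash of each fixed-size block of the old file to its index."""
    return {
        hashlib.md5(old[i:i + block_size]).digest(): i // block_size
        for i in range(0, len(old), block_size)
    }


def delta(new, signatures, block_size):
    """Scan the new file, emitting ("copy", index) for windows that match
    a known block and ("literal", data) for bytes that match nothing."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + block_size]
        index = signatures.get(hashlib.md5(window).digest())
        if index is not None:
            if literal:  # flush unmatched bytes seen before this match
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", index))
            pos += len(window)  # jump past the matched block
        else:
            literal.append(new[pos])
            pos += 1  # slide the window by a single byte
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(3500))  # five 700-byte blocks
new = b"X" + old                                      # one byte added in front
ops = delta(new, block_signatures(old, 700), 700)
print(ops)
```

With one byte added at the front, the result is a single one-byte 
literal followed by five copy operations: every old block is found 
again, just one byte later than before.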

          Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html


