Rsyncing really large files
rsync at shemesh.biz
Mon Feb 28 10:23:50 GMT 2005
Lars Karlslund wrote:
> Also as far as I could read, the default block size is 700 bytes? What
> kind of application would default to moving data around 700 bytes at a
> time internally in a file? I'm not criticizing rsync, merely
> questioning the functionality of this feature.
I believe you may have missed the point there. 700 bytes is not the
amount of data the application is expected to change. 700 bytes is merely
the unit of data examined as one block. This means that whether you
change a single byte of a file, 700 consecutive bytes (when they happen
to fall within one block), or two bytes 698 bytes apart (again, only if
both land in the same block), rsync will treat it as a single changed
block and will resynchronize the data there. This number is a trade-off.
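To make the block granularity concrete, here is a small Python sketch. It assumes in-place overwrites only (no inserted or deleted bytes) and the receiver's copy cut into fixed 700-byte blocks starting at offset 0; `blocks_touched` is a hypothetical helper for illustration, not part of rsync.

```python
def blocks_touched(changed_offsets, block_size=700):
    """Return the sorted indices of fixed-size blocks containing any changed byte.

    Toy model: bytes are overwritten in place, so block boundaries never move.
    """
    return sorted({offset // block_size for offset in changed_offsets})

# One changed byte always lands in exactly one block.
print(blocks_touched([1234]))
# 700 consecutive changed bytes: one block only if they are block-aligned.
print(blocks_touched(range(700, 1400)))
print(blocks_touched(range(650, 1350)))
# Two bytes 698 apart: one block or two, depending on where they start.
print(blocks_touched([701, 701 + 698]))
print(blocks_touched([100, 100 + 698]))
```

Each block index in the result stands for up to 700 bytes that have to be resent, regardless of how many bytes inside it actually changed.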
The larger the number, the more bytes need to be synced if a single
byte changes (more network traffic). Also, the larger the number, the
higher the cost if a small change crosses a block boundary, but the
lower the chances of that happening.
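A rough way to see the boundary effect: in a toy model where a contiguous overwrite of change_len bytes (no longer than one block) starts at a uniformly random offset within a block, it spills into a second block with probability (change_len - 1) / block_size. The sketch below (hypothetical helper names, and a simplification of what rsync actually sends) computes that probability and the expected number of literal bytes.

```python
from fractions import Fraction

def crossing_probability(change_len, block_size):
    """Probability that a contiguous change of change_len bytes, starting at a
    uniformly random offset within a block, spills into the next block."""
    assert 1 <= change_len <= block_size, "toy model covers at most two blocks"
    # Only the start position modulo block_size matters.
    crossing = sum(1 for start in range(block_size)
                   if start + change_len > block_size)
    return Fraction(crossing, block_size)

def expected_resent_bytes(change_len, block_size):
    """Expected literal bytes: one whole block, plus a second whole block
    whenever the change straddles a boundary."""
    return block_size * (1 + crossing_probability(change_len, block_size))

for block_size in (350, 700, 2800):
    print(block_size,
          crossing_probability(10, block_size),
          expected_resent_bytes(10, block_size))
```

For a 10-byte change, growing the block from 700 to 2800 bytes cuts the crossing probability from 9/700 to 9/2800, yet the expected resend still grows from 709 to 2809 bytes, because every block that does have to travel is larger.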
The smaller the number, the more checksums have to be calculated and
transferred (more network traffic). Also, the smaller the number, the
more blocks in a file, and the higher the chances of weak checksum
collisions that do not stem from a truly identical block. Each such
false match forces the sender to calculate the stronger hash for the
candidate data before rejecting it (more I/O, CPU load and latency).
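Both costs of small blocks can be estimated on the back of an envelope. The numbers below are assumptions for illustration only: 20 bytes of signature per block (a 32-bit weak checksum plus a 128-bit strong one), and weak checksums treated as uniformly random 32-bit values, which real rolling checksums are not. The function names are hypothetical.

```python
WEAK_BYTES = 4     # assumption: 32-bit rolling (weak) checksum per block
STRONG_BYTES = 16  # assumption: full 128-bit strong checksum per block

def n_blocks(file_size, block_size):
    return (file_size + block_size - 1) // block_size  # ceiling division

def checksum_overhead(file_size, block_size):
    """Signature bytes the receiver sends for a file of file_size bytes."""
    return n_blocks(file_size, block_size) * (WEAK_BYTES + STRONG_BYTES)

def expected_false_weak_matches(file_size, block_size):
    """Expected accidental weak-checksum hits while the sender slides over
    roughly file_size window positions, each of which may collide with any
    block, if weak checksums were uniform 32-bit values. Every hit costs a
    strong-checksum computation that is then rejected."""
    return file_size * n_blocks(file_size, block_size) / 2**32

size = 2**30  # a 1 GiB file
for block_size in (175, 700, 2800):
    print(block_size,
          checksum_overhead(size, block_size),
          round(expected_false_weak_matches(size, block_size)))
```

In this model, quartering the block size roughly quadruples both the signature traffic and the number of wasted strong-checksum computations.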
I'm too new to this project to know what benchmarks led to 700 being
chosen as the default block size, but it seems like a sensible number. If
your workload differs, you may wish to experiment with it (rsync's
--block-size, or -B, option).
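Putting both sides together gives a toy model to reason with before experimenting. It assumes k well-separated single-byte changes, each costing exactly one literal block, plus a fixed signature size S per block; under those assumptions total traffic for a file of N bytes is roughly N*S/B + k*B, whose continuous minimum lies at B = sqrt(N*S/k). The function names and the 20-byte signature size are hypothetical.

```python
def toy_traffic(file_size, block_size, n_changes, sig_bytes=20):
    """Toy estimate of bytes on the wire for one sync.

    Assumes every block costs sig_bytes of signature and each of n_changes
    single-byte edits lands in its own block (never straddling a boundary),
    so each edit costs one block of literal data.
    """
    n_blocks = (file_size + block_size - 1) // block_size  # ceiling division
    return n_blocks * sig_bytes + n_changes * block_size

def best_block_size(file_size, n_changes, candidates, sig_bytes=20):
    """Pick the candidate block size with the lowest toy traffic."""
    return min(candidates,
               key=lambda b: toy_traffic(file_size, b, n_changes, sig_bytes))

candidates = range(100, 20001, 100)
print(best_block_size(10**6, 50, candidates))  # 1 MB file, 50 scattered edits
print(best_block_size(10**6, 5, candidates))   # same file, only 5 edits
```

With these numbers the search picks 600 bytes for 50 edits and 2000 bytes for 5: the fewer changes you expect, the more it pays to grow the block and save on signatures.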
As for granularity, suppose you added one byte to the file. From that
point on, the sender's file is offset by one byte relative to the
receiver's copy. The sender will have one block for which there is no
counterpart at the receiver, and every block after it will be shifted by
one byte (which rsync will detect, saving the traffic). In short, the 700
figure has almost nothing to do with the application the file belongs to.
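The one-byte insertion case can be demonstrated with a simplified rsync-style delta. This sketch uses the rolling weak checksum from the rsync algorithm (a = sum of bytes, b = position-weighted sum, both mod 2^16) and, as an assumption for portability, MD5 in place of rsync's MD4 strong checksum. Only full blocks of the old file are indexed, and all function names are hypothetical.

```python
import hashlib
import random

M = 1 << 16  # weak checksum components are kept modulo 2^16

def weak(block):
    """rsync-style weak checksum of a whole window: (a, b)."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b

def signatures(old, block_size):
    """Receiver side: weak -> [(strong, block index)] for each full block."""
    table = {}
    for index in range(len(old) // block_size):
        block = old[index * block_size:(index + 1) * block_size]
        table.setdefault(weak(block), []).append((hashlib.md5(block).digest(), index))
    return table

def delta(new, table, block_size):
    """Sender side: list of ("block", index) and ("literal", bytes) operations."""
    ops, literal = [], bytearray()
    pos, window = 0, None
    while pos + block_size <= len(new):
        if window is None:
            window = weak(new[pos:pos + block_size])
        match = None
        for strong, index in table.get(window, []):
            # A weak hit is only a candidate; confirm it with the strong hash.
            if hashlib.md5(new[pos:pos + block_size]).digest() == strong:
                match = index
                break
        if match is not None:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("block", match))
            pos += block_size
            window = None  # next window starts fresh after the matched block
        else:
            literal.append(new[pos])
            if pos + block_size < len(new):
                window = roll(*window, new[pos], new[pos + block_size], block_size)
            pos += 1
    literal.extend(new[pos:])  # tail shorter than one block
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops

def patch(old, ops, block_size):
    """Receiver side: rebuild the new file from its old copy plus the delta."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * block_size:(value + 1) * block_size] if kind == "block" else value
    return bytes(out)

BLOCK = 700
rng = random.Random(1)
old = bytes(rng.getrandbits(8) for _ in range(10 * BLOCK))
new = old[:1000] + b"!" + old[1000:]  # insert one byte inside block 1
ops = delta(new, signatures(old, BLOCK), BLOCK)
assert patch(old, ops, BLOCK) == new
print([v for k, v in ops if k == "block"])
print(sum(len(v) for k, v in ops if k == "literal"))
```

With the byte inserted inside block 1, blocks 0 and 2 through 9 are matched (the later ones at a one-byte offset), and the only literal data is the 701 bytes covering block 1 plus the inserted byte.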
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html