Rsyncing really large files
rsync at shemesh.biz
Mon Feb 28 18:05:36 GMT 2005
Lars Karlslund wrote:
> Maybe I didn't express myself thoroughly enough :-)
> Yes, a block is a minimum storage unit, which is considered for transfer.
In size, yes. Not in position.
> But it's a fact that the rsync algorithm as it is now checks to see if
> a block should have moved. And in that case, the 700 bytes default is
> very much worth considering.
No, because the rsync algorithm can detect single byte moves of this 700
byte block.
> If no blocks at all move in a 700 byte increment (i.e. 700 bytes gets
> inserted somewhere - optimally at a 700-byte boundary in the file),
> then all you get is larger memory and CPU usage and
all the bandwidth reduction you need.
The point I think you are missing is that the 700-byte blocks need not
sit on 700-byte boundaries. A block can be matched at any byte offset.
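
To illustrate how a block can be found at an arbitrary offset, here is
a minimal Python sketch of a rolling weak checksum in the style the
rsync algorithm describes. The names weak_checksum, roll and find_block
are made up for this example, and the direct byte comparison stands in
for the strong checksum that confirms a weak match. It is not rsync's
actual code.

```python
M = 1 << 16  # both halves of the weak checksum are kept modulo 2^16


def weak_checksum(block):
    """Compute the two-part weak checksum (a, b) of a block from scratch."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte forward in constant time."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - n * out_byte + a_new) % M
    return a_new, b_new


def find_block(data, block):
    """Return the first byte offset in data where block occurs, or -1."""
    n = len(block)
    if len(data) < n:
        return -1
    target = weak_checksum(block)
    a, b = weak_checksum(data[:n])
    for off in range(len(data) - n + 1):
        # Cheap weak check first, then confirm (assumption: a byte
        # comparison replaces the strong checksum used in practice).
        if (a, b) == target and data[off:off + n] == block:
            return off
        if off + n < len(data):
            a, b = roll(a, b, data[off], data[off + n], n)
    return -1


block = bytes(range(200)) * 3 + bytes(100)   # one 700-byte block
data = b"abc" + block + b"tail"              # same block, shifted 3 bytes
print(find_block(data, block))
```

The window moves one byte per step, so the block is found at offset 3
even though 3 is not a multiple of 700.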
It may very well be that, for your specific application, increasing the
block size considerably will be better. If your files are huge, and the
changed areas are very small in comparison to the file size, that can
yield significant improvement. However, this is due to the trade-offs I
talked about in my previous email. It has nothing to do with 700 bytes
being unrealistic or incorrect.
> True, and in that scenario it makes no difference what the block size
> you choose: if the one byte is inserted at the beginning, the entire
> file will be transferred.
No, just the first block.
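
A toy Python model of this case, under simplifying assumptions: the
helper delta_stats is hypothetical, it looks blocks up by their raw
bytes instead of by weak and strong checksums, and it ignores any
trailing partial block. It only counts how much of the new file matches
blocks of the old one after a one-byte insertion at the start.

```python
import random


def delta_stats(old, new, block_size):
    """Return (matched_blocks, literal_bytes) for a greedy block match."""
    # Index the old file's blocks on fixed block_size boundaries.
    table = {
        old[i:i + block_size]
        for i in range(0, len(old) - block_size + 1, block_size)
    }
    pos = matched = literal = 0
    while pos < len(new):
        chunk = new[pos:pos + block_size]
        if len(chunk) == block_size and chunk in table:
            matched += 1
            pos += block_size   # skip the whole matched block
        else:
            literal += 1        # this byte would be sent as literal data
            pos += 1            # retry the match one byte further on
    return matched, literal


rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(7000))  # ten 700-byte blocks
new = b"X" + old                                      # one byte inserted
print(delta_stats(old, new, 700))
```

In this model every old block is still found, just shifted by one byte,
and the unmatched data never grows beyond roughly one block's worth.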
> Rsync is not diff, and does not "patch" the file dynamically if the
> file has random insertions/removals.
Well, in a way, it does. It's really quite ingenious. As I have no
relation to its implementation, I can say that wholeheartedly. I
encourage you to read about the algorithm on the site.
> You make no comment on my calculations on the block-moving algorithm
> in my real-world scenario, which was the basis for this discussion anyway.
I'm sorry. You just stated as facts things I knew to be incorrect, so I
allowed myself to skip your calculations. I don't think there is any
argument that you are getting sub-optimal results from rsync. The
question is "why".
How much memory is on the machines? Try to bring the block size up to
1MB. This will mean you will have only 524 thousand blocks, which may
prove more manageable.
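
The arithmetic behind that figure can be sketched in Python as below.
The 512 GiB file size is an assumption, inferred from "524 thousand
blocks" at 1 MB each, and block_count is just an illustrative helper.

```python
def block_count(file_size, block_size):
    """Number of blocks needed to cover file_size bytes (rounded up)."""
    return -(-file_size // block_size)


ASSUMED_FILE_SIZE = 512 * 2**30  # assumption: a 512 GiB file

for block_size in (700, 2**20):
    print(block_size, block_count(ASSUMED_FILE_SIZE, block_size))
```

Under that assumption, 700-byte blocks give several hundred million
entries to track, while 1 MB blocks bring the count down to 524288.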
> Best regards,
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html