Rsyncing really large files

Shachar Shemesh rsync at shemesh.biz
Mon Feb 28 18:05:36 GMT 2005


Lars Karlslund wrote:

> Maybe I didn't express myself thoroughly enough  :-)

Or me.

> Yes, a block is a minimum storage unit, which is considered for transfer.

In size, yes. Not in position.

> But it's a fact that the rsync algorithm as it is now checks to see if 
> a block should have moved. And in that case, the 700 bytes default is 
> very much worth considering.

No, because the rsync algorithm can detect that a 700-byte block has 
moved by as little as a single byte.

> If no blocks at all move in a 700 byte increment (i.e. 700 bytes gets 
> inserted somewhere - optimally at a 700-byte boundary in the file), 
> then all you get is larger memory and CPU usage and

all the bandwidth reduction you need.

The point I think you are missing is that the 700-byte blocks need not 
be on 700-byte boundaries. They can be matched at any byte offset.
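
To illustrate the point about boundaries, here is a minimal Python sketch 
of a rolling weak checksum, modelled on the one described in the rsync 
technical report. The function names are mine, not rsync's, and a plain 
byte comparison stands in for the strong checksum that real rsync uses to 
confirm a match. It shows a 700-byte block being found after everything 
has shifted by three bytes.

```python
import random

M = 1 << 16  # both running sums are kept modulo 2^16


def weak_checksum(block):
    """Return the (a, b) pair of the weak checksum for a block of bytes."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window forward by one byte in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def find_block(data, block):
    """Return the first byte offset in data where block occurs, or -1."""
    n = len(block)
    if n == 0 or len(data) < n:
        return -1
    target = weak_checksum(block)
    a, b = weak_checksum(data[:n])
    pos = 0
    while True:
        # Cheap test first; the byte comparison is a stand-in for the
        # strong checksum (an assumption made to keep the sketch short).
        if (a, b) == target and data[pos:pos + n] == block:
            return pos
        if pos + n >= len(data):
            return -1
        a, b = roll(a, b, data[pos], data[pos + n], n)
        pos += 1


rng = random.Random(0)
old = bytes(rng.randrange(256) for _ in range(2800))  # four 700-byte blocks
block = old[700:1400]            # second block of the old file
new = b"abc" + old               # three bytes inserted at the front
print(find_block(new, block))    # offset of that block in the new file
```

The window advances one byte at a time, and each step costs a couple of 
additions rather than a fresh pass over 700 bytes. That is what makes 
checking every byte offset affordable.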

It may very well be that, for your specific application, increasing the 
block size considerably will be better. If your files are huge, and the 
changed areas are very small in comparison to the file size, that can 
yield significant improvement. However, this is due to the trade-offs I 
talked about in my previous email. It has nothing to do with 700 bytes 
being unrealistic or incorrect.

> True, and in that scenario it makes no difference what the block size 
> you choose: if the one byte is inserted at the beginning, the entire 
> file will be transferred.

No, at most the first block. Every later block is still found, just one 
byte further along.
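
To put numbers on that, here is a rough Python sketch of the matching 
step. It is a simplification, not rsync's actual code: it hashes every 
window with MD5 instead of using a rolling checksum, it ignores a 
trailing partial block, and the names `delta` and `literal_bytes` are 
made up for the illustration. It counts how much literal data would have 
to be sent for a one-byte insertion at the very start, and for a one-byte 
overwrite inside the first block.

```python
import hashlib
import random


def delta(old, new, block_len):
    """Describe new as copies of old's blocks plus literal bytes."""
    # Hash every aligned, full-size block of the old file.
    table = {}
    for start in range(0, len(old) - block_len + 1, block_len):
        digest = hashlib.md5(old[start:start + block_len]).digest()
        table.setdefault(digest, start // block_len)

    ops = []               # ("copy", block_index) or ("literal", bytes)
    literal = bytearray()  # unmatched bytes collected so far
    pos = 0
    while pos + block_len <= len(new):
        digest = hashlib.md5(new[pos:pos + block_len]).digest()
        index = table.get(digest)
        if index is None:
            # No old block starts here: emit one byte, move on by one byte.
            literal.append(new[pos])
            pos += 1
        else:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal.clear()
            ops.append(("copy", index))
            pos += block_len
    literal.extend(new[pos:])
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def literal_bytes(ops):
    """Total number of bytes that must travel as literal data."""
    return sum(len(data) for kind, data in ops if kind == "literal")


rng = random.Random(1)
old = bytes(rng.randrange(256) for _ in range(7000))   # ten 700-byte blocks
inserted = b"!" + old                                  # one byte added at the front
changed = old[:10] + bytes([old[10] ^ 0xFF]) + old[11:]  # one byte overwritten

print(literal_bytes(delta(old, inserted, 700)))
print(literal_bytes(delta(old, changed, 700)))
```

In both cases the remaining blocks are sent as references to data the 
receiver already has, so the literal data never exceeds one block.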

> Rsync is not diff, and does not "patch" the file dynamically if the 
> file has random insertions/removals.

Well, in a way, it does. It's really quite ingenious. As I have no 
relation to its implementation, I can say that wholeheartedly. I 
encourage you to read about the algorithm on the site.

> You make no comment on my calculations on the block-moving algorithm 
> in my real-world scenario, which was the basis for this discussion anyway.

I'm sorry. You just stated as facts things I knew to be incorrect, so I 
allowed myself to skip your calculations. I don't think there is any 
argument that you are getting sub-optimal results from rsync. The 
question is "why".

How much memory is on the machines? Try raising the block size to 
1MB. That leaves you with only about 524 thousand blocks, which may 
prove more manageable.
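
For reference, the arithmetic behind that figure as a small Python 
sketch. The 512 GiB file size is an assumption, inferred only from the 
524 thousand figure; it is not stated in this message. Substitute your 
real file size.

```python
def block_count(file_size, block_size):
    """Number of blocks needed to cover the file, rounding up."""
    return -(-file_size // block_size)


GIB = 1 << 30
MIB = 1 << 20

assumed_file_size = 512 * GIB  # assumption, see above

for block_size in (700, MIB):
    blocks = block_count(assumed_file_size, block_size)
    print(f"{block_size:>8} byte blocks: {blocks:,}")
```

The receiver has to keep a checksum entry for every block, so the block 
count is what drives memory use.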

> Best regards,

          Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html


