Rsyncing really large files

Lars Karlslund lak at pharmanord.com
Mon Feb 28 10:10:19 GMT 2005


Hi everyone,

Thank you for your replies.

On Thu, 2005-02-24 at 17:59 +0100, Paul Slootman wrote:

> > It would certainly be possible to change the algorithm to not cache the
> > data (and thus only allow the current block to be compared), but I don't
> > think that idea has general enough interest for me to work on for
> > inclusion in rsync.  You might want to look into coding it up for
> > yourself.


I do understand the theoretical speed improvement from allowing rsync to
move blocks internally in a file, but I have yet to see how this
affects performance in the real world.

Also as far as I could read, the default block size is 700 bytes? What
kind of application would default to moving data around 700 bytes at a
time internally in a file? I'm not criticizing rsync, merely questioning
the functionality of this feature.

Another question that arises is how the lookup of matching blocks is done.
I've set the block size to 64 KB, which should generate 500 GB / 64 KB =
8192000 checksums, and I'm guessing 32 bytes per checksum, giving a
memory usage of around 250 MB (which is what I've observed when running
rsync).
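
A rough sketch of that arithmetic, assuming binary units (1 GB = 1024^3
bytes, 1 KB = 1024 bytes) and the guessed 32 bytes per checksum entry. The
32-byte figure is an estimate, not a value taken from the rsync source, and
the function name is only illustrative:

```python
# Back-of-the-envelope estimate of checksum count and memory use.
# All constants are assumptions taken from the estimate in this mail.
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def checksum_estimate(file_size, block_size, bytes_per_checksum):
    """Return (number of blocks, total bytes of checksum data)."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks, blocks * bytes_per_checksum


blocks, mem = checksum_estimate(500 * GIB, 64 * KIB, 32)
print(blocks, "checksums,", mem / MIB, "MiB")
```

With these assumptions the result is 8192000 checksums and 250 MiB, which
matches the memory usage observed above.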

Doing the lookups on 250 MB of data 8192000 times must take some time,
right? If the list isn't indexed and a block is never found, rsync would
have to scan the entire checksum list for every block. A computer with
DDR2100 memory would then take about 270 hours just to scan the checksum
list in memory.
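
The worst case described above can be sketched as follows. It assumes a
purely linear scan of the whole checksum list for every block and a memory
bandwidth of 2100 MiB/s; it is only an estimate of that hypothetical case,
not a description of how rsync actually performs its lookups:

```python
# Worst-case estimate: every lookup scans the entire checksum list.
# The bandwidth and table size are assumptions from the text above.
MIB = 1024 ** 2


def linear_scan_hours(lookups, table_bytes, bandwidth_bytes_per_s):
    """Hours needed to read the whole table once per lookup."""
    total_bytes = lookups * table_bytes
    return total_bytes / bandwidth_bytes_per_s / 3600


hours = linear_scan_hours(8_192_000, 250 * MIB, 2100 * MIB)
print(f"{hours:.1f} hours")
```

Under these assumptions the scan alone comes to roughly 270 hours, so an
unindexed lookup would dominate the total run time.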

Also the numbers speak for themselves, as the --whole-file option is
*way* faster than the block-copy method on our setup.


> I think that this would be useful enough in itself, e.g. when syncing
> database storage files. The chance that blocks move around (without
> changing) isn't that large. I've been considering something like that a
> while... Useful when syncing a 40GB database when there's mainly only
> insertions. I never had the time to pursue it, though...


Hmm, unfortunately I've never honed my C skills at all since taking
classes in it ages ago. So I'd rather not get into doing a (probably
very ugly) patch for rsync by myself.

But right now I have a problem that I don't know how to solve, and rsync
is the piece of software that seems most mature and closest to solving
it.


-- 
Lars Karlslund <lak at pharmanord.com>