Rsyncing really large files
rsync at shemesh.biz
Mon Feb 28 18:33:52 GMT 2005
Wayne Davison wrote:
>However, you should be sure to have measured what is causing the
>slowdown first to know how much that will help. If it is not memory
>that is swapping on the sender, it may be that the computing of the
>checksums is maxing out your CPU, and removing the caching of the
>remote checksums won't buy you as much as you think. You could use some
>of the librsync tools (e.g. rdiff) to calculate how long various actions
>take on each system (i.e. try running rdiff on each system outputting to
>/dev/null to see how long the computing of the checksums takes).
Excuse me if I'm talking utter nonsense here. I have only just now
opened the code up and looked at it. It does seem, however, that there
is a considerable optimization that can be performed here.
Correct me if I'm wrong, but it seems to me that the checksum-matching
code is in match.c, inside hash_search(), and in particular in its
"do...while" loop. It looks as though that loop scans the entire list of
checksums at every byte offset. Is that really the case? If so, we can
probably make it much, much (much, much, much) more efficient by using a
hash table instead, and we wouldn't even have to change the wire protocol.
Am I misreading the code?
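To make the suggestion concrete, here is a minimal, hypothetical sketch in C. It is not rsync's match.c: BLOCK_LEN, NBUCKETS, build_table() and find_match() are made-up names, the weak checksum only follows the general shape of rsync's two 16-bit running sums, and memcmp() stands in for the strong checksum comparison. The idea is that the receiver's block checksums are indexed once in a hash table, so each byte offset on the sender costs one bucket lookup (plus any colliding entries) instead of a pass over every block.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_LEN 4      /* hypothetical, tiny block size for illustration */
#define NBUCKETS 65536   /* power of two, so we can mask instead of mod */

struct block {
    uint32_t weak;       /* packed weak checksum of this basis block */
    size_t index;        /* block number within the basis file */
    struct block *next;  /* chain of blocks sharing a bucket */
};

/* Weak checksum: s1 = sum of bytes, s2 = sum of the running s1 values. */
static void weak_sum(const unsigned char *p, size_t len, uint32_t *s1, uint32_t *s2) {
    uint32_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) {
        a += p[i];
        b += a;
    }
    *s1 = a;
    *s2 = b;
}

/* Keep 16 bits of each sum, s2 in the high half. */
static uint32_t pack(uint32_t s1, uint32_t s2) {
    return (s1 & 0xffff) | ((s2 & 0xffff) << 16);
}

static size_t bucket_of(uint32_t weak) {
    return (weak ^ (weak >> 16)) & (NBUCKETS - 1);
}

/* Index every full block of the basis; caller frees the table and *storage. */
static struct block **build_table(const unsigned char *basis, size_t len,
                                  struct block **storage) {
    size_t nblocks = len / BLOCK_LEN;
    struct block **table = calloc(NBUCKETS, sizeof *table);
    struct block *blocks = calloc(nblocks ? nblocks : 1, sizeof *blocks);
    if (!table || !blocks) {
        free(table);
        free(blocks);
        return NULL;
    }
    for (size_t i = 0; i < nblocks; i++) {
        uint32_t s1, s2;
        weak_sum(basis + i * BLOCK_LEN, BLOCK_LEN, &s1, &s2);
        blocks[i].weak = pack(s1, s2);
        blocks[i].index = i;
        size_t h = bucket_of(blocks[i].weak);
        blocks[i].next = table[h];   /* push onto the bucket's chain */
        table[h] = &blocks[i];
    }
    *storage = blocks;
    return table;
}

/* Slide a BLOCK_LEN window over data one byte at a time, rolling the weak
 * checksum; return the first offset matching a basis block, or -1. */
static long find_match(struct block **table, const unsigned char *basis,
                       const unsigned char *data, size_t len, size_t *block_index) {
    assert(table != NULL && block_index != NULL);
    if (len < BLOCK_LEN)
        return -1;
    uint32_t s1, s2;
    weak_sum(data, BLOCK_LEN, &s1, &s2);
    for (size_t off = 0;; off++) {
        uint32_t weak = pack(s1, s2);
        /* Only the blocks in this one bucket are examined. */
        for (struct block *b = table[bucket_of(weak)]; b; b = b->next) {
            if (b->weak == weak &&
                memcmp(basis + b->index * BLOCK_LEN, data + off, BLOCK_LEN) == 0) {
                *block_index = b->index;
                return (long)off;
            }
        }
        if (off + BLOCK_LEN >= len)
            break;  /* no further full window */
        /* Roll: drop data[off], add data[off + BLOCK_LEN]. */
        unsigned char out = data[off], in = data[off + BLOCK_LEN];
        s1 = s1 - out + in;
        s2 = s2 - (uint32_t)BLOCK_LEN * out + s1;
    }
    return -1;
}
```

For example, after build_table() on the basis "abcdefgh", find_match() on "xyzefghq" reports offset 3 matching block 1 ("efgh"), having rolled the checksum past three non-matching windows without ever scanning the whole block list.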
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html