Rsyncing really large files

Mon Feb 28 18:33:52 GMT 2005

Wayne Davison wrote:

>However, you should be sure to have measured what is causing the
>slowdown first to know how much that will help.  If it is not memory
>that is swapping on the sender, it may be that the computing of the
>checksums is maxing out your CPU, and removing the caching of the
>remote checksums won't buy you as much as you think.  You could use some
>of the librsync tools (e.g. rdiff) to calculate how long various actions
>take on each system (i.e. try running rdiff on each system outputting to
>/dev/null to see how long the computing of the checksums takes).
>
>..wayne..
>  
>
Hi Wayne,

Excuse me if I'm talking utter nonsense here. I have only just now 
opened the code up and looked at it. It does seem, however, that there 
is a considerable optimization that can be performed here.

Correct me if I'm wrong, but it seems to me that the checksum matching 
code is in match.c, inside hash_search, particularly the "do...while" 
loop. It seems that the loop is there to scan the entire checksum list 
at each byte offset of the file. Is that really the case? If so, we can 
probably make it much much (much much much) more efficient by using a 
hash table keyed on the checksum instead. We wouldn't even have to 
change the line protocol in any way.
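
To make the idea concrete, here is a rough sketch of the kind of lookup
I have in mind. It is not rsync's code and I have not checked it against
match.c: the names (weak_sum, roll, build_table, find_matches), the tiny
block size, and the use of SHA-256 as the strong checksum are all my own
assumptions, picked only for illustration. The point is that each byte
offset costs one rolling-checksum update plus one hash lookup, instead
of a walk over the whole checksum list.

```python
import hashlib
from collections import defaultdict

BLOCK = 4  # hypothetical, tiny block size so the example is easy to follow


def weak_sum(block):
    """Adler-style weak checksum of a block, as a pair of 16-bit halves."""
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * x for i, x in enumerate(block)) & 0xFFFF
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) & 0xFFFF
    b = (b - n * out_byte + a) & 0xFFFF
    return a, b


def build_table(old, n=BLOCK):
    """Map weak checksum -> list of (strong digest, block number)."""
    table = defaultdict(list)
    for start in range(0, len(old) - n + 1, n):
        block = old[start:start + n]
        # SHA-256 is a stand-in for the strong checksum (an assumption).
        table[weak_sum(block)].append((hashlib.sha256(block).digest(), start // n))
    return table


def find_matches(new, table, n=BLOCK):
    """Return (offset in new, block number in old) for every matching window."""
    matches = []
    if len(new) < n:
        return matches
    a, b = weak_sum(new[:n])
    off = 0
    while True:
        # One dictionary lookup per offset, not a scan of all checksums.
        candidates = table.get((a, b))
        if candidates:
            strong = hashlib.sha256(new[off:off + n]).digest()
            for digest, block_no in candidates:
                if digest == strong:
                    matches.append((off, block_no))
                    break
        if off + n >= len(new):
            break
        a, b = roll(a, b, new[off], new[off + n], n)
        off += 1
    return matches
```

For example, find_matches(b"xxefghabcd", build_table(b"abcdefgh"))
reports the two old blocks at their new offsets. To keep the sketch
short it slides one byte at a time even after a match, where a real
implementation would presumably skip ahead by a whole block.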

Am I misreading the code?

             Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html