Rsyncing really large files

Wayne Davison wayned at samba.org
Sat Mar 5 20:41:10 GMT 2005


On Sat, Mar 05, 2005 at 08:07:20PM +0200, Shachar Shemesh wrote:
> Definitely not [block-size]! I was talking about the hash table load. 

Ah yes, I can see that I misread what the "it" was referring to.  Thanks
for the clarification.

> But [the bitmap lookup] only works if the checksum function and the
> hash table are exactly the same size.

No, because you can tolerate some false positives -- values that the
bitmap says are present but that turn out not to be there when the hash
table is actually searched.
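The idea can be illustrated with a minimal sketch. It assumes 32-bit weak checksums and a hypothetical bitmap indexed by their low 16 bits. These sizes, the `NoMatchFilter` class, and a Python dict standing in for the hash table are all assumptions for illustration, not rsync's code. A clear bit proves the checksum is absent, so the table lookup is skipped. A set bit only means "maybe", and the normal lookup then decides.

```python
# Hypothetical sketch of a "no-match" bitmap placed in front of a checksum
# hash table. Sizes and indexing are illustrative assumptions.
BITMAP_BITS = 16  # assumption: bitmap indexed by the low 16 bits of the weak sum


class NoMatchFilter:
    def __init__(self, weak_sums, bits=BITMAP_BITS):
        self.mask = (1 << bits) - 1
        self.bitmap = bytearray((1 << bits) // 8)
        # Stand-in for the real lookup table: weak checksum -> block indexes.
        self.table = {}
        for idx, s in enumerate(weak_sums):
            self.table.setdefault(s, []).append(idx)
            b = s & self.mask
            self.bitmap[b >> 3] |= 1 << (b & 7)

    def maybe_present(self, s):
        """False means definitely absent; True means 'look in the table'."""
        b = s & self.mask
        return bool(self.bitmap[b >> 3] & (1 << (b & 7)))

    def lookup(self, s):
        if not self.maybe_present(s):
            return []  # definite miss: the hash table is never touched
        # A set bit may be a false positive, so this can still return [].
        return self.table.get(s, [])
```

For example, with blocks whose sums are `0x12340001` and `0xABCD0002`, a probe of `0x99990001` shares low bits with the first block. It passes the bitmap but finds nothing in the table, which is the tolerated false positive. A probe of `0x00000003` is rejected by the bitmap alone.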

> But how will you find it there? If you are going to have 740K blocks 
> (i.e. - 740,000 strong hashes) in a 16bit hash table, you are going to 
> have lots of collisions there (190 per bucket, on average), and you 
> gained nothing.

You'll note that a no-match bitmap has no effect on the normal hash
table lookups, so you're railing against something that is not even
discussed in my email.  I just mentioned a potential improvement for
the checksum lookup algorithm, but one that is only really useful if
there are a lot of not-found blocks.  I remain unconvinced that this
will be a net win, but I thought I'd mention it in case anyone wanted
to run some tests.
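One way to test whether such a bitmap could pay off is to estimate how often it rejects a non-matching checksum. Assume the checksum bits used to index the bitmap are uniformly distributed. Then with n blocks and m = 2^k bits, the expected fraction of bits set is 1 - e^(-n/m). A random miss slips past the bitmap with about that probability. The helper names below are hypothetical, and the uniformity assumption may not hold for real data.

```python
import math
import random


def expected_fill(n_blocks, bitmap_bits):
    """Expected fraction of bitmap bits set, assuming uniform checksum bits."""
    m = 1 << bitmap_bits
    return 1.0 - math.exp(-n_blocks / m)


def simulated_pass_rate(n_blocks, bitmap_bits, n_probes, seed=0):
    """Fraction of random probes that the bitmap fails to reject.

    Block checksums and probes are both modeled as random bitmap indexes.
    """
    rng = random.Random(seed)
    m = 1 << bitmap_bits
    bitmap = bytearray(m // 8)
    for _ in range(n_blocks):
        b = rng.getrandbits(bitmap_bits)
        bitmap[b >> 3] |= 1 << (b & 7)
    passed = 0
    for _ in range(n_probes):
        b = rng.getrandbits(bitmap_bits)
        if bitmap[b >> 3] & (1 << (b & 7)):
            passed += 1
    return passed / n_probes
```

With the 740,000 blocks from the quoted message, a 2^16-bit bitmap is expected to be more than 99.99% full, so it would reject almost nothing. A 2^24-bit bitmap (2 MiB) is expected to be only about 4% full. Even then, it saves work only when many probes are misses.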

..wayne..

