Rsyncing really large files
wayned at samba.org
Sat Mar 5 20:41:10 GMT 2005
On Sat, Mar 05, 2005 at 08:07:20PM +0200, Shachar Shemesh wrote:
> Definitely not [block-size]! I was talking about the hash table load.
Ah yes, I can see that I misread what the "it" was referring to. Thanks
for the clarification.
> But [the bitmap lookup] only works if the checksum function and the
> hash table are exactly the same size.
No, because you can tolerate some false positives -- some values that
the bitmap says are present, but that aren't really there in the table.
> But how will you find it there? If you are going to have 740K blocks
> (i.e. - 740,000 strong hashes) in a 16bit hash table, you are going to
> have lots of collisions there (190 per bucket, on average), and you
> gained nothing.
You'll note that a no-match bitmap has no effect on the normal hash
table lookups, so you're railing against something that is not even
discussed in my email. I just mentioned a potential improvement for
the checksum lookup algorithm, but one that is only really useful if
there are a lot of not-found blocks. I remain unconvinced that this
will be a net win, but I thought I'd mention it in case anyone wanted
to run some tests.
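
For anyone who does want to run such tests, here is a minimal sketch of the no-match bitmap idea in Python. It is an illustration under stated assumptions, not rsync's actual code: the names weak_checksum, BlockIndex, bitmap_bits, bitmap_rejects and bucket_scans are all hypothetical, the checksum is a toy rolling-style sum, and the 2^20-bit bitmap size is an arbitrary choice. The bitmap is consulted first; a clear bit proves the checksum is absent, while a set bit (possibly a false positive) just falls through to the ordinary 16-bit hash table lookup, which is left unchanged.

```python
def weak_checksum(block: bytes) -> int:
    """Toy 32-bit rolling-style checksum (assumption: not rsync's exact code)."""
    n = len(block)
    a = sum(block) & 0xFFFF
    b = sum((n - i) * byte for i, byte in enumerate(block)) & 0xFFFF
    return (b << 16) | a


class BlockIndex:
    """16-bit hash table of block checksums with a larger no-match bitmap in front."""

    def __init__(self, blocks, bitmap_bits=1 << 20):
        # bitmap_bits must be a power of two so we can mask instead of mod.
        self.mask = bitmap_bits - 1
        self.bitmap = bytearray(bitmap_bits // 8)
        self.buckets = {}          # 16-bit tag -> list of (checksum, block number)
        self.bitmap_rejects = 0    # lookups answered by the bitmap alone
        self.bucket_scans = 0      # lookups that had to walk a hash bucket
        for number, block in enumerate(blocks):
            s = weak_checksum(block)
            bit = s & self.mask
            self.bitmap[bit >> 3] |= 1 << (bit & 7)
            self.buckets.setdefault(s & 0xFFFF, []).append((s, number))

    def lookup(self, s: int):
        """Return the block numbers whose weak checksum equals s."""
        bit = s & self.mask
        if not self.bitmap[bit >> 3] & (1 << (bit & 7)):
            # Clear bit: definitely not in the table, skip the bucket walk.
            self.bitmap_rejects += 1
            return []
        # Set bit: maybe present (false positives are fine), do the normal lookup.
        self.bucket_scans += 1
        return [n for (c, n) in self.buckets.get(s & 0xFFFF, []) if c == s]
```

Comparing bitmap_rejects against bucket_scans on real data would show how many bucket walks the bitmap saves, which only matters when many blocks are not found.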