Rsyncing really large files
rsync at shemesh.biz
Sat Mar 5 18:10:37 GMT 2005
Wayne Davison wrote:
>On Sat, Mar 05, 2005 at 02:18:01PM +0200, Shachar Shemesh wrote:
>>>However, if you choose to start with the 32 bit rolling hash and mod
>>>that, you will have problems. The rolling checksum has two distinct
>>>parts, and modding will only pull info from the low order bits,
>>Why? This may be something I missed within the code.
>He's talking about potentially losing an even distribution of values if
>the lowest order bits aren't random enough. I think we're all only
>guessing that it might cluster too much if you just take the value%N
>(at least I haven't tried to look at this aspect of the checksum).
Even for N which is not a power of 2? Don't forget that if you follow my
suggestion ( (numbuckets/8+1)*10 ), N is rarely going to be a power of
two. In fact, I can say without a doubt that it is guaranteed NOT to be
a power of 2, since it is always a multiple of 10 and therefore of 5. As
such, the high order bits do affect the bucket you end up in.
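
To make the point concrete, here is a rough sketch in Python (not rsync's actual code). It assumes the division in the formula is integer division, as it would be in C. The two checksum values and the helper names `table_size`, `is_power_of_two` and `bucket` are made up for illustration. The two checksums share their low 16 bits and differ only in the high 16 bits.

```python
def table_size(numbuckets: int) -> int:
    """Bucket count from the formula in this message.

    Assumption: the division is C-style integer division.
    """
    return (numbuckets // 8 + 1) * 10


def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def bucket(checksum: int, n: int) -> int:
    """Map a 32-bit checksum to one of n buckets by taking value % N."""
    return checksum % n


# Two hypothetical 32-bit checksums: same low 16 bits, different high 16 bits.
a = 0x0001ABCD
b = 0x7F3EABCD

n_pow2 = 1 << 16                # power-of-two table size: only low bits matter
n_formula = table_size(50_000)  # size from the formula, always a multiple of 10

print("power of two :", bucket(a, n_pow2), bucket(b, n_pow2))
print("formula size :", bucket(a, n_formula), bucket(b, n_formula))
```

With the power-of-two size both values land in the same bucket, because the modulus simply masks off the high bits. With the size from the formula they land in different buckets, so the high half of the checksum does contribute. This says nothing about how evenly the values spread, only that the high order bits are not discarded.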
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html