Rsyncing really large files

Shachar Shemesh rsync at shemesh.biz
Sat Mar 5 18:10:37 GMT 2005


Wayne Davison wrote:

>On Sat, Mar 05, 2005 at 02:18:01PM +0200, Shachar Shemesh wrote:
>
>>>However, if you choose to start with the 32 bit rolling hash and mod 
>>>that, you will have problems.  The rolling checksum has two distinct 
>>>parts, and modding will only pull info from the low order bits,
>>>
>>Why? This may be something I missed within the code.
>
>He's talking about potentially losing an even distribution of values if
>the lowest order bits aren't random enough.  I think we're all only
>guessing that it might cluster too much if you just take the value%N
>(at least I haven't tried to look at this aspect of the checksum).
>
Even for N which is not a power of 2? Don't forget that if you follow my
suggestion, (numbuckets/8+1)*10, N is rarely going to be a power of
two. I can even say, without a doubt, that it is guaranteed NOT to be a
power of 2: it is always a multiple of 10, and so always divisible by 5.
As such, the high order bits do affect the bucket you end up in.
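
To illustrate the point, here is a minimal sketch in Python (not rsync's
actual code). It assumes the division in (numbuckets/8+1)*10 is integer
division. The names table_size, is_power_of_two and bucket, and the two
sample checksum values, are made up for the example. It compares two
32-bit values that share their low 16 bits and differ only in the high
16 bits.

```python
def table_size(numbuckets: int) -> int:
    """Table size from the suggestion above, assuming integer division."""
    return (numbuckets // 8 + 1) * 10


def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and n & (n - 1) == 0


def bucket(checksum: int, n: int) -> int:
    """Plain value % N bucket selection."""
    return checksum % n


# Two hypothetical 32-bit checksums: same low 16 bits, different high 16 bits.
a = 0x0001ABCD
b = 0x7FFEABCD

pow2_size = 1 << 16          # a power-of-two table size, for comparison
odd_size = table_size(50000)  # a size produced by the suggested formula

# With a power-of-two N, only the low bits pick the bucket, so a and b collide.
print(bucket(a, pow2_size), bucket(b, pow2_size))

# With a multiple of 10, the high bits change the remainder as well.
print(bucket(a, odd_size), bucket(b, odd_size))
```

This only shows that the high order bits take part in the result. It says
nothing about how evenly real rolling checksum values spread over the
buckets, which is the part that is still a guess.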

>..wayne..
>
       Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html


