Chance of equal checksum and changing blocks

lewis butler lbutler+rsync at
Fri Jan 23 04:40:47 GMT 2009

On 22-Jan-2009, at 02:43, David de Lama wrote:
> Hi @all!
> I have two questions:
> - First, am I right that the chance of getting the same 32-bit  
> rolling checksum is 1/2^16

Depends on the algorithm.  Most 32bit algorithms are not really 1: 

> and to get the same 128-bit MD5 Hash is 1/2^127?

The chances of two files accidentally generating the same MD5 hash are  
about 1:2^100, or about 1 in 12,676,506,000,000,000,000,000,000,000.  
Why it's not 1 in (2^127)-1 is complicated, and might be part of the  
reason for MD5's exploitability.  Enough said that MD5 does not use  
every possible potential hash.

Put another way, if you generate a new md5 hash ever *nanosecond*, it  
will take up to 40,170,281,500,000 YEARS to generate a collision (or  
about 3000 times long than the universe has existed).  To have a 50%  
chance of collision is considerably more likely (about 600 years if  
you generate one MD5 hash every nanosecond, or nearly 600,000,000,000  
years (c. 50 times longer than the universe has existed) at a more  
reasonable rate of 1 hash per second).

If your concern with MD5 is accidental collisions, then MD5 is  
perfectly fine; you're not going to get hash collisions. If your  
concern is security, then MD5 is not acceptable because it has known  

Advance and attack! Attack and destroy! Destroy and rejoice!

More information about the rsync mailing list