MD4 checksum_seed

Wed Mar 17 17:41:47 GMT 2004

Hi,

On 2004/03/17 00:07, Donovan Baarda wrote:

>>Quoth RFC 1950:
>>"That 65521 is prime is important to avoid a possible large class of
>>two-byte errors that leave the check unchanged."
>>Not much of a difference in adversarial setting, though.
> 
> And not much of a difference I think for rsync in general... I wonder just
> how large  the "possible large class" is?

I don't know, and I don't have the relevant papers at hand. But here's
an example, easily generalized: add 128 to the 513th byte and subtract
128 from the 512th byte (counting from the end and assuming no char
wrap-around).

>>>It might be possible to craft a mapping like xdelta's but using a
>>>special sequence calculated for best effect, much like the polynomials
>>>in crc32 and crc64.
>>
>>Possibly, for "natural" or random data (as opposed to adversarial). But
>>in such contexts, random choice is often close to optimal with high
>>probability.

Clarification: in "such contexts" I meant lookup tables, not polynomials
(you want to be really careful with the latter).

> I'll take your word for it :-). Actually I think the polynomials in crc are
> crafted for common line errors. I'm not sure you can do much "crafting"
> without making certain assumptions about the types of errors. In our case I
> don't think we can make assumptions about the types of "errors".

I fully concur.

>>>Have you looked at the librsync rollsum.c implementation in CVS? It
>>>replaced rs_calc_weak_sum and should be a fair bit faster.

Here are run times for 70000000 calculations over the same all-zero block.
"checksum.c" is librsync's current code (which differs from rsync's only
in the chars are signed).
"rollsum.c" is librsync's rollsumc.c from CVS.
"XDELTA" is checksum.c changed to add char-to-short lookup table
(I blindly converted every occurance of "buf[foo]" into
"lookup_table[buf[foo]]"; the lookup table is xdelta's).
Numbers in seconds; tested on a Pentium Xeon 1700MHz, gcc 3.2.3.

checksum.c -O2 CHAR_OFFSET=0:  273.42
checksum.c -O2 CHAR_OFFSET=31: 300.63
checksum.c -O3 CHAR_OFFSET=0:   31.42
checksum.c -O3 CHAR_OFFSET=31:  35.32
rollsum.c -O2 CHAR_OFFSET=0:    99.47
rollsum.c -O2 CHAR_OFFSET=31:   99.22
rollsum.c -O3 CHAR_OFFSET=0:   100.18
rollsum.c -O3 CHAR_OFFSET=31:   99.77
XDELTA -O2:                    386.07
XDELTA -O3:                     32.17

If you use the trivial one-char-at-a-time Fletcher algorithm (e.g.,
disable the first loop in lib/rsync's code and remember to initialize
i=0) you get the following:

checksum.c -O2 CHAR_OFFSET=0:  259.81
checksum.c -O2 CHAR_OFFSET=31: 259.41
checksum.c -O3 CHAR_OFFSET=0:  134.06
checksum.c -O3 CHAR_OFFSET=31: 137.44
XDELTA -O2:                    300.28
XDELTA -O3:                    135.31

Some things to observe:
* The fastest variant is checksum.c with -O3.
* The extra cost of xdelta-type lookups is acceptable for -O2 and
  virtually nonexistant for -O3.
* Compared to checksum.c, rollsum.c.c is thrice faster
  when using -O2 and 3 thrice *slower* when using -O3.
* With the current default -O2, checksum.c's four-chars-at-a-time
  code is worse than the trivial code.
* Nonzero CHAR_OFFSET has a small but measurable cost.

BTW, in both rsync and librsync's checksum.c, the loop
    for (i = 0; i < (len-4); i+=4) {
has an off-by-one limit. It should be "i <= (len-4)".

  Eran