rsync2eran at tromer.org
Thu Mar 18 04:04:54 GMT 2004
On 2004/03/18 04:46, Donovan Baarda wrote:
> Why would rollsum.c be slightly _slower_ for CHAR_OFFSET=0?
Just measurement noise (background process kicking in and spoiling the
memory caches, or such). It sometimes errs slightly the other direction,
so I believe the two cases are identical.
>>* The extra cost of xdelta-type lookups is acceptable for -O2 and
>> virtually nonexistant for -O3.
> The optimiser must something pretty amazing to minimise the table
> lookups at compile-time; pretty impressive.
It's impossible to be completely eliminate the lookups. Indeed, looking
at gcc's intermediate assembler output, checksum.c's main loop is 22
instructions with plain CHAR_OFFSET=0 and 28 with xdelta lookups. My
guess is that checkum.c -O3 leaves some unused slots in the Xeon's
pipeline, and the extra lookups happen to fall nicely into those slots.
Same effect with a Pentium III Katmai 550MHz, BTW.
> The next question is; is the table lookup benefits enough to justify the
> extra complexity... given there is negligible performance hit?
> I don't think the lookups actually improve the probability of avoiding
> random collisions by much, and although they make it slightly more
> complicated, they don't make it much more robust against intentional
> attacks either. The only true protection against intentional attacks is
> randomising the seed. So do we really benefit much?
For intentional attacks consisting of injecting arbitrary block, xdelta
lookups indeed don't make any difference. But there is a difference for
more restricted attacks, or for data that happens to have the wrong
properties, especially when the xdelta lookup table is randomized. The
main benefit is breaking linearity, in the sense explained earlier
Another benefit is that xdelta handles small blocks much better: with
Fletcher, these don't only fail to use the full range of s1, but also
get more than their share of collisions for random data because the
distribution of s1 is highly concentrated around blocksize/2 (for large
blocks the wrap-arounds tend to smoothen the distribution).
You can build the random lookup table by applying a PRNG to the
(random!) checksum_seed, so that there's no further increase in delta
size. One way to to do this to apply MD4 to checksum_seed+i (or
checksum_seed concatenated with i, or whatever), for i=0,..,31, yielding
512 reasonably random bytes. That's negligible setup cost, once per run.
Oh, but please do increase checksum_seed to 16 bytes!
More information about the rsync