[RFC] dynamic checksum size

Donovan Baarda abo at minkirri.apana.org.au
Mon Mar 24 18:32:04 EST 2003

```On Mon, 2003-03-24 at 16:33, jw schultz wrote:
> On Mon, Mar 24, 2003 at 10:56:42AM +1100, Donovan Baarda wrote:
> > On Mon, 2003-03-24 at 03:36, jw schultz wrote:
[..]
> > > hadn't a decent fast integer sqrt function at hand.  I don't
> > > really care to start bringing in the math lib just for this
> > > one calculation.  I've cobbled together one that is OK.
> > [...]
> >
> > In case you didn't notice the "count non-zero shifts" is an approximate
> > integer log2 function. There are integer approximations for sqrt that
> > you can use, including Newton-Raphson;

Actually, Newton-Raphson is not really an approximation (when compared
to "count non-zero shifts"), but you can turn it into one by stopping
iteration when the answer is close enough.

> > int sqrt(int R) {
> >   int x,x1;
> >
> >   x = 1024; /* starting guess close to what we expect */
> >   repeat {
> >     x1 = x;
> >     x = (x1 + R/x1) >> 1;
> >   } while (x-x1);
> >   return x;
> > }
>
> I'm more concerned with cycle count and cache effects than
> loop iterations.  Division, at least historically, is an
> expensive operation compared to multiplication.

The above converges surprisingly quickly... for arbitrary numbers up to
2^36 it typically converges within 10 iterations. For numbers under 2^20
it seems to converge within 5 iterations. Also, this is only done once
per file to determine the block-size to use.

If you are really concerned about divisions, there is another algorithm
that uses only multiplies, and does one iteration per bit in the
solution... for a 64bit integer it is 32 iterations to find the
solution. It's nice for implementing in assembler on micros without a
hardware division, but it's a bit messier to implement in C.

> > If we must have an upper limit on block size, I'd prefer it to be
> > continuous up to the 4G mark. This means an upper limit of 64K.
>
> I'm ambivalent.  We don't seem to have a problem with
> huge block sizes.  One possibility would be to allow the user to
> set a ceiling by adding an option (--max-block-size)
>
> My inclination at the moment (may change my mind) would be
> to have no upper bound and see if anyone else has a problem.
> If problems appear we could either impose a limit or add a
> --max-block-size.

I think --max-block-size would be a waste of time... people can always
specify a particular block size if they are unhappy with the heuristic.

I suspect that no-limit will prove to work well.

> On the whole i'd say we are zeroing in on something good.
> I'll see about regenerating the patches.

Yeah... I think I'll have to implement this heuristic in pysync now :-)

--
----------------------------------------------------------------------