[RFC] dynamic checksum size
Donovan Baarda
abo at minkirri.apana.org.au
Mon Mar 24 18:32:04 EST 2003
On Mon, 2003-03-24 at 16:33, jw schultz wrote:
> On Mon, Mar 24, 2003 at 10:56:42AM +1100, Donovan Baarda wrote:
> > On Mon, 2003-03-24 at 03:36, jw schultz wrote:
[..]
> > > hadn't a decent fast integer sqrt function at hand. I don't
> > > really care to start bringing in the math lib just for this
> > > one calculation. I've cobbled together one that is OK.
> > [...]
> >
> > In case you didn't notice the "count non-zero shifts" is an approximate
> > integer log2 function. There are integer approximations for sqrt that
> > you can use, including Newton-Raphson;
Actually, Newton-Raphson is not really an approximation (when compared
to "count non-zero shifts"), but you can turn it into one by stopping
iteration when the answer is close enough.
> > int sqrt(int R) {
> > int x,x1;
> >
> > x = 1024; /* starting guess close to what we expect */
> > repeat {
> > x1 = x;
> > x = (x1 + R/x1) >> 1;
> > } while (x-x1);
> > return x;
> > }
>
> I'm more concerned with cycle count and cache effects than
> loop iterations. Division, at least historically, is an
> expensive operation compared to multiplication.
The above converges surprisingly quickly... for arbitrary numbers up to
2^36 it typically converges within 10 iterations. For numbers under 2^20
it seems to converge within 5 iterations. Also, this is only done once
per file to determine the block-size to use.
If you are really concerned about divisions, there is another algorithm
that uses only multiplies, and does one iteration per bit in the
solution... for a 64bit integer it is 32 iterations to find the
solution. It's nice for implementing in assembler on micros without a
hardware division, but it's a bit messier to implement in C.
> > If we must have an upper limit on block size, I'd prefer it to be
> > continuous up to the 4G mark. This means an upper limit of 64K.
>
> I'm ambivalent. We don't seem to have a problem with
> huge block sizes. One possibility would be to allow the user to
> set a ceiling by adding an option (--max-block-size)
>
> My inclination at the moment (may change my mind) would be
> to have no upper bound and see if anyone else has a problem.
> If problems appear we could either impose a limit or add a
> --max-block-size.
I think --max-block-size would be a waste of time... people can always
specify a particular block size if they are unhappy with the heuristic.
I suspect that no-limit will prove to work well.
> On the whole i'd say we are zeroing in on something good.
> I'll see about regenerating the patches.
Yeah... I think I'll have to implement this heuristic in pysync now :-)
--
----------------------------------------------------------------------
ABO: finger abo at minkirri.apana.org.au for more info, including pgp key
----------------------------------------------------------------------
More information about the rsync
mailing list