[RFC] dynamic checksum size

jw schultz jw at pegasys.ws
Mon Mar 24 16:33:02 EST 2003


On Mon, Mar 24, 2003 at 10:56:42AM +1100, Donovan Baarda wrote:
> On Mon, 2003-03-24 at 03:36, jw schultz wrote:
> > On Mon, Mar 24, 2003 at 12:54:26AM +1100, Donovan Baarda wrote:
> > > On Sun, Mar 23, 2003 at 03:46:34AM -0800, jw schultz wrote:
> [...]
> > CHUNK_SIZE is used in mapping the file into memory.  I'm
> > not absolutely sure (haven't spelunked that part of the code) what
> > would happen if block size were to exceed it.  I suspect it
> > would be OK because i don't recall hearing anyone complain
> > as a result of using the --block-size option with a huge
> > value.
> 
> Did the old code limit it to 16K even after people manually set it to
> something larger?

No.  I just tried rsync with a block size of 100000 and it
seemed to be fine.

[snip]
> > hadn't a decent fast integer sqrt function at hand.  I don't
> > really care to start bringing in the math lib just for this
> > one calculation.  I've cobbled together one that is OK.
> [...]
> 
> In case you didn't notice the "count non-zero shifts" is an approximate
> integer log2 function. There are integer approximations for sqrt that
> you can use, including Newton-Raphson;
> 
> int sqrt(int R) {
>   int x,x1;
> 
>   x = 1024; /* starting guess close to what we expect */
>   do {
>     x1 = x;
>     x = (x1 + R/x1) >> 1;
>   } while (x-x1);
>   return x;
> }

I'm more concerned with cycle count and cache effects than
loop iterations.  Division, at least historically, is an
expensive operation compared to multiplication.
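
For illustration, here is a minimal sketch of an integer square root that
avoids division entirely, using the classic digit-by-digit (binary) method
with only shifts, adds and compares. This is not the function from the
patch; the name isqrt64 and the 64-bit input width are assumptions made
for the example.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the rsync patch:
 * returns floor(sqrt(n)) without using division or the math lib. */
static uint32_t isqrt64(uint64_t n)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;  /* highest power of four in 64 bits */

    /* Skip powers of four that are larger than n. */
    while (bit > n)
        bit >>= 2;

    /* Decide one result bit per iteration, from the top down. */
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)root;
}
```

The loop runs at most 32 times for a 64-bit input and touches no tables,
so the cost is bounded and there is nothing extra to pull into the cache.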

> > Clearly we are getting into the realms of sci-fi here.  Even
> > with the 16KB block length limit files in the 1-16GB range
> > should be manageable on almost any system that is likely
> > to have them.
> 
> If we must have an upper limit on block size, I'd prefer it to be
> continuous up to the 4G mark. This means an upper limit of 64K.

I'm ambivalent.  We don't seem to have a problem with
huge block sizes.  One possibility would be to allow the user to
set a ceiling by adding an option (--max-block-size).

My inclination at the moment (may change my mind) would be
to have no upper bound and see if anyone else has a problem.
If problems appear we could either impose a fixed limit or add a
--max-block-size option.
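
A rough sketch of how the two alternatives could share one code path,
assuming the block length is taken as roughly the square root of the file
length (which is what the 4G to 64K figure above implies). The names
pick_block_len and isqrt_u64, the 700-byte floor, and the convention that
a ceiling of 0 means "unbounded" are all assumptions for illustration, not
code from the patch.

```c
#include <stdint.h>

#define ASSUMED_MIN_BLOCK_LEN 700u  /* illustrative floor, an assumption */

/* Hypothetical division-free floor(sqrt(n)). */
static uint32_t isqrt_u64(uint64_t n)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;

    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)root;
}

/* Hypothetical: choose a block length for a file of file_len bytes.
 * max_block_len == 0 means no ceiling is applied. */
static uint32_t pick_block_len(uint64_t file_len, uint32_t max_block_len)
{
    uint32_t len = isqrt_u64(file_len);

    if (len < ASSUMED_MIN_BLOCK_LEN)
        len = ASSUMED_MIN_BLOCK_LEN;
    if (max_block_len != 0 && len > max_block_len)
        len = max_block_len;
    return len;
}
```

With no ceiling a 4GB file gets a 64K block length; passing 16384 as the
ceiling reproduces the 16KB cap for files in the 1-16GB range.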

On the whole I'd say we are zeroing in on something good.
I'll see about regenerating the patches.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt

