[RFC] dynamic checksum size
jw schultz
jw at pegasys.ws
Mon Mar 24 16:33:02 EST 2003
On Mon, Mar 24, 2003 at 10:56:42AM +1100, Donovan Baarda wrote:
> On Mon, 2003-03-24 at 03:36, jw schultz wrote:
> > On Mon, Mar 24, 2003 at 12:54:26AM +1100, Donovan Baarda wrote:
> > > On Sun, Mar 23, 2003 at 03:46:34AM -0800, jw schultz wrote:
> [...]
> > CHUNK_SIZE is used in mapping the file into memory. I'm
> > not absolutely sure (haven't spelunked that part of the code) what
> > would happen if block size were to exceed it. I suspect it
> > would be OK because I don't recall hearing anyone complain
> > as a result of using the --block-size option with a huge
> > value.
>
> Did the old code limit it to 16K even after people manually set it to
> something larger?
No. I just tried rsync with a block size of 100000 and it
seemed to be fine.
[snip]
> > hadn't a decent fast integer sqrt function at hand. I don't
> > really care to start bringing in the math lib just for this
> > one calculation. I've cobbled together one that is OK.
> [...]
>
> In case you didn't notice the "count non-zero shifts" is an approximate
> integer log2 function. There are integer approximations for sqrt that
> you can use, including Newton-Raphson;
>
> int sqrt(int R) {
>     int x, x1;
>
>     x = 1024; /* starting guess close to what we expect */
>     do {
>         x1 = x;
>         x = (x1 + R/x1) >> 1;   /* Newton-Raphson step */
>     } while (x-x1);
>     return x;
> }
I'm more concerned with cycle count and cache effects than
loop iterations. Division, at least historically, is an
expensive operation compared to multiplication.
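
To make the trade-off concrete, here is a minimal sketch of a
division-free integer square root using the classic shift-and-subtract
(digit-by-digit) method. It is only an illustration, not the routine I
cobbled together; the name isqrt32 and the 32-bit unsigned argument are
assumptions made for the example.

```c
#include <stdint.h>

/*
 * Illustrative only: floor(sqrt(n)) using shifts, adds and compares.
 * No division or multiplication is performed.
 * The name isqrt32 is hypothetical, not something from the rsync source.
 */
uint32_t isqrt32(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit = (uint32_t)1 << 30;   /* largest power of four in 32 bits */

    /* Start at the largest power of four that does not exceed n. */
    while (bit > n)
        bit >>= 2;

    /* Decide one result bit per pass, from the top down. */
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}
```

The loop runs at most 16 times for a 32-bit input and the result is
the exact floor of the square root, so there is no convergence test
to get wrong.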
> > Clearly we are getting into the realms of sci-fi here. Even
> > with the 16KB block length limit files in the 1-16GB range
> > should be manageable on almost any system that is likely
> > to have them.
>
> If we must have an upper limit on block size, I'd prefer it to be
> continuous up to the 4G mark. This means an upper limit of 64K.
I'm ambivalent. We don't seem to have a problem with
huge block sizes. One possibility would be to allow the user to
set a ceiling by adding an option (--max-block-size).
My inclination at the moment (may change my mind) would be
to have no upper bound and see if anyone else has a problem.
If problems appear we could either impose a limit or add a
--max-block-size.
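
A rough sketch of what "no upper bound unless one is asked for" could
look like, assuming the block length is taken as roughly the square
root of the file length (which is what makes 4G correspond to 64K).
Everything named here is hypothetical: choose_block_size, isqrt64, the
MIN_BLOCK_SIZE floor of 700, and the convention that a max_block_size
of 0 means "no ceiling". The ceiling stands in for the
--max-block-size idea floated above.

```c
#include <stdint.h>

#define MIN_BLOCK_SIZE 700u   /* assumed floor, chosen for the example */

/* Illustrative helper: floor(sqrt(n)) for a 64-bit length, no division. */
static uint32_t isqrt64(uint64_t n)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;

    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)root;
}

/*
 * Hypothetical policy: block length ~ sqrt(file length), never below
 * MIN_BLOCK_SIZE, and capped only when max_block_size is non-zero.
 */
uint32_t choose_block_size(uint64_t file_len, uint32_t max_block_size)
{
    uint32_t blen = isqrt64(file_len);

    if (blen < MIN_BLOCK_SIZE)
        blen = MIN_BLOCK_SIZE;
    if (max_block_size != 0 && blen > max_block_size)
        blen = max_block_size;
    return blen;
}
```

With no ceiling a 4G file gets a 64K block and a 16G file gets 128K;
passing 65536 as the ceiling holds the 16G case at 64K.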
On the whole I'd say we are zeroing in on something good.
I'll see about regenerating the patches.
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt