jw at pegasys.ws
Tue Jun 4 19:07:01 EST 2002
On Tue, Jun 04, 2002 at 05:43:17PM +1000, Kevin Easton wrote:
> > On Sat, Jun 01, 2002 at 05:18:42PM -0700, jw schultz wrote:
> > > On Sat, Jun 01, 2002 at 11:46:37PM +1000, Donovan Baarda wrote:
> > > > On Sat, Jun 01, 2002 at 04:57:15AM -0700, jw schultz wrote:
> > > > > On Sat, Jun 01, 2002 at 08:51:26PM +1000, Donovan Baarda wrote:
> > > > > > On Fri, May 31, 2002 at 05:25:15PM -0700, jw schultz wrote:
> > [...]
> Performing such a compression reset every N bytes of compressed text will
> clearly not work very well at all, because a change in the source text will
> usually change the number of bytes of compressed text output, and therefore the
> compression resets when compressing the original and modified files will not
> happen in the same place relative to the unchanged text.
> (This is what JW stated the "gzip-rsyncable" patch does in an earlier post
> in this thread - however, this *isn't* what the patch actually appears to do).
When i said "I took a quick look at this patch and i think
it does..." my description was meant to be taken with a box of
salt and as a request for correction, which you have now
provided and Rusty (a very busy fellow) has confirmed.
Thanks for the correction.
> When I finally took the time to properly read Rusty's "gzip-rsyncable" patch
> while writing this mail, I discovered that it appears to use this same general
> technique, although the heuristic he has chosen is "the sum of the previous
> 4096 bytes of source text is divisible by 4096". I think my heuristic could
> allow the gzips to sync up faster (after 12 identical bytes of source text
> following a change, compared to 4096), whilst still having the same compression
> ratio hit (resets every 4096 bytes, on average), but I've only come up with
> this today so I haven't done as much thinking on the subject as Rusty -
> likely there is some "gotcha" that I haven't seen.
The one thing i wonder is how often, in the real world, we
see sum(4k data) % 4096 == 0. Just an idle thought.
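
One way to get a feel for that question is to count the hits on real
files. The following is only a rough sketch of the heuristic as Kevin
describes it above ("the sum of the previous 4096 bytes of source text
is divisible by 4096"), not a transcription of the patch itself. The
function name reset_points, its parameters, and the choice to wait for
a full window before testing are all assumptions made for illustration.

```python
WINDOW = 4096   # bytes of source text covered by the rolling sum
MODULUS = 4096  # a reset is triggered when the sum is divisible by this


def reset_points(data, window=WINDOW, modulus=MODULUS):
    """Return offsets (one past the triggering byte) where the sum of the
    previous `window` bytes is divisible by `modulus`.

    Hypothetical helper: positions before a full window has been seen
    are never reported, which is an assumption of this sketch.
    """
    points = []
    total = 0
    for i, byte in enumerate(data):
        total += byte                   # byte entering the window
        if i >= window:
            total -= data[i - window]   # byte leaving the window
        if i + 1 >= window and total % modulus == 0:
            points.append(i + 1)
    return points


if __name__ == "__main__":
    import sys

    # Usage: python reset_points.py FILE
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    points = reset_points(data)
    print(len(data), "bytes,", len(points), "reset points")
    if points:
        print("average spacing:", len(data) / len(points))
```

Because each reset point depends only on the preceding window of source
bytes, the reset points of an original and a modified file line up again
(shifted by the size of the change) once a full window of unchanged text
has passed. Note that in this sketch a long run of zero bytes has a
window sum of 0 and so qualifies at every position; whether the real
patch behaves that way is not something this sketch establishes.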
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt