Compressed backup

jw schultz jw at
Tue Jun 4 19:07:01 EST 2002

On Tue, Jun 04, 2002 at 05:43:17PM +1000, Kevin Easton wrote:
> > On Sat, Jun 01, 2002 at 05:18:42PM -0700, jw schultz wrote:
> > > On Sat, Jun 01, 2002 at 11:46:37PM +1000, Donovan Baarda wrote:
> > > > On Sat, Jun 01, 2002 at 04:57:15AM -0700, jw schultz wrote:
> > > > > On Sat, Jun 01, 2002 at 08:51:26PM +1000, Donovan Baarda wrote:
> > > > > > On Fri, May 31, 2002 at 05:25:15PM -0700, jw schultz wrote:
> > [...]
> Performing such a compression reset every N bytes of compressed text will
> clearly not work very well at all, because a change in the source text will 
> usually change the number of bytes of compressed text output, and therefore the
> compression resets when compressing the original and modified files will not
> happen in the same place relative to the unchanged text.
> (This is what JW stated the "gzip-rsyncable" patch does in an earlier post
> in this thread - however, this *isn't* what the patch actually appears to do).

When I said "I took a quick look at this patch and I think
it does..." my description should have been taken with a box of
salt and as a request for correction, which you have finally
provided and Rusty (a very busy fellow) has now confirmed.
Thanks for the correction.

> When I finally took the time to properly read Rusty's "gzip-rsyncable" patch[1]
> while writing this mail, I discovered that it appears to use this same general
> technique, although the heuristic he has chosen is "the sum of the previous
> 4096 bytes of source text is divisible by 4096".  I think my heuristic could 
> allow the gzips to sync up faster (after 12 identical bytes of source text 
> following a change, compared to 4096), whilst still having the same compression
> ratio hit (resets every 4096 bytes, on average), but I've only come up with
> this today so I haven't done as much thinking on the subject as Rusty - 
> likely there is some "gotcha" that I haven't seen.
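
A minimal Python sketch of the heuristic as the quoted paragraph describes it: reset whenever the sum of the previous 4096 bytes of source text is divisible by 4096. The actual patch may compute or test the sum differently, and the names reset_points, WINDOW and MODULUS are hypothetical, chosen for illustration. The demo checks the property that makes this approach work where resetting every N bytes of compressed output does not: once a full window of unchanged text follows an edit, the reset points in the modified data line up again with those in the original.

```python
import random

WINDOW = 4096   # bytes of source text in the rolling window (value from the post)
MODULUS = 4096  # reset when the window sum is divisible by this (value from the post)


def reset_points(data: bytes, window: int = WINDOW, modulus: int = MODULUS) -> list[int]:
    """Offsets off where sum(data[off - window:off]) % modulus == 0.

    Each offset is a place where a compressor would reset its state,
    so its output from there on depends only on the input that follows.
    """
    points = []
    s = 0
    for i, b in enumerate(data):
        s += b                      # byte entering the window
        if i >= window:
            s -= data[i - window]   # byte leaving the window
        if i + 1 >= window and s % modulus == 0:
            points.append(i + 1)    # reset just after byte i
    return points


# Demo: insert a few bytes into pseudo-random data and compare reset points.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(200_000))
insert_at, extra = 1000, b"hello"
modified = original[:insert_at] + extra + original[insert_at:]

orig_pts = reset_points(original)
mod_pts = reset_points(modified)

# A window lying wholly past the edit sees identical bytes in both inputs,
# so from there on the reset points coincide, shifted by len(extra).
expected = [p + len(extra) for p in orig_pts if p >= insert_at + WINDOW]
actual = [q for q in mod_pts if q >= insert_at + len(extra) + WINDOW]
print(actual == expected)  # prints True
```

Because the reset decision depends only on the last WINDOW bytes of input, an edit can disturb reset points for at most one window length after it; a compressor that fully restarts at those points would then emit the same output for the unchanged text that follows.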

The one thing I wonder is how often, in the real world, we
see sum(4k data) % 4096 == 0.  Just an idle thought.

	J.W. Schultz            Pegasystems Technologies
	email address:		jw at

		Remember Cernan and Schmitt

More information about the rsync mailing list