GZIP, ZIP, ISO, RPM files and rsync, tar, cpio

jw schultz jw at pegasys.ws
Fri Aug 29 15:35:07 EST 2003


On Fri, Aug 29, 2003 at 03:22:59PM +1000, Martin Pool wrote:
> On 28 Aug 2003 jw schultz <jw at pegasys.ws> wrote:
> 
> > On Thu, Aug 28, 2003 at 12:51:16PM +0300, Sviatoslav Sviridov/Lintec
> > Project wrote:
> > > 
> > > Sorry for direct reply, but mail server at samba.org blocks my
> > > messages.
> 
> Can you please tell me exactly what error you get, and preferably
> forward to me an example bounce?  If sending it to me does not work
> either then please send it to jw and he can forward it to me.

Will do.

> > > BTW, is there a patch for bzip2 that adds an --rsyncable option? Or
> > > is someone working on it?
> > 
> > I don't expect so.
> > 
> > The --rsyncable patch for gzip uses file content patterns to
> > reset the compression algorithm so that even if you insert
> > or delete data early in the file rsync can still find
> > matching blocks.  Look at the patch for further details.
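The idea of content-defined reset points can be sketched in Python. This is a minimal illustration, not the patch itself. It assumes a rolling sum over the last 4096 bytes, with a reset whenever that sum is divisible by 4096. It also simplifies by compressing each segment between resets as an independent zlib stream, whereas the real patch resets the compressor inside a single gzip stream. `reset_points` and `compress_rsyncable` are hypothetical names. Because the reset decision depends only on local content, an early insertion moves the resets near it, but every later reset shifts along with the data, so later compressed segments come out byte-identical.

```python
import random
import zlib

WINDOW = 4096  # assumed window size and modulus; not taken from the gzip patch

def reset_points(data: bytes, window: int = WINDOW) -> list[int]:
    """Offsets after which a content-defined reset would occur.

    Byte i triggers a reset when the sum of the `window` bytes ending at i
    is divisible by `window`. Only local content is consulted.
    """
    points = []
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        if i >= window - 1 and rolling % window == 0:
            points.append(i + 1)  # reset after byte i
    return points

def compress_rsyncable(data: bytes, window: int = WINDOW) -> list[bytes]:
    """Compress each segment between reset points independently (simplification)."""
    cuts = [0] + reset_points(data, window) + [len(data)]
    return [zlib.compress(data[a:b]) for a, b in zip(cuts, cuts[1:]) if b > a]

# Insert 13 bytes near the start of a file and compare the compressed segments.
rng = random.Random(1)
original = bytes(rng.randrange(256) for _ in range(200_000))
edited = original[:1000] + b"inserted text" + original[1000:]
before = compress_rsyncable(original)
after = compress_rsyncable(edited)
unchanged = len(set(before) & set(after))
print(f"{unchanged} of {len(before)} compressed segments unchanged")
```

Only the segments whose reset points fall within one window of the insertion differ. Everything after that is available to rsync as matching blocks.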
> > 
> > As far as I can tell from the manpage, bzip2 compresses
> > data in fixed-size blocks with a reset on block boundaries.
> > This means that it is moderately rsyncable as long as you
> > never insert or delete data.  You can change early data
> > without affecting later blocks, but only if the offsets of
> > later blocks remain the same.  This does not lend itself to an
> > rsyncable patch.  It does mean that bzip2 is good for
> > block-oriented data such as database tablespace files and
> > for files that are appended to, but bzip2 would be
> > undesirable for text, word processor, tar and other less
> > structured files.
> 
> I think this is correct.
> 
> You could naively simulate this by just splitting your file into 900kB
> chunks before compression.
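The naive simulation can be sketched in Python with the standard `bz2` module. The sketch splits the input into fixed-size chunks and compresses each one independently. It models independent per-chunk compression only, not bzip2's own stream format. `compress_fixed_chunks` and `changed_chunks` are hypothetical names, and the demo uses 10,000-byte chunks only to keep it fast. It shows the behaviour described above. An in-place change to early data alters only the chunk that contains it. A single inserted byte shifts every later chunk boundary onto different data, so every chunk changes.

```python
import bz2
import random

CHUNK_SIZE = 900_000  # roughly bzip2 -9's block size; adjustable

def compress_fixed_chunks(data: bytes, chunk_size: int = CHUNK_SIZE,
                          level: int = 9) -> list[bytes]:
    """Split data into fixed-size chunks and bzip2 each one independently."""
    return [bz2.compress(data[i:i + chunk_size], compresslevel=level)
            for i in range(0, len(data), chunk_size)]

def changed_chunks(old: list[bytes], new: list[bytes]) -> list[int]:
    """Indices of compressed chunks that differ or exist in only one list."""
    return [i for i in range(max(len(old), len(new)))
            if i >= len(old) or i >= len(new) or old[i] != new[i]]

rng = random.Random(0)
data = bytes(rng.randrange(256) for _ in range(100_000))
# Same length, early byte flipped: later chunk offsets stay put.
edited = data[:5] + bytes([data[5] ^ 0xFF]) + data[6:]
# One byte inserted early: every later boundary now lands on different data.
inserted = data[:5] + b"X" + data[5:]

base = compress_fixed_chunks(data, chunk_size=10_000)
in_place = changed_chunks(base, compress_fixed_chunks(edited, chunk_size=10_000))
shifted = changed_chunks(base, compress_fixed_chunks(inserted, chunk_size=10_000))
print(in_place)      # [0]
print(len(shifted))  # every chunk differs
```

The `chunk_size` and `level` parameters are the knobs behind the tradeoff between rsync efficiency and compression ratio. Smaller chunks confine each change to less compressed data, but they give the compressor less context to work with.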

There would also be a tradeoff between rsync efficiency and
compression ratio when using smaller blocks via the bzip2
options -1 .. -8, all dependent on the file modification
patterns.  Given such large block sizes in bzip2, rsync block
alignment is unlikely to be an issue.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt


