GZIP, ZIP, ISO, RPM files and rsync, tar, cpio

Martin Pool mbp at sourcefrog.net
Fri Aug 29 15:22:59 EST 2003


On 28 Aug 2003 jw schultz <jw at pegasys.ws> wrote:

> On Thu, Aug 28, 2003 at 12:51:16PM +0300, Sviatoslav Sviridov/Lintec
> Project wrote:
> > 
> > Sorry for the direct reply, but the mail server at samba.org blocks
> > my messages.

Can you please tell me exactly what error you get, and preferably
forward to me an example bounce?  If sending it to me does not work
either then please send it to jw and he can forward it to me.

> > > Finally, most distribution ISOs use package formats, such as
> > > RPM, that compress the package contents.  Even if the
> > > installed fileset is unchanged, these compressed packages
> > > may contain updated bits of meta-data that hurt the
> > > rsyncability of the package file.  In any case changing
> > > even one internal file of a compressed package can disrupt
> > > rsyncing the entire package file.  The only possible
> > > amelioration of this would be the use of the gzip
> > > --rsyncable option (which requires a patched gzip) by the
> > > package builders--assuming they use gzip for package
> > > compression.  Given the effect of improving rsyncability and
> > > thereby reducing bandwidth requirements such a change to
> > > their package build scripts could well be to their
> > > advantage.
> > 
> > BTW, is there a patch for bzip2 that adds a --rsyncable option? Or
> > maybe someone is working on it?
> 
> I don't expect so.
> 
> The --rsyncable patch for gzip uses file content patterns to
> reset the compression algorithm so that even if you insert
> or delete data early in the file rsync can still find
> matching blocks.  Look at the patch for further details.
> 
> As far as I can tell from the manpage, bzip2 compresses
> data in fixed size blocks with a reset on block boundaries.
> This means that it is moderately rsyncable as long as you
> never insert or delete data.  You can change early data
> without affecting later blocks but only if the offsets of
> later blocks remain the same.  This does not lend itself to an
> rsyncable patch.  This does mean that bzip2 is good for
> block oriented data such as database tablespace files and
> for files that are appended to but bzip2 would be
> undesirable for text, word processor, tar and other less
> structured files.

I think this is correct.

You could naively simulate this by just splitting your file into 900kB
chunks before compression.
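
As a rough illustration of that naive approach, here is a minimal Python
sketch.  It is not how bzip2 itself lays out blocks internally; the
900000-byte chunk size is an assumption chosen to match bzip2's largest
(-9) block size, and compress_chunks() and unchanged() are hypothetical
helper names.  Each chunk becomes its own bzip2 stream, so a chunk's
compressed bytes depend only on that chunk's contents.

```python
import bz2

# Assumed chunk size, chosen to match bzip2's largest (-9) block size.
CHUNK = 900_000


def compress_chunks(data: bytes, chunk_size: int = CHUNK) -> list:
    """Compress each fixed-size chunk of data as an independent bzip2 stream."""
    return [
        bz2.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def unchanged(old: list, new: list) -> int:
    """Count positions where the old and new compressed chunks are identical."""
    return sum(a == b for a, b in zip(old, new))


def demo() -> None:
    # 200,000 records of 16 bytes each: 3,200,000 bytes, i.e. four chunks.
    original = b"".join(b"record %08d\n" % i for i in range(200_000))
    before = compress_chunks(original)

    # Overwriting a byte in place leaves every later chunk untouched.
    edited = b"R" + original[1:]
    print("in-place edit:", unchanged(before, compress_chunks(edited)))

    # Inserting a byte shifts all later offsets, so every chunk changes.
    inserted = b"X" + original
    print("insertion:", unchanged(before, compress_chunks(inserted)))

    # The concatenated streams still decompress as one multi-stream file.
    assert bz2.decompress(b"".join(before)) == original
```

Calling demo() compares the compressed chunks before and after each kind
of change.  The in-place edit should leave all but the first chunk
byte-identical, which is the case rsync handles well, while the one-byte
insertion moves every chunk boundary and leaves nothing for rsync to
match.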

-- 
Martin 


