compressed archives

jw schultz jw at pegasys.ws
Wed Mar 5 18:34:17 EST 2003


On Wed, Mar 05, 2003 at 06:12:50PM +1100, Christopher Vance wrote:
> Suppose I have a particular version of a largish compressed archive,
> most likely a .tgz or .tbz2, and that a remote machine has a newer,
> and only slightly different, version of the same archive, where most
> of the content hasn't actually changed much.  I might attempt to obtain
> a copy of the newer archive by first copying my local older copy to
> the newer name as a file to update from.
> 
> My understanding is that a small change in a file before compression
> can result in a large difference afterwards.
> 
> If rsync were to do its file stat and content comparisons on the
> uncompressed copy of both archives, might this not result in less
> network traffic (sending only the small changes) than just looking at
> the compressed copies?  (Yes, I realize that there are the additional
> (non-network) expenses of decompressing at both ends, and probably
> recompressing at the destination.)
> 
> My particular application is OS installation tarballs, but a number of
> bloated or huge software products out there have sourceballs where
> there might also be real savings.
> 
> Have I chosen the wrong tree to bark up?

Brad? has already mentioned the rsyncable patch to gzip,
which resets the compressor in a content-sensitive way so
that rsync can find matching blocks after a change in the
plaintext.
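
As a rough illustration of the content-sensitive reset idea,
here is a minimal Python sketch.  It is not the gzip patch
itself: instead of resetting one compressor, it cuts the
plaintext wherever a rolling hash of the most recent bytes
hits a chosen pattern, and compresses each piece as an
independent gzip member.  The hash, the GEAR table, CUT_BITS
and the function names are all assumptions made for the
example.

```python
import gzip
import random

# Hypothetical parameters for this sketch (not taken from the gzip patch).
_table_rng = random.Random(12345)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]
CUT_BITS = 10  # aim for chunks of roughly 2**10 plaintext bytes


def split_on_content(data):
    """Cut after any byte where the rolling hash has its top CUT_BITS bits zero.

    The hash only depends on the last 32 bytes, so cut points are decided by
    nearby content, not by absolute file offset.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h >> (32 - CUT_BITS) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def compress_rsyncable(data):
    """Compress each content-defined chunk as its own gzip member."""
    return [gzip.compress(chunk, mtime=0) for chunk in split_on_content(data)]


# Synthetic plaintext, then the same plaintext with a small insertion.
rng = random.Random(0)
WORDS = [b"kernel", b"install", b"package", b"archive", b"update",
         b"system", b"config", b"binary", b"source", b"library"]
old = b" ".join(rng.choice(WORDS) for _ in range(40000))
new = old[:100] + b"CHANGED" + old[100:]

old_members = compress_rsyncable(old)
new_members = compress_rsyncable(new)

# Measure how many compressed bytes of the new file already exist in the old.
known = set(old_members)
reused = sum(len(m) for m in new_members if m in known)
total = sum(len(m) for m in new_members)
print("compressed bytes reusable from the old file:", reused, "of", total)
```

Because the cut points depend only on nearby plaintext, an
insertion near the start disturbs only the first chunk or
so; the later members come out byte-for-byte identical,
which is what lets rsync match them.  The concatenated
members still form a valid gzip stream.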

From what I can tell, bzip2 compresses in blocks, so rsync
should work OK with that.

Even better, in my opinion, would be to compress the files
individually and then tar them (no tar compression).  You
will get a slightly lower compression ratio but rsync will
have more matching blocks.
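
A small Python sketch of the compress-then-tar layout, for
anyone who wants to try it.  The file names, the synthetic
contents, the 512-byte BLOCK size and the helper names are
all made up for the example, and reusable_fraction() is only
a crude stand-in for rsync's block matching, not rsync's
actual algorithm.

```python
import gzip
import io
import random
import tarfile

BLOCK = 512  # comparison block size (an assumption; rsync picks its own)


def make_tar(files, gzip_members):
    """Build an uncompressed tar; optionally gzip each member first."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            if gzip_members:
                name, data = name + ".gz", gzip.compress(data, mtime=0)
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def reusable_fraction(old, new):
    """Fraction of fixed-size blocks of `old` found at any offset in `new`."""
    blocks = [old[i:i + BLOCK] for i in range(0, len(old), BLOCK)]
    return sum(b in new for b in blocks) / len(blocks)


# Eight synthetic "files"; only the first one changes between versions.
rng = random.Random(0)
WORDS = [b"kernel", b"install", b"package", b"archive", b"update",
         b"system", b"config", b"binary", b"source", b"library"]


def text(n_words):
    return b" ".join(rng.choice(WORDS) for _ in range(n_words))


old_files = {f"pkg{i}.txt": text(3000) for i in range(8)}
new_files = dict(old_files)
new_files["pkg0.txt"] = b"one changed line\n" + old_files["pkg0.txt"]

# Layout A: gzip each file, then tar with no compression.
old_a, new_a = make_tar(old_files, True), make_tar(new_files, True)
# Layout B: ordinary .tgz (tar first, then gzip the whole thing).
old_b = gzip.compress(make_tar(old_files, False), mtime=0)
new_b = gzip.compress(make_tar(new_files, False), mtime=0)

print("gzip members, then tar:", reusable_fraction(old_a, new_a))
print("tar, then gzip (.tgz): ", reusable_fraction(old_b, new_b))
```

In the first layout every unchanged file compresses to the
same bytes as before and tar stores those bytes verbatim, so
only the changed member's blocks fail to match.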

If you can afford the disk space you might consider not
compressing at all.  Save the compression for the actual
transfer with the rsync -z option.
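
For scripted transfers, that could look something like the
sketch below.  The source file and the destination host and
path are placeholders, and the helper names are made up;
only the -z flag comes from the advice above.

```python
import subprocess


def rsync_command(src, dest):
    """Build an rsync command line that compresses data in transit (-z)."""
    return ["rsync", "-z", src, dest]


def transfer(src, dest):
    """Run the transfer; raises CalledProcessError if rsync fails."""
    subprocess.run(rsync_command(src, dest), check=True)


# Placeholder names: an uncompressed tarball and a hypothetical mirror host.
cmd = rsync_command("install.tar", "mirror.example.org:/srv/dist/")
print(" ".join(cmd))
```

The archive stays uncompressed on disk at both ends, so
rsync compares plaintext and only the wire traffic is
compressed.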

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt

