[PATCH] Compressed output files

jw schultz jw at pegasys.ws
Wed Jul 3 07:23:02 EST 2002


On Tue, Jul 02, 2002 at 05:51:58PM -0600, Joel Votaw wrote:
> 
> Attached is a patch that implements compressing output files as they're
> written to disk, using zlib.  Thus far I've only used it with
> synchronizing directories on a single machine.

This would certainly be useful once reliable.  It would be handy
for dirvish and other backup tools.

> 
> What seems to work / what's done:
> 
> 	- Synchronizing directories with all files in the target
> 	  directory gzip'd.  Files seem to contain the correct data.  Use
> 	  the option "--gzip-dest".

There should also be a --gzip-src option to allow reciprocal
transfers.  I notice the comments in the patch mention this.

> 	- Only transferring files whose checksums are different.
> 	  Destination files are gunzip'd before their checksums are
> 	  calculated.
> 
> 	- Added an option "--ignore-sizes", since there is no easy way for
> 	  the receiver to know the uncompressed size of the files it
> 	  already has.  For now you have to use --checksum to be sure...

This option shouldn't be necessary once you extract the size
from the internal gzip file structure: the last four bytes of
a gzip file hold the uncompressed length modulo 2^32.
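
A minimal sketch of that size lookup, in Python for brevity rather
than rsync's C.  It assumes a single-member gzip file with nothing
appended after the trailer; the function names are hypothetical and
not part of the patch.  Because the field is only 32 bits wide, files
of 4 GiB or more would report a wrapped value.

```python
import gzip
import os
import struct
import tempfile


def gzip_uncompressed_size(path):
    """Return the ISIZE trailer field: uncompressed length modulo 2**32."""
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)  # ISIZE occupies the last 4 bytes
        (isize,) = struct.unpack("<I", f.read(4))  # little-endian uint32
    return isize


def demo():
    """Write a small gzip file and compare ISIZE with the real length."""
    payload = b"rsync test data\n" * 1000
    fd, path = tempfile.mkstemp(suffix=".gz")
    os.close(fd)
    try:
        with gzip.open(path, "wb") as f:
            f.write(payload)
        return gzip_uncompressed_size(path), len(payload)
    finally:
        os.remove(path)
```

Calling demo() returns the trailer value and the true payload length,
which should match for a file this small.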

> 
> 	- Added gzio.c from the latest zlib distribution so we can call
> 	  gzwrite() etc.
> 
> What remains to be done / problems:
> 
> 	- Needs more testing, especially with remote clients / servers.
> 
> 	- Batch files are not compressed.

Huh?  Please explain what a "batch" file is and why it doesn't
get compressed.

> 
> 	- Reading compressed files should be implemented in a more generic
> 	  fashion, perhaps in map_file() and its cousins.  I started
> 	  working on this but saw that changing map_file() et al. could
> 	  have far reaching consequences, so I took the easy way out: I
> 	  just changed the one routine I cared about for now.
> 
> 	- Add documentation of new options to manpages etc.
> 
> 	- Find a way to write down the uncompressed file sizes on the
> 	  receiving side.  Perhaps the least-bad way to do this would be
> 	  append some rsync-specific data, including uncompressed size, to
> 	  the end of the gzip'd files.  The receiver could read this in on
> 	  future runs when it needed to.  Gunzip'ing the file from the
> 	  command-line would work but would give an "ignoring trailing
> 	  garbage" kind of error.
> 
> What I've done so far isn't pretty, but I thought I'd send it in in case
> someone else finds it useful.
> 
> 	-Joel

It seems (to me) a reasonable start so far.
The comments show some foresight re bidirectional plans and
support for other compression libs and levels.

I don't know if I'd support multiple compression libs, but if
you do, might I suggest calling the option --zip-dest
and having it take an argument to specify the compression
library?  I.e. --zip-dest (bzip2|gzip)[=(1-9)]
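
To make the proposed syntax concrete, here is a rough sketch of how
such an argument could be parsed, in Python for brevity.  The name
parse_zip_dest and the fallback level of 6 are assumptions for
illustration only; nothing like this exists in the patch.

```python
import re

# Hypothetical grammar for the argument: (bzip2|gzip)[=(1-9)]
_ZIP_DEST = re.compile(r"(bzip2|gzip)(?:=([1-9]))?")
DEFAULT_LEVEL = 6  # assumed default, not taken from the patch


def parse_zip_dest(arg):
    """Split an argument such as 'gzip=9' into (library, level)."""
    m = _ZIP_DEST.fullmatch(arg)
    if m is None:
        raise ValueError("bad --zip-dest argument: %r" % (arg,))
    lib, level = m.group(1), m.group(2)
    return (lib, int(level) if level else DEFAULT_LEVEL)
```

With this grammar "gzip" and "bzip2=9" are accepted, while "gzip=0"
or an unknown library name is rejected.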

In any case I would make --gzip-dest take an optional
argument for specifying the compression level right away.
Also downgrade the default level to 6, as the speed penalty
for level 9 is seldom worth the marginal compression increase.
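
Anyone who wants to check the trade-off on their own data could use
something like the sketch below.  It uses Python's zlib module rather
than the patch's C code, and compare_levels is a hypothetical helper;
the numbers will depend heavily on the input and the machine.

```python
import time
import zlib


def compare_levels(data, levels=(6, 9)):
    """Map each level to (compressed size in bytes, seconds taken)."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Confirm the compressed stream round-trips before reporting it.
        assert zlib.decompress(packed) == data
        results[level] = (len(packed), elapsed)
    return results
```

Feeding it the contents of a few representative files gives a quick
feel for how much extra time level 9 costs for the bytes it saves.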

One extra issue to consider is that accidentally leaving off
the --gzip-* option would really mess things up (imagine
restoring /usr).  Long term, a sanity check might be in order,
with a way to override it.
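
One possible shape for such a check, sketched in Python: look for the
two gzip magic bytes (0x1f 0x8b) at the start of an existing file and
refuse to continue when that disagrees with the options given.  This
is only a heuristic, since an ordinary transfer may legitimately
contain .gz files, which is why an override is needed.  The names
looks_gzipped and safe_to_proceed and the force flag are hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def safe_to_proceed(path, gzip_option_given, force=False):
    """Heuristic check: does the file on disk match the chosen option?

    Returns False on a mismatch unless the caller overrides with force.
    """
    mismatch = looks_gzipped(path) != gzip_option_given
    return force or not mismatch
```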


-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt



