[PATCH] Compressed output files
jw schultz
jw at pegasys.ws
Wed Jul 3 07:23:02 EST 2002
On Tue, Jul 02, 2002 at 05:51:58PM -0600, Joel Votaw wrote:
>
> Attached is a patch that implements compressing output files as they're
> written to disk, using zlib. Thus far I've only used it with
> synchronizing directories on a single machine.
This would certainly be useful once reliable. It would be handy
for dirvish and other backup tools.
>
> What seems to work / what's done:
>
> - Synchronizing directories with all files in the target
> directory gzip'd. Files seem to contain the correct data. Use
> the option "--gzip-dest".
There should also be a --gzip-src option to allow reciprocal
transfers. I notice the comments in the patch mention this.
> - Only transferring files whose checksums are different.
> Destination files are gunzip'd before their checksums are
> calculated.
>
> - Added an option "--ignore-sizes", since there is no easy way for
> the receiver to know the uncompressed size of the files it
> already has. For now you have to use --checksum to be sure...
This option shouldn't be necessary once you extract the size
from the internal gzip file structure.
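For reference, the gzip format (RFC 1952) ends each member with a
4-byte little-endian ISIZE field holding the uncompressed length
modulo 2^32. A minimal sketch of reading it, in Python rather than
rsync's C purely for brevity; the function name is hypothetical and
not part of the patch:

```python
import gzip
import os
import struct
import tempfile


def gzip_uncompressed_size(path):
    """Return the ISIZE trailer field: uncompressed length modulo 2**32.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)  # ISIZE occupies the last 4 bytes of the file
        (isize,) = struct.unpack("<I", f.read(4))  # little-endian unsigned 32-bit
    return isize


if __name__ == "__main__":
    # Write a small gzip file and read its recorded size back.
    fd, name = tempfile.mkstemp(suffix=".gz")
    os.close(fd)
    with gzip.open(name, "wb") as out:
        out.write(b"a" * 1000)
    print(gzip_uncompressed_size(name))
    os.remove(name)
```

Two caveats: the value wraps for files of 4 GiB or more, and for a
file made of several concatenated gzip members it only describes the
last member, so it is a hint rather than a guarantee.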
>
> - Added gzio.c from the latest zlib distribution so we can call
> gzwrite() etc.
>
> What remains to be done / problems:
>
> - Needs more testing, especially with remote clients / servers.
>
> - Batch files are not compressed.
Huh? Please explain what a "batch" file is and why it doesn't
get compressed.
>
> - Reading compressed files should be implemented in a more generic
> fashion, perhaps in map_file() and its cousins. I started
> working on this but saw that changing map_file() et al. could
> have far reaching consequences, so I took the easy way out: I
> just changed the one routine I cared about for now.
>
> - Add documentation of new options to manpages etc.
>
> - Find a way to write down the uncompressed file sizes on the
> receiving side. Perhaps the least-bad way to do this would be
> append some rsync-specific data, including uncompressed size, to
> the end of the gzip'd files. The receiver could read this in on
> future runs when it needed to. Gunzip'ing the file from the
> command-line would work but would give an "ignoring trailing
> garbage" kind of error.
>
> What I've done so far isn't pretty, but I thought I'd send it in in case
> someone else finds it useful.
>
> -Joel
It seems (to me) a reasonable start so far.
The comments show some foresight re bidirectional plans and
support for other compression libs and levels.
I don't know if I'd support multiple compression libs, but if
you do, might I suggest calling the option --zip-dest
and having it take an argument to specify the compression
library? i.e. --zip-dest (bzip2|gzip)[=(1-9)]
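To make the suggested syntax concrete, here is a rough sketch of how
such an argument could be parsed. It is written in Python for brevity,
not rsync's C; the function name and the default of 6 are assumptions
for illustration, and no such option exists in rsync today:

```python
import re

# Matches "gzip", "bzip2", "gzip=6", "bzip2=9", and so on.
_SPEC = re.compile(r"(bzip2|gzip)(?:=([1-9]))?")


def parse_zip_dest(spec, default_level=6):
    """Split a hypothetical --zip-dest argument into (library, level)."""
    m = _SPEC.fullmatch(spec)
    if m is None:
        raise ValueError("bad --zip-dest argument: %r" % (spec,))
    library, level = m.group(1), m.group(2)
    return library, int(level) if level else default_level


if __name__ == "__main__":
    print(parse_zip_dest("gzip=9"))
    print(parse_zip_dest("bzip2"))
```

Rejecting anything outside the two library names and levels 1-9 up
front keeps a typo from silently falling back to some default.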
In any case I would make --gzip-dest take an optional
argument for specifying the compression level right away.
Also downgrade the default level to 6 as the speed penalty
for level 9 is seldom worth the marginal compression increase.
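Anyone wanting to check the trade-off on their own data could time
the two levels directly. A quick sketch using Python's zlib binding
(the helper name is made up, and the numbers will vary with the input
and the machine, so none are quoted here):

```python
import time
import zlib


def compare_levels(data, levels=(6, 9)):
    """Return {level: (compressed_size, seconds)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Round-trip to confirm the compressed data is intact.
        assert zlib.decompress(packed) == data
        results[level] = (len(packed), elapsed)
    return results


if __name__ == "__main__":
    # Sample input only; substitute a file typical of your backups.
    sample = b"The quick brown fox jumps over the lazy dog.\n" * 20000
    for level, (size, secs) in compare_levels(sample).items():
        print("level %d: %d bytes in %.4f s" % (level, size, secs))
```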
One extra issue to consider is that accidentally leaving off
the --gzip-* option would really mess things up (imagine
restoring /usr). Long term a sanity check might be in order
with a way to override.
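One cheap form such a sanity check could take is looking for the
gzip magic bytes (0x1f 0x8b) at the start of the file. A sketch under
that assumption, again in Python for brevity; both function names and
the force flag are hypothetical:

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def check_restore_source(path, gzip_expected, force=False):
    """Refuse to proceed when the file's format contradicts the options.

    Hypothetical check: 'gzip_expected' stands in for whether a
    --gzip-* option was given, 'force' for the override.
    """
    if force:
        return
    if looks_gzipped(path) != gzip_expected:
        raise RuntimeError(
            "%s: gzip state does not match the options given; "
            "use the override to proceed anyway" % path
        )
```

This is only a heuristic: a file that was already a .gz before the
backup would trip it, which is exactly why the override is needed.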
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt