[PATCH] Compressed output files
Ph. Marek
marek at bmlv.gv.at
Tue Jul 2 23:53:02 EST 2002
On Wednesday 03 July 2002 01:51, Joel Votaw wrote:
> Attached is a patch that implements compressing output files as they're
> written to disk, using zlib. Thus far I've only used it with
> synchronizing directories on a single machine.
...
> - Added an option "--ignore-sizes", since there is no easy way for
> the receiver to know the uncompressed size of the files it
> already has. For now you have to use --checksum to be sure...
>
> - Find a way to write down the uncompressed file sizes on the
> receiving side. Perhaps the least-bad way to do this would be
> append some rsync-specific data, including uncompressed size, to
> the end of the gzip'd files. The receiver could read this in on
> future runs when it needed to. Gunzip'ing the file from the
> command-line would work but would give an "ignoring trailing
> garbage" kind of error.
The GZIP Standard http://www.faqs.org/rfcs/rfc1952.html defines
the field ISIZE:
    This contains the size of the original (uncompressed) input
    data modulo 2^32.
I'd expect that zlib sets this field and has a way to read it?
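As a rough sketch of how the receiver could get at ISIZE without decompressing anything: per RFC 1952 the field is the last four bytes of a gzip member, stored little-endian. The helper name gzip_isize below is made up for illustration, and the sketch assumes a single-member file with no trailing data appended. Since the value is modulo 2^32, it is only exact for files smaller than 4 GiB.

```python
import gzip
import os
import struct
import tempfile


def gzip_isize(path):
    """Return the ISIZE trailer field (uncompressed size modulo 2**32).

    Assumes a single-member gzip file with nothing appended after it.
    """
    # 10-byte header + 8-byte trailer is the smallest possible member.
    if os.path.getsize(path) < 18:
        raise ValueError("file too short to be a gzip member")
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)          # ISIZE is the final 4 bytes
        (isize,) = struct.unpack("<I", f.read(4))
    return isize


# Usage: write a small gzip file and read its size back without inflating it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.gz")
    with gzip.open(demo_path, "wb") as f:
        f.write(b"x" * 100000)
    demo_size = gzip_isize(demo_path)
```

This only replaces the size comparison; it says nothing about whether the contents match.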
> What I've done so far isn't pretty, but I thought I'd send it in in case
> someone else finds it useful.
It's amazing. I'll have a use for it once the problem with the sizes is solved
- completely unzipping and checksumming the file doesn't make sense for local
file systems.
BTW: we might even save the MD4 checksum of the original file
in a gzip field. See the RFC:
    XLEN (eXtra LENgth)
    If FLG.FEXTRA is set, this gives the length of the optional
    extra field. See below for details.
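To make the extra-field idea concrete, here is a minimal sketch that builds a gzip member by hand with FLG.FEXTRA set and one subfield carrying a whole-file checksum, then reads it back from the header alone. Everything named here is an assumption for illustration: the subfield ID "RS" is not a registered ID, the function names are invented, and MD5 stands in for MD4 because many current Python builds no longer ship MD4. The subfield layout (SI1, SI2, a 2-byte little-endian LEN, then the data) follows RFC 1952.

```python
import gzip
import hashlib
import struct
import zlib

FEXTRA = 0x04  # FLG bit 2 in RFC 1952


def gzip_with_extra(data, payload, si=b"RS"):
    """Return a gzip member holding `data`, with `payload` in an extra subfield."""
    subfield = si + struct.pack("<H", len(payload)) + payload
    # ID1, ID2, CM=8 (deflate), FLG, MTIME=0, XFL=0, OS=255 (unknown)
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FEXTRA, 0, 0, 255)
    header += struct.pack("<H", len(subfield)) + subfield   # XLEN + extra field
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)           # raw deflate stream
    body = comp.compress(data) + comp.flush()
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF,
                          len(data) & 0xFFFFFFFF)            # CRC32, ISIZE
    return header + body + trailer


def read_extra(blob, si=b"RS"):
    """Return the subfield payload for `si`, or None if it is not present."""
    id1, id2, cm, flg = struct.unpack_from("<BBBB", blob, 0)
    if (id1, id2, cm) != (0x1F, 0x8B, 8) or not flg & FEXTRA:
        return None
    (xlen,) = struct.unpack_from("<H", blob, 10)
    pos, end = 12, 12 + xlen
    while pos + 4 <= end:
        sub_id = blob[pos:pos + 2]
        (length,) = struct.unpack_from("<H", blob, pos + 2)
        if sub_id == si:
            return blob[pos + 4:pos + 4 + length]
        pos += 4 + length
    return None


# Usage: store a digest in the header and check that gunzip still works.
data = b"example payload " * 64
digest = hashlib.md5(data).digest()   # stand-in for the MD4 checksum
blob = gzip_with_extra(data, digest)
restored = gzip.decompress(blob)
stored_digest = read_extra(blob)
```

Unlike data appended after the trailer, a standard decompressor should skip the extra field silently, so there is no "trailing garbage" warning.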
Maybe even the checksums of the individual blocks - but that depends on
the blocksize (which can vary with every invocation), and needs some space.
If there were a way to decompress only from the middle of the file it
could make sense, as we wouldn't have to unzip the complete file just to send
the checksums over the wire ...
Maybe we should save the blocksize that was used when generating, together
with the resulting block checksums. Then the only time we would unzip is to
actually send portions of the file.
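A sketch of what such a stored record could look like, under loudly stated assumptions: the record format is invented here, Adler-32 stands in for rsync's rolling checksum, and MD5 stands in for MD4. The record starts with the blocksize so that a later run using a different blocksize can detect that the saved checksums are useless and fall back to decompressing. Note that XLEN is a 16-bit field, so a gzip extra field can hold at most 65535 bytes; at 20 bytes per block that is only a few thousand blocks, and larger files would need the record stored elsewhere.

```python
import hashlib
import struct
import zlib

ENTRY_SIZE = 20  # 4-byte weak checksum + 16-byte strong checksum


def pack_block_sums(data, block_size):
    """Serialize the blocksize followed by one (weak, strong) pair per block."""
    out = [struct.pack("<I", block_size)]
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        weak = zlib.adler32(block) & 0xFFFFFFFF      # stand-in rolling checksum
        strong = hashlib.md5(block).digest()         # stand-in for MD4
        out.append(struct.pack("<I", weak) + strong)
    return b"".join(out)


def unpack_block_sums(record, wanted_block_size):
    """Return the list of (weak, strong) pairs, or None if the blocksize differs."""
    (block_size,) = struct.unpack_from("<I", record, 0)
    if block_size != wanted_block_size:
        return None  # saved sums are stale: decompress and recompute instead
    sums = []
    for pos in range(4, len(record), ENTRY_SIZE):
        (weak,) = struct.unpack_from("<I", record, pos)
        sums.append((weak, record[pos + 4:pos + ENTRY_SIZE]))
    return sums


# Usage: 2560 bytes in 700-byte blocks gives three full blocks and one short one.
original = bytes(range(256)) * 10
record = pack_block_sums(original, 700)
same_size = unpack_block_sums(record, 700)
other_size = unpack_block_sums(record, 512)
```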
At least the MD4 checksum would be ok, I guess.
Regards,
Phil