[PATCH] Compressed output files

Ph. Marek marek at bmlv.gv.at
Tue Jul 2 23:53:02 EST 2002


On Wednesday 03 July 2002 01:51, Joel Votaw wrote:
> Attached is a patch that implements compressing output files as they're
> written to disk, using zlib.  Thus far I've only used it with
> synchronizing directories on a single machine.
...
> 	- Added an option "--ignore-sizes", since there is no easy way for
> 	  the receiver to know the uncompressed size of the files it
> 	  already has.  For now you have to use --checksum to be sure...
>
> 	- Find a way to write down the uncompressed file sizes on the
> 	  receiving side.  Perhaps the least-bad way to do this would be
> 	  append some rsync-specific data, including uncompressed size, to
> 	  the end of the gzip'd files.  The receiver could read this in on
> 	  future runs when it needed to.  Gunzip'ing the file from the
> 	  command-line would work but would give a "ignoring trailing
> 	  garbage"  kind of error.
The gzip standard (RFC 1952, http://www.faqs.org/rfcs/rfc1952.html) defines
the ISIZE field:
	This contains the size of the original (uncompressed) input
	data modulo 2^32.
I'd expect zlib to set that field; does it also offer a way to read it back?
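
A minimal sketch of reading ISIZE directly, assuming the file holds a single
gzip member: per RFC 1952 the last four bytes of a member are ISIZE, stored
little-endian. The function name gzip_isize is made up for illustration, and
this is Python rather than the C of the patch. The value is only the size
modulo 2^32, so it cannot be trusted for files of 4 GiB or more.

```python
import os
import struct


def gzip_isize(path):
    """Return the ISIZE trailer field of a gzip file (size modulo 2**32).

    Assumes a single-member gzip file with nothing appended after the
    trailer; for multi-member files this is only the last member's size.
    """
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)          # ISIZE is the final 4 bytes
        (isize,) = struct.unpack("<I", f.read(4))
    return isize
```

Note that appending rsync-specific data after the gzip stream, as proposed in
the quoted text, would break this lookup because ISIZE would no longer be at
the end of the file.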

> What I've done so far isn't pretty, but I thought I'd send it in in case
> someone else finds it useful.
It's amazing. I'll have a use for that once the problem with the sizes is
solved - completely unzipping and checksumming the file doesn't make sense for
local file systems.

BTW: we might even save the MD4 checksum of the original file
in a gzip field. See the RFC:
	XLEN (eXtra LENgth)
		If FLG.FEXTRA is set, this gives the length of the optional
		extra field.  See below for details.
Maybe even the checksums of the individual blocks - but those depend on
the block size (which can vary with every invocation) and need some space.
If there were a way to start decompressing from the middle of the file, it
could make sense, as we wouldn't have to unzip the complete file just to send
the checksums over the wire ...
Maybe we should save the block size that was used when generating the file,
along with the resulting block checksums. Then the only time we would unzip
is to actually send portions of the file.
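
One possible way to decompress from the middle of a file, sketched here as an
assumption and not as anything the patch does: ask zlib for a Z_FULL_FLUSH
after each block. That byte-aligns the output and resets the compressor's
history, so a fresh inflater can start at any recorded offset. The names
compress_blocks and read_block are hypothetical, and the sketch uses a raw
deflate stream without the gzip header and trailer.

```python
import zlib


def compress_blocks(data, block_size):
    """Compress data as raw deflate with a restart point before each block.

    Returns (stream, offsets); offsets[i] is where block i starts in stream.
    """
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15: raw deflate
    out = bytearray()
    offsets = []
    for start in range(0, len(data), block_size):
        offsets.append(len(out))
        out += comp.compress(data[start:start + block_size])
        # Full flush: emit everything pending and forget the history window.
        out += comp.flush(zlib.Z_FULL_FLUSH)
    out += comp.flush()  # finish the stream
    return bytes(out), offsets


def read_block(stream, offsets, index):
    """Decompress only block number index, without touching earlier blocks."""
    end = offsets[index + 1] if index + 1 < len(offsets) else len(stream)
    inflater = zlib.decompressobj(-15)
    return inflater.decompress(stream[offsets[index]:end])
```

The trade-off is a somewhat worse compression ratio, since no block can refer
back to data in an earlier block, and the offset table has to be stored
somewhere alongside the saved block size.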

At least the MD4 checksum would be ok, I guess.
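
A rough sketch of what storing a checksum in the FEXTRA field could look
like, following the header layout in RFC 1952. Everything named here is an
assumption for illustration: the subfield ID "RS" is invented (not a
registered ID), and gzip_with_extra and read_extra are hypothetical helpers.
The header is built by hand because Python's gzip module offers no way to
write an extra field.

```python
import struct
import zlib

SUBFIELD_ID = b"RS"  # hypothetical subfield ID, not registered anywhere


def gzip_with_extra(data, payload):
    """Build a gzip member whose FEXTRA field carries payload."""
    subfield = SUBFIELD_ID + struct.pack("<H", len(payload)) + payload
    # ID1, ID2, CM=8 (deflate), FLG=0x04 (FEXTRA), MTIME=0, XFL=0, OS=255
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0x04, 0, 0, 255)
    header += struct.pack("<H", len(subfield)) + subfield  # XLEN + field
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # raw deflate body
    body = comp.compress(data) + comp.flush()
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF,
                          len(data) & 0xFFFFFFFF)   # CRC32, then ISIZE
    return header + body + trailer


def read_extra(blob):
    """Return the payload of our subfield, or None if it is absent."""
    if blob[:2] != b"\x1f\x8b" or not blob[3] & 0x04:
        return None
    (xlen,) = struct.unpack_from("<H", blob, 10)
    pos, end = 12, 12 + xlen
    while pos + 4 <= end:
        (length,) = struct.unpack_from("<H", blob, pos + 2)
        if blob[pos:pos + 2] == SUBFIELD_ID:
            return blob[pos + 4:pos + 4 + length]
        pos += 4 + length
    return None
```

The digest is passed in as plain bytes because MD4 may not be available from
hashlib on current systems. Since XLEN is a 16-bit value, the whole extra
field is limited to 65535 bytes: plenty for one 16-byte MD4 sum, but tight
for per-block checksums of a large file.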


Regards,

Phil





