[clug] Anyone using 'snappy', Google's fast compression?

Michael Cohen scudette at gmail.com
Thu May 12 01:41:22 MDT 2011


This is basically the same as compressing the input anyway. Most LZW
like algorithms do this internally: they compare a block against the
dictionary, and if it's not there they emit a huffman code meaning
"literal" and then append the data verbatim. Your overall size is
still enlarged by that code. If you take a file already compressed by
gzip and recompress it, its size grows by roughly a constant: the
file header plus essentially this literal escape code.
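A quick way to see this, as a rough sketch using Python's zlib (DEFLATE, the
same family of algorithm): compressing data that is already compressed, or
simply incompressible, only adds a small amount of framing overhead, because
the second pass falls back to literal/stored blocks.

import os
import zlib

raw = os.urandom(1 << 20)          # 1 MiB of incompressible (random) data
once = zlib.compress(raw, 9)       # first pass: essentially no gain
twice = zlib.compress(once, 9)     # second pass: grows by a small overhead

print(len(raw), len(once), len(twice))
print("overhead of recompressing:", len(twice) - len(once), "bytes")

The overhead is the container header plus a few bytes of stored-block framing
per 64 KiB, i.e. the "literal escape" cost described above.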

The problem with trying to optimize outside the algorithm is that you
have already gone to the trouble of compressing the block, so you have
paid the CPU price. You then make the file format more complex by
adding extra escape codes for that special case, and you end up with
more code paths. I am not sure this buys very much at all. If this
kind of optimization were productive it would have been built into the
algorithm itself (which, as mentioned above, it already is).
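For concreteness, a hypothetical sketch of what that outer "compress, compare,
keep the smaller" wrapper looks like: a one-byte flag per block marking the
payload as raw or compressed. The flag is exactly the extra escape code and
extra decode path being discussed; the names and framing here are made up for
illustration and do not correspond to any real on-tape or file format.

import zlib

FLAG_RAW, FLAG_ZLIB = b"\x00", b"\x01"

def pack_block(block: bytes) -> bytes:
    compressed = zlib.compress(block, 9)
    if len(compressed) < len(block):
        return FLAG_ZLIB + compressed   # compression helped: keep it
    return FLAG_RAW + block             # it didn't: store the original

def unpack_block(packed: bytes) -> bytes:
    flag, payload = packed[:1], packed[1:]
    return zlib.decompress(payload) if flag == FLAG_ZLIB else payload

# Already-compressed input now costs only the one flag byte per block.
data = zlib.compress(b"some example payload" * 1000)
assert unpack_block(pack_block(data)) == data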

Michael.


On 12 May 2011 08:26, Mike Carden <mike.carden at gmail.com> wrote:
>> led me to this GOOG project:
>> <http://code.google.com/p/snappy/>
>
> Well I hadn't heard of it and it looks interesting.
>
> Slightly tangentially, I was reading today about the stream
> compression employed by LTO 5 tape libraries. It grabs a data block,
> caches it, then has a stab at compressing it. Then it compares the
> compressed block to the original and writes out the smaller of the two
> to tape - avoiding the trap of making the data bigger if it was
> already compressed or is incompressible.
>
> This is probably old hat to anyone who has worked with the guts of
> compression implementations before, but I was struck by its simplicity
> and usefulness.
>
> --
> MC
>

