[cifs-protocol] [MS-XCA] is LZ77 + Huffman the same as the Win32 compression API?
Douglas Bagnall
douglas.bagnall at catalyst.net.nz
Wed Oct 19 10:32:24 UTC 2022
Specifically, is it the same as the output of the functions described at
https://learn.microsoft.com/en-us/windows/win32/api/_cmpapi/
when they are used with the COMPRESS_ALGORITHM_XPRESS_HUFF | COMPRESS_RAW flags?
As far as I can tell, it may be for long strings, but for short ones,
including the two examples in MS-XCA 3.2, the Compression API does not compress
or decompress them at all.
That is, if I ask it to compress 100 repetitions of "abc" (300 bytes in total)
using COMPRESS_ALGORITHM_XPRESS_HUFF | COMPRESS_RAW, the "compressed" output is
exactly the same 300 bytes as the input.
If I use COMPRESS_ALGORITHM_XPRESS_HUFF without COMPRESS_RAW, I get the same 300
bytes with a 28-byte header (starting 0a 51 e5 c0 18 00, which does not seem to
be a well-known magic number).
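A check like the following can recognise this "stored unchanged" result in both
modes. It is only a sketch: the helper name `stored_unchanged` is hypothetical,
and it assumes the 28-byte header sits in front of the data and begins with the
six bytes observed above. Nothing else about the header's layout is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: header size and prefix are only what these tests observed,
   not a documented format. */
#define OBSERVED_HEADER_LEN 28

static const uint8_t observed_prefix[6] = {0x0a, 0x51, 0xe5, 0xc0, 0x18, 0x00};

/* Hypothetical helper: returns 1 if `out` is just `in` stored unchanged,
   either bare (as seen with COMPRESS_RAW) or behind the observed 28-byte
   header (as seen without COMPRESS_RAW); returns 0 otherwise. */
int stored_unchanged(const uint8_t *in, size_t in_len,
                     const uint8_t *out, size_t out_len)
{
    if (out_len == in_len)
        return memcmp(out, in, in_len) == 0;
    if (out_len == in_len + OBSERVED_HEADER_LEN &&
        memcmp(out, observed_prefix, sizeof observed_prefix) == 0)
        return memcmp(out + OBSERVED_HEADER_LEN, in, in_len) == 0;
    return 0;
}
```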
The same thing happens for every shorter string I try (except the empty string,
which fails): the data is always returned unchanged.
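The point where this behaviour changes can be found by growing the input one
repetition of "abc" at a time. In this sketch the callback type
`compress_size_fn` and both functions are hypothetical; a real callback would
wrap the Win32 `Compress()` call and return the output size. The stand-in
`observed_model` is not a compressor: it only mimics the sizes reported in this
message (under 303 bytes comes back unchanged, longer inputs come out as 263
bytes) so the probe can be exercised without Windows.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: returns the compressed size, or 0 on failure. */
typedef size_t (*compress_size_fn)(const unsigned char *in, size_t len);

/* Returns the smallest n in [1, max_reps] for which "abc" repeated n times
   comes back shorter than the input, or 0 if there is none. */
size_t first_compressing_reps(compress_size_fn compress_size, size_t max_reps)
{
    unsigned char *buf = malloc(3 * max_reps);
    size_t found = 0;

    if (buf == NULL)
        return 0;
    for (size_t n = 1; n <= max_reps && !found; n++) {
        memcpy(buf + 3 * (n - 1), "abc", 3);   /* extend by one repetition */
        size_t out_len = compress_size(buf, 3 * n);
        if (out_len != 0 && out_len < 3 * n)
            found = n;
    }
    free(buf);
    return found;
}

/* Stand-in mimicking the sizes reported in this message; not a compressor. */
static size_t observed_model(const unsigned char *in, size_t len)
{
    (void)in;
    return len < 303 ? len : 263;
}
```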
However, 101 or more repetitions of "abc" (303 or more bytes) compress to 263
bytes that look very similar to the example in MS-XCA, as you would expect.
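Most of those 263 bytes are fixed overhead. As I read MS-XCA, an LZ77+Huffman
stream begins with the code lengths of all 512 symbols (256 literals and 256
match symbols), packed as two 4-bit lengths per byte: symbol 2i in the low
nibble of byte i, symbol 2i+1 in the high nibble. The table is therefore 256
bytes however few symbols the data actually uses. A sketch of that packing,
with hypothetical function names:

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_SYMBOLS 512               /* 256 literals + 256 match symbols */
#define TABLE_BYTES (NUM_SYMBOLS / 2) /* two 4-bit lengths per byte = 256 */

/* Packs 512 code lengths (each 0..15) into the 256-byte table. */
void pack_lengths(const uint8_t lengths[NUM_SYMBOLS], uint8_t table[TABLE_BYTES])
{
    for (size_t i = 0; i < TABLE_BYTES; i++)
        table[i] = (uint8_t)((lengths[2 * i] & 0x0f) |
                             ((lengths[2 * i + 1] & 0x0f) << 4));
}

/* Inverse of pack_lengths: recovers the 512 code lengths. */
void unpack_lengths(const uint8_t table[TABLE_BYTES], uint8_t lengths[NUM_SYMBOLS])
{
    for (size_t i = 0; i < TABLE_BYTES; i++) {
        lengths[2 * i] = (uint8_t)(table[i] & 0x0f);
        lengths[2 * i + 1] = (uint8_t)(table[i] >> 4);
    }
}
```

Even a stream that uses only four symbols pays the full 256 bytes for the
table, leaving 263 - 256 = 7 bytes for the encoded data in the case above.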
Do protocols that use MS-XCA LZ77/Huffman follow this same logic?
My tests were done with a small C program compiled using Cygwin on Windows
Server 2012 R2. I can provide the source if that helps. In another question I
wrote:
> Multi-block examples would of course be helpful.
Multi-block examples are what I was hoping to produce this way; instead I found
I could not reproduce the short examples.
I will note that this behaviour is not entirely ridiculous. With the fixed
overhead of the 256-byte Huffman table, few strings under 300 bytes can actually
be compressed. And since the target size is required for LZ77+Huffman
decompression, there *should* be no confusion about how to decompress. For
example, suppose you go through these steps:
1. Compress 65536 zeros into a 263-byte file.
2. Compress these 263 bytes into an identical 263-byte file.
3. Decompress the two files, disambiguating by destination size.
In step 3, if you tell the decompressor you need 65536 bytes, it will give you
the correct answer, of that many zeros. On the other hand, if you tell it you
need 263 bytes, it will give you the correct answer, of those same 263 bytes.
That is not well explained in
https://learn.microsoft.com/en-us/windows/win32/api/compressapi/nf-compressapi-decompress
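The rule implied by the three steps above can be written down as a small model.
This is my reading, not documented behaviour of `Decompress()`: a payload
exactly as long as the requested output is taken to be stored data and copied
through, and anything else needs real LZ77+Huffman decoding. The names
`classify_payload` and `copy_if_stored` are hypothetical, and an empty payload is
marked invalid only because compressing zero bytes failed in these tests.

```c
#include <stddef.h>
#include <string.h>

enum payload_kind { PAYLOAD_STORED, PAYLOAD_COMPRESSED, PAYLOAD_INVALID };

/* Hypothetical model: decides how a payload would be treated, given the
   destination size the caller asks for. */
enum payload_kind classify_payload(size_t payload_len, size_t target_len)
{
    if (payload_len == 0)
        return PAYLOAD_INVALID;     /* empty input failed in these tests */
    if (payload_len == target_len)
        return PAYLOAD_STORED;      /* copy through unchanged */
    return PAYLOAD_COMPRESSED;      /* hand to an LZ77+Huffman decoder */
}

/* Copies `in` to `out` and returns 1 if the payload is stored data;
   otherwise copies nothing and returns 0. `out` holds target_len bytes. */
int copy_if_stored(const unsigned char *in, size_t in_len,
                   unsigned char *out, size_t target_len)
{
    if (classify_payload(in_len, target_len) != PAYLOAD_STORED)
        return 0;
    memcpy(out, in, target_len);
    return 1;
}
```

Under this model, step 3 asks for 65536 bytes from the first file (compressed,
so it is decoded to zeros) and 263 bytes from the second (stored, so it is
copied as-is).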
cheers,
Douglas