[cifs-protocol] [MS-XCA] is LZ77 + Huffman the same as the Win32 compression API?

Douglas Bagnall douglas.bagnall at catalyst.net.nz
Wed Oct 19 10:32:24 UTC 2022


Specifically, is it the same format as that produced by the functions described 
here:

https://learn.microsoft.com/en-us/windows/win32/api/_cmpapi/

when used with the COMPRESS_ALGORITHM_XPRESS_HUFF | COMPRESS_RAW flags?

As far as I can tell, it might be for long strings, but for short ones, 
including the two examples in MS-XCA 3.2, the Compress API refuses to compress 
or decompress at all.

That is, if I ask it to compress the string consisting of 100 repetitions of 
"abc" (i.e. 300 bytes altogether), I get the exact same string as the compressed 
version, when using COMPRESS_ALGORITHM_XPRESS_HUFF | COMPRESS_RAW.

If I use COMPRESS_ALGORITHM_XPRESS_HUFF without COMPRESS_RAW, I see the same 300 
bytes, with a 28 byte header (starting 0a 51 e5 c0 18 00, which does not seem to 
be a well known magic identifier).

The same thing happens for every shorter string I try (except zero bytes, which 
fails) -- always the data is returned unchanged.

However, sequences of 101 or more repetitions of "abc" (303 or more bytes) 
compress to 263 bytes that look very similar to the example in MS-XCA, as you 
would expect.
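
For anyone wanting to reproduce this, the test can be sketched roughly as below. 
This is a minimal sketch, not the actual program used for the tests. The helper 
names make_repeated, is_stored_unchanged and xpress_huff_raw are made up for 
illustration. The Win32 part is only compiled on Windows (link with the Cabinet 
library) and assumes the CreateCompressor/Compress/CloseCompressor signatures as 
documented at the URL above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#include <compressapi.h>
#endif

/* Hypothetical helper: return a malloc'd buffer holding `count` copies
 * of the string `unit`, and store the total length in *len_out. */
unsigned char *make_repeated(const char *unit, size_t count, size_t *len_out)
{
    size_t unit_len = strlen(unit);
    size_t total = unit_len * count;
    unsigned char *buf = malloc(total ? total : 1);
    assert(buf != NULL);
    for (size_t i = 0; i < count; i++) {
        memcpy(buf + i * unit_len, unit, unit_len);
    }
    *len_out = total;
    return buf;
}

/* Hypothetical helper: true if the "compressed" output is byte-for-byte
 * the same as the input, i.e. nothing was actually compressed. */
int is_stored_unchanged(const unsigned char *in, size_t in_len,
                        const unsigned char *out, size_t out_len)
{
    return in_len == out_len && memcmp(in, out, in_len) == 0;
}

#ifdef _WIN32
/* Compress with raw XPRESS_HUFF (no header). Returns 1 on success and
 * sets *out_len to the number of bytes written to `out`. */
int xpress_huff_raw(const unsigned char *in, size_t in_len,
                    unsigned char *out, size_t out_cap, size_t *out_len)
{
    COMPRESSOR_HANDLE handle = NULL;
    SIZE_T written = 0;
    BOOL ok;

    if (!CreateCompressor(COMPRESS_ALGORITHM_XPRESS_HUFF | COMPRESS_RAW,
                          NULL, &handle)) {
        return 0;
    }
    ok = Compress(handle, in, in_len, out, out_cap, &written);
    CloseCompressor(handle);
    *out_len = written;
    return ok ? 1 : 0;
}
#endif
```

The idea is to call make_repeated("abc", 100, &len), pass the result through 
xpress_huff_raw, and check the output with is_stored_unchanged; then repeat with 
101 repetitions and compare.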

Do protocols that use MS-XCA LZ77/Huffman follow this same logic?


My tests were with a small C program compiled using Cygwin on Windows 2012r2. I 
can provide the source if that helps. In another question I wrote:

> Multi-block examples would of course be helpful.

which is what I was hoping to look at with this; instead I found I could not 
reproduce the short examples.


I will note that this behaviour is not entirely ridiculous. With the fixed 
overhead of the 256 byte Huffman table, few strings under 300 bytes can actually 
be compressed. And as the target size is necessary for LZ77+Huffman 
decompression, there *should* be no confusion as to how to decompress. For 
example, suppose you go through these steps:

  1. Compress 65536 zeros into a 263 byte file.
  2. Compress these 263 bytes into an identical 263 byte file.
  3. Decompress the two files, disambiguating by destination size.

In step 3, if you tell the decompressor you need 65536 bytes, it will give you 
the correct answer, of that many zeros. On the other hand, if you tell it you 
need 263 bytes, it will give you the correct answer, of those same 263 bytes.
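
The logic I am guessing at in step 3 could be sketched like this. It is only an 
assumption based on the behaviour described above, not something taken from 
MS-XCA or the Compress API documentation. The names decompress_by_size and 
lz77_huffman_decoder are hypothetical, and zero_fill_stub is a stand-in that 
does not implement LZ77+Huffman at all; it only exists so the wrapper can be 
exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoder signature: returns 0 on success, non-zero on error. */
typedef int (*lz77_huffman_decoder)(const unsigned char *src, size_t src_len,
                                    unsigned char *dst, size_t dst_len);

/* Stand-in decoder for demonstration only. It ignores its input and
 * writes zeros; a real LZ77+Huffman decoder would go here. */
int zero_fill_stub(const unsigned char *src, size_t src_len,
                   unsigned char *dst, size_t dst_len)
{
    (void)src;
    (void)src_len;
    memset(dst, 0, dst_len);
    return 0;
}

/* Decompress src into dst, using the requested destination size to decide
 * whether the data was stored verbatim or really compressed. */
int decompress_by_size(const unsigned char *src, size_t src_len,
                       unsigned char *dst, size_t dst_len,
                       lz77_huffman_decoder decode)
{
    assert(src != NULL && dst != NULL && decode != NULL);

    if (src_len == dst_len) {
        /* Assumed rule: equal sizes mean the compressor gave up and
         * returned the input unchanged, so just copy it. */
        memcpy(dst, src, dst_len);
        return 0;
    }
    if (src_len > dst_len) {
        /* Assumption: the compressor never expands its input, so this
         * combination cannot be valid. */
        return -1;
    }
    /* Otherwise the data is treated as a real compressed stream. */
    return decode(src, src_len, dst, dst_len);
}
```

With the same 263 byte input, asking for 263 bytes takes the copy branch and 
asking for 65536 bytes takes the decoder branch, which is the disambiguation 
described above.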

This behaviour is not well explained in the documentation for Decompress:

https://learn.microsoft.com/en-us/windows/win32/api/compressapi/nf-compressapi-decompress


cheers,
Douglas


