[cifs-protocol] [MS-XCA] is LZ77 + Huffman the same as the Win32 compression API? - TrackingID#2210190040006868

Wed Oct 19 21:13:55 UTC 2022

Hi Douglas:
The API in question is one implementation of MS-XCA. Whether to compress the data at all, and whether to add a header, are decisions made by the designer of the API.

SMB compression has its own decision logic about whether to compress or not to compress given data. Is your research purely for understanding the compression algorithm described in MS-XCA or are you studying it to implement compression in SMB?

SMB does not use the API that you mentioned for compression. It uses the following API:
https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-rtlcompressbuffer

and SMB only uses COMPRESSION_ENGINE_STANDARD.

Also, when SMB compresses data, it attaches a header to the compressed data. The header is documented in MS-SMB2, section "2.2.42 SMB2 COMPRESSION_TRANSFORM_HEADER".
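
For illustration only, here is a minimal Python sketch of reading that header. It assumes the unchained 16-byte form as I read it in MS-SMB2 2.2.42.1 (little-endian fields: ProtocolId, OriginalCompressedSegmentSize, CompressionAlgorithm, Flags, Offset). The function name and the returned dictionary keys are hypothetical, and the layout should be checked against the specification before relying on it.

```python
import struct

# Assumed layout (MS-SMB2 2.2.42.1, unchained form), all fields little-endian:
#   ProtocolId (4 bytes), OriginalCompressedSegmentSize (4 bytes),
#   CompressionAlgorithm (2 bytes), Flags (2 bytes), Offset (4 bytes)
HEADER = struct.Struct("<4sIHHI")
PROTOCOL_ID = b"\xfcSMB"

# Algorithm identifiers as I understand them from MS-SMB2; treat as an assumption.
ALGORITHMS = {0: "NONE", 1: "LZNT1", 2: "LZ77", 3: "LZ77+Huffman"}


def parse_compression_transform_header(buf):
    """Split a compressed SMB2 message into header fields (hypothetical helper)."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer shorter than the 16-byte header")
    protocol_id, original_size, algorithm, flags, offset = HEADER.unpack_from(buf)
    if protocol_id != PROTOCOL_ID:
        raise ValueError("not an SMB2 compression transform header")
    return {
        "original_size": original_size,  # size of the uncompressed segment
        "algorithm": ALGORITHMS.get(algorithm, hex(algorithm)),
        "flags": flags,
        "offset": offset,                # bytes after the header left uncompressed
        "body": buf[HEADER.size:],
    }
```

The point of the sketch is that the uncompressed size travels in the SMB2 header, not in the MS-XCA stream itself, so the receiver always knows the target size before it calls the decompressor.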

Please let me know if this does not answer your question.

Regards,
Obaid Farooqi
Escalation Engineer | Microsoft

-----Original Message-----
From: Hung-Chun Yu <HungChun.Yu at microsoft.com> 
Sent: Wednesday, October 19, 2022 12:09 PM
To: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>; cifs-protocol at lists.samba.org
Cc: Hung-Chun Yu <HungChun.Yu at microsoft.com>
Subject: [MS-XCA] is LZ77 + Huffman the same as the Win32 compression API? - TrackingID#2210190040006868

[BCC dochelp]

Hi Douglas

Thank you for contacting Microsoft Open Specifications Support. We created SR Case - TrackingID#2210190040006868. Do leave this tag in the subject for future tracking.
One of our engineers will be contacting you shortly.

Hung-Chun Yu
Escalation Engineer
Microsoft Open Specifications

-----Original Message-----
From: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Sent: Wednesday, October 19, 2022 3:32 AM
To: Interoperability Documentation Help <dochelp at microsoft.com>; cifs-protocol at lists.samba.org
Subject: [EXTERNAL] [MS-XCA] is LZ77 + Huffman the same as the Win32 compression API?

Specifically, is it the same as produced by the functions described here:

https://learn.microsoft.com/en-us/windows/win32/api/_cmpapi/

when used with COMPRESS_ALGORITHM_XPRESS_HUFF | COMPRESS_RAW flags.

As far as I can tell, it might be the same for long strings, but for short ones, including the two examples in MS-XCA 3.2, the Compress API refuses to compress or decompress at all.

That is, if I ask it to compress the string consisting of 100 repetitions of "abc" (i.e. 300 bytes altogether), I get the exact same string as the compressed version, when using COMPRESS_ALGORITHM_XPRESS_HUFF | COMPRESS_RAW.

If I use COMPRESS_ALGORITHM_XPRESS_HUFF without COMPRESS_RAW, I see the same 300 bytes, with a 28 byte header (starting 0a 51 e5 c0 18 00, which does not seem to be a well known magic identifier).

The same thing happens for every shorter string I try (except zero bytes, which fails) -- always the data is returned unchanged.

However, sequences of 101 or more repetitions of "abc" (303 or more bytes) compress to 263 bytes, looking very similar to the example in MS-XCA, as you would expect.

Do protocols that use MS-XCA LZ77/Huffman follow this same logic?

My tests were with a small C program compiled using Cygwin on Windows 2012r2. I can provide the source if that helps. In another question I wrote:

> Multi-block examples would of course be helpful.

which is what I was hoping to look at with this; instead I found I could not reproduce the short examples.

I will note that this behaviour is not entirely ridiculous. With the fixed overhead of the 256 byte Huffman table, few strings under 300 bytes can actually be compressed. And as the target size is necessary for LZ77+Huffman decompression, there *should* be no confusion as to how to decompress. For example, suppose you go through these steps:

  1. Compress 65536 zeros into a 263 byte file.
  2. Compress these 263 bytes into an identical 263 byte file.
  3. Decompress the two files, disambiguating by destination size.

In step 3, if you tell the decompressor you need 65536 bytes, it will give you the correct answer, of that many zeros. On the other hand, if you tell it you need 263 bytes, it will give you the correct answer, of those same 263 bytes.
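
The logic I am guessing at can be sketched in Python as below. This is only my reading of the observed behaviour, not the actual Compress API implementation. zlib stands in for LZ77+Huffman purely so the sketch runs anywhere, which means the sizes will not match the 263 bytes above, and the function names are made up for illustration.

```python
import zlib


def store_or_compress(data, compress=zlib.compress):
    """Return the compressed form only if it is strictly smaller than the input.

    zlib is a stand-in codec here, NOT the MS-XCA LZ77+Huffman algorithm.
    """
    packed = compress(data)
    return packed if len(packed) < len(data) else data


def restore(blob, target_size, decompress=zlib.decompress):
    """Undo store_or_compress, disambiguating by the expected output size."""
    if len(blob) == target_size:
        # Same size as the target: the data must have been stored unchanged.
        return blob
    out = decompress(blob)
    if len(out) != target_size:
        raise ValueError("decompressed size does not match target size")
    return out


short = b"abc"                 # too short to shrink, so it is stored as-is
zeros = bytes(65536)           # highly compressible

assert store_or_compress(short) == short
assert len(store_or_compress(zeros)) < len(zeros)
assert restore(store_or_compress(short), len(short)) == short
assert restore(store_or_compress(zeros), len(zeros)) == zeros
```

Because the compressed form is only kept when it is strictly smaller, a blob whose length equals the target size can only be stored data, so the two cases never collide.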

That is not well explained in
https://learn.microsoft.com/en-us/windows/win32/api/compressapi/nf-compressapi-decompress

cheers,
Douglas