[cifs-protocol] [MS-XCA] is LZ77 + Huffman the same as the Win32 compression API? - TrackingID#2210190040006868

Hung-Chun Yu HungChun.Yu at microsoft.com
Wed Oct 19 17:08:42 UTC 2022


[BCC dochelp]

Hi Douglas

Thank you for contacting Microsoft Open Specifications Support. We have created SR case TrackingID#2210190040006868. Please leave this tag in the subject line for future tracking.
One of our engineers will be contacting you shortly.

Hung-Chun Yu
Escalation Engineer
Microsoft Open Specifications

-----Original Message-----
From: Douglas Bagnall <douglas.bagnall at catalyst.net.nz> 
Sent: Wednesday, October 19, 2022 3:32 AM
To: Interoperability Documentation Help <dochelp at microsoft.com>; cifs-protocol at lists.samba.org
Subject: [EXTERNAL] [MS-XCA] is LZ77 + Huffman the same as the Win32 compression API?

Specifically, is it the same as produced by the functions described here:

https://learn.microsoft.com/en-us/windows/win32/api/_cmpapi/ .

when used with COMPRESS_ALGORITHM_XPRESS_HUFF | COMPRESS_RAW flags.

As far as I can tell, it might be for long strings, but for short ones, including the two examples in MS-XCA 3.2, the Compress API refuses to compress or decompress at all.

That is, if I ask it to compress the string consisting of 100 repetitions of "abc" (i.e. 300 bytes altogether), I get the exact same string as the compressed version, when using COMPRESS_ALGORITHM_XPRESS_HUFF | COMPRESS_RAW.

If I use COMPRESS_ALGORITHM_XPRESS_HUFF without COMPRESS_RAW, I see the same 300 bytes, with a 28 byte header (starting 0a 51 e5 c0 18 00, which does not seem to be a well known magic identifier).

The same thing happens for every shorter string I try (except zero bytes, which fails): the data is always returned unchanged.

However, sequences of 101 or more repetitions of "abc" (303 or more bytes) compress to 263 bytes, looking very similar to the example in MS-XCA, as you would expect.
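
For context on the 263-byte figure: an LZ77+Huffman block opens with a 256-byte table holding 512 four-bit code lengths, which leaves 7 bytes for the encoded symbols here. A minimal Python sketch of packing and unpacking such a table follows. The function names are hypothetical, and the nibble order (low nibble for the even-numbered symbol, high nibble for the odd-numbered one) is an assumption that should be checked against the MS-XCA decoding pseudocode.

```python
TABLE_BYTES = 256   # size of the per-block Huffman table
SYMBOLS = 512       # 256 literals + 256 match symbols, 4 bits each


def unpack_code_lengths(table: bytes) -> list:
    """Expand a 256-byte table into 512 code lengths (0..15).

    Assumption: byte i holds symbol 2*i in its low nibble and
    symbol 2*i + 1 in its high nibble.
    """
    if len(table) != TABLE_BYTES:
        raise ValueError("Huffman table must be exactly 256 bytes")
    lengths = []
    for b in table:
        lengths.append(b & 0x0F)   # even-numbered symbol
        lengths.append(b >> 4)     # odd-numbered symbol
    return lengths


def pack_code_lengths(lengths) -> bytes:
    """Inverse of unpack_code_lengths, under the same nibble-order assumption."""
    if len(lengths) != SYMBOLS or any(not 0 <= n <= 15 for n in lengths):
        raise ValueError("need 512 code lengths, each in 0..15")
    return bytes(lengths[2 * i] | (lengths[2 * i + 1] << 4)
                 for i in range(TABLE_BYTES))
```

Because the table alone costs 256 bytes, any input much shorter than that cannot shrink in this format.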

Do protocols that use MS-XCA LZ77/Huffman follow this same logic?


My tests were with a small C program compiled using Cygwin on Windows 2012r2. I can provide the source if that helps. In another question I wrote:

> Multi-block examples would of course be helpful.

which is what I was hoping to look at with this; instead I found I could not reproduce the short examples.


I will note that this behaviour is not entirely ridiculous. With the fixed 
overhead of the 256 byte Huffman table, few strings under 300 bytes can actually 
be compressed. And as the target size is necessary for LZ77+Huffman 
decompression, there *should* be no confusion as to how to decompress. For 
example, suppose you go through these steps:

  1. Compress 65536 zeros into a 263 byte file.
  2. Compress these 263 bytes into an identical 263 byte file.
  3. Decompress the two files, disambiguating by destination size.

In step 3, if you tell the decompressor you need 65536 bytes, it will give you 
the correct answer, of that many zeros. On the other hand, if you tell it you 
need 263 bytes, it will give you the correct answer, of those same 263 bytes.
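
The disambiguation in step 3 can be sketched in Python. This is only an illustration, not the Compression API's actual logic: zlib stands in for LZ77+Huffman, the rule "return the input unchanged when compression does not shrink it" is an assumption (the tests above do not establish the exact rule the API applies), and the names compress_or_store and decompress_to_size are hypothetical.

```python
import zlib


def compress_or_store(data: bytes) -> bytes:
    """Return the compressed form only if it is strictly smaller.

    Otherwise return the input unchanged (assumed "stored" behaviour).
    """
    packed = zlib.compress(data, 9)
    return packed if len(packed) < len(data) else data


def decompress_to_size(blob: bytes, target_size: int) -> bytes:
    """Recover target_size bytes from blob.

    If blob is already target_size bytes long it is treated as stored
    data and returned as-is; otherwise it is decompressed and the
    result is checked against target_size.
    """
    if len(blob) == target_size:
        return blob
    out = zlib.decompress(blob)
    if len(out) != target_size:
        raise ValueError("decompressed size does not match target size")
    return out


zeros = bytes(65536)
first = compress_or_store(zeros)   # step 1: much smaller than 65536 bytes

# Step 3: the same bytes decode differently depending on the
# destination size the caller asks for.
as_zeros = decompress_to_size(first, 65536)        # the 65536 zeros
as_itself = decompress_to_size(first, len(first))  # the blob, unchanged
```

The destination size is the only thing that tells the decoder whether the blob is stored or compressed, so a caller that passes the wrong size gets either an error or the wrong data.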

That is not well explained in
https://learn.microsoft.com/en-us/windows/win32/api/compressapi/nf-compressapi-decompress


cheers,
Douglas


