[cifs-protocol] [MS-XCA] LZ77 + Huffman: is sometimes slightly more than 64k encoded in as block? - TrackingID#2211010040007989
Kristian.Smith at microsoft.com
Tue Nov 1 22:33:23 UTC 2022
[DocHelp to Bcc]
Thank you for your request. The case number 2211010040007989 has been created for this inquiry. One of our team members will follow-up with you soon.
Support Escalation Engineer
Windows Open Spec Protocols
Office: (425) 421-4442
kristian.smith at microsoft.com
From: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Sent: Tuesday, November 1, 2022 1:56 PM
To: cifs-protocol at lists.samba.org; Interoperability Documentation Help <dochelp at microsoft.com>
Subject: [EXTERNAL] [MS-XCA] LZ77 + Huffman: is sometimes slightly more than 64k encoded in as block?
Is it ever the case that sometimes slightly more than 65536 bytes are encoded as a single block (i.e., using one Huffman table)?
I ask because I observe this behaviour with the user mode Windows Compression API, which I know is not covered by MS-XCA, but which purports to use the same algorithm.
As a specific example, when compressing a string of 65537 (i.e. 64k + 1) zeros, I get the following result:
The Huffman table is all zeros except bytes 0, 128, and 135, which are 0x02, 0x02, and 0x10 respectively.
symbol code Huffman meaning
0x00 2 10 literal zero
0x100 2 11 EOF
0x10f 1 0 match 1 back, length TBD (>17)
The remaining bytes are 00 98 00 00 ff fd ff.
The 0x98 is 10-0-11-000, encoding literal zero, the 0x10f match, then EOF.
The length for the match resolves to 0xfffd + 3, which is exactly 0x10000, or 65536.
That all works very nicely, writing one zero, then copying it 65536 times for the result we want, but it breaks the rule that data is processed in 64k chunks.
As I read it, the way MS-XCA would handle this is to have two blocks. The first would look very much like the one described above, but with an 0xfc byte in place of 0xfd, indicating a total of 65536 zeros, and no EOF. The second would have only a single zero and EOF.
MS-XCA 22.214.171.124 does seem to suggest > 65536 bytes in a block when it says:
> Note that match distances cannot be larger than 65,535, and match lengths > cannot be longer than 65,538.
If the block is always 65536 or less, why mention lengths of 65,538?
The follow up question is going to be: how can a decoder know when the block length is greater than 64k?
More information about the cifs-protocol