[cifs-protocol] [MS-XCA] LZ77 + Huffman: is sometimes slightly more than 64k encoded in as block?

Douglas Bagnall douglas.bagnall at catalyst.net.nz
Tue Nov 1 20:56:09 UTC 2022


hi Dochelp,

Is it ever the case that sometimes slightly more than 65536 bytes are encoded as 
a single block (i.e., using one Huffman table)?

I ask because I observe this behaviour with the user mode Windows Compression 
API, which I know is not covered by MS-XCA, but which purports to use the same 
algorithm.

As a specific example, when compressing a string of 65537 (i.e. 64k + 1) zeros, 
I get the following result:

The Huffman table is all zeros except bytes 0, 128, and 135, which are 0x02, 
0x02, and 0x10 respectively.

symbol  code    Huffman    meaning
0x00      2      10        literal zero
0x100     2      11        EOF
0x10f     1      0         match 1 back, length TBD (>17)

The remaining bytes are 00 98 00 00 ff fd ff.
The 0x98 is 10-0-11-000, encoding literal zero, the 0x10f match, then EOF.

The length for the match resolves to 0xfffd + 3, which is exactly 0x10000, or 
65536.

That all works very nicely, writing one zero, then copying it 65536 times for 
the result we want, but it breaks the rule that data is processed in 64k chunks.

As I read it, the way MS-XCA would handle this is to have two blocks. The first 
would look very much like the one described above, but with an 0xfc byte in 
place of 0xfd, indicating a total of 65536 zeros, and no EOF. The second would 
have only a single zero and EOF.

MS-XCA 2.1.4.3 does seem to suggest > 65536 bytes in a block when it says:

> Note that match distances cannot be larger than 65,535, and match lengths > cannot be longer than 65,538.

If the block is always 65536 or less, why mention lengths of 65,538?

The follow up question is going to be: how can a decoder know when the block 
length is greater than 64k?

cheers,
Douglas



More information about the cifs-protocol mailing list