[cifs-protocol] [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks - TrackingID#2210140040006030

Douglas Bagnall douglas.bagnall at catalyst.net.nz
Wed Oct 26 21:09:58 UTC 2022


hi Jeff,

Thanks. Yes, I think you're understanding correctly, and that is a valid 
answer, and I *would* have happily accepted it, but in the meantime I 
have had the misfortune of re-reading MS-XCA again and again, and I now 
believe it contradicts this view.

In 2.1.4.3 there is:

   The following pseudocode demonstrates the encoding method.

        Write the 256-byte table of symbol bit lengths
        While there are more literals or matches to encode
           [[write bits per the algorithm, not in question here]]
        WriteBits(SymbolLength[256], SymbolCode[256])
        FlushBits()

This appears to be encoding a single block (there's one 256-byte table), 
and it ends with the FlushBits(), which is essentially the "ignore 
ghi..." in my example. However it also has a 
"WriteBits(SymbolLength[256], SymbolCode[256])", which I understand 
should only happen at the end of the last block.

I think it would be accurate to say this pseudocode "demonstrates the 
encoding method for a message of 65536 or fewer bytes", but is unclear 
for multi-block messages.


And in section 2.2.4 the main decompression pseudocode loop starts like:

   Loop until a decompression terminating condition
       Build the decoding table
       CurrentPosition = 256     // start at the end of the Huffman table
       NextBits = Read16Bits(InputBuffer + CurrentPosition)
       CurrentPosition += 2
       NextBits <<= 16
       NextBits |= Read16Bits(InputBuffer + CurrentPosition)
       CurrentPosition += 2
       ExtraBitCount = 16


which suggests that the bits "ghi..." are discarded because we are told 
implicitly in the text that Read16bits shifts input into a 32 bit 
register -- if we call it twice at the beginning of each block, whatever 
was in the register has to fall out the other end.

cheers,
Douglas


On 27/10/22 05:48, Jeff McCashland (He/him) wrote:
> Hi Douglas,
> 
> As I understand, each 64k block is processed separately. In other words, the first 64k block is LZ77 compressed, then the Huffman codes are constructed based on symbol frequency in that 64k. If, in your example, DEF ends the 64k block, then the subsequent ghi... will be processed with the second 64k block and Huffman table, and not dropped.
> 
> Am I understanding your question correctly?
> 
> Best regards,
> Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft Protocol Open Specifications Team
> Phone: +1 (425) 703-8300 x38300 | Hours: 9am-5pm | Time zone: (UTC-08:00) Pacific Time (US and Canada)
> Local country phone number found here: http://support.microsoft.com/globalenglish | Extension 1138300
> 
> -----Original Message-----
> From: Jeff McCashland (He/him)
> Sent: Friday, October 14, 2022 1:58 PM
> To: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>; cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) <scabrero at samba.org>
> Cc: Microsoft Support <supportmail at microsoft.com>
> Subject: RE: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks - TrackingID#2210140040006030
> 
> [Tom to BCC]
> 
> Hi Douglas,
> 
> I'll research this question and let you know what I learn.
> 
> Best regards,
> Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft Protocol Open Specifications Team
> Phone: +1 (425) 703-8300 x38300 | Hours: 9am-5pm | Time zone: (UTC-08:00) Pacific Time (US and Canada) Local country phone number found here: http://support.microsoft.com/globalenglish | Extension 1138300
> 
> -----Original Message-----
> From: Tom Jebo <tomjebo at microsoft.com>
> Sent: Friday, October 14, 2022 9:45 AM
> To: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>; cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) <scabrero at samba.org>
> Cc: Microsoft Support <supportmail at microsoft.com>
> Subject: RE: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks - TrackingID#2210140040006030
> 
> [dochelp to bcc]
> [casemail cc]
> 
> Hi Douglas,
> 
> Thank you for your request. One of the Open Specifications team will respond to start working with you. I have created case 2210140040006030 and added the number to the subject of this email. Please refer to this case number in future communications regarding this issue.
> 
> Best regards,
> Tom Jebo
> Sr Escalation Engineer
> Microsoft Open Specifications
> 
> -----Original Message-----
> From: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
> Sent: Thursday, October 13, 2022 9:57 PM
> To: Interoperability Documentation Help <dochelp at microsoft.com>; cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) <scabrero at samba.org>
> Subject: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks
> 
> hi Dochelp,
> 
> 
> Does the beginning of the second and subsequent blocks break the bitstream, starting again at a byte boundary after the new Huffman table?
> 
> The question is best explained by analogy to the way long lengths are handled in matches. Suppose we have a match symbol in the middle of a bitstream, and the match is a long one, requiring the reading of an extra byte:
> 
>      ijklmnop  abcDEFgh [distance] qrs...
>                   |
>                   [match 1, 15]
> 
> Here abc, ghi.. are the sequence of bits in the stream around the match DEF, which is read in alternating bytes by little-endian rules, and the distance is plonked in the middle of the stream as an individual byte. The stream just flows around it, so gh-ijklmnop are interpreted after [distance].
> 
> Now, if DEF instead ended the block:
> 
>      ijklmnop  abcDEFgh [new Huffman table] qrs...
>                   |
>                   [ends the block (64k)]
> 
> 
> would the bits gh-jklmnop be interpreted using the new Huffman table, as part of the new block, or would those bits be dropped?
> 
> Multi-block examples would of course be helpful.
> 
> 
> Douglas
> 
> 




More information about the cifs-protocol mailing list