[cifs-protocol] [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks - TrackingID#2211140040009096

Jeff McCashland (He/him) jeffm at microsoft.com
Tue Nov 15 23:47:54 UTC 2022


Hi Douglas,

I completed my experiment and confirmed my understanding. 

When we find the end of the match, since we've exceeded the (first) 64k buffer, we finish out the block and Huffman-encode the LZ77 result. The next block consists of the <40 bytes that follow the end of the match. Since it's less than 40 bytes, we encode them all as literals, then Huffman-encode that LZ77 result. 

I believe this matches what you describe below as well. 

Let me know if you have any further questions on this issue. 

Best regards,
Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft Protocol Open Specifications Team 
Phone: +1 (425) 703-8300 x38300 | Hours: 9am-5pm | Time zone: (UTC-08:00) Pacific Time (US and Canada)
Local country phone number found here: http://support.microsoft.com/globalenglish | Extension 1138300

-----Original Message-----
From: Jeff McCashland (He/him) 
Sent: Tuesday, November 15, 2022 9:38 AM
To: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>; cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) <scabrero at samba.org>
Cc: Microsoft Support <supportmail at microsoft.com>
Subject: RE: [cifs-protocol] [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks - TrackingID#2211140040009096

Hi Douglas,

I went ahead and moved this to a new SR (2211140040009096) to track this issue as it evolves, and I'll close SR 2210140040006030. 

I assume point 1 below should be '...over the end of the block...' It sounds like you've made progress in your understanding of how the blocks work. 

I'm going to do my own experiment with a large file, large match that extends to within 40 bytes of the EOF, with non-matching characters on the end. I want to walk through the compression and see exactly how that works in the code. 

I'll let you know what I find. 

Best regards,
Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft Protocol Open Specifications Team

-----Original Message-----
From: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Sent: Monday, November 14, 2022 8:09 PM
To: Jeff McCashland (He/him) <jeffm at microsoft.com>; cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) <scabrero at samba.org>
Cc: Microsoft Support <supportmail at microsoft.com>
Subject: Re: [cifs-protocol] [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks - TrackingID#2210140040006030

What I observe is two separate things:

1. If a match extends over the end of a match, the block ends there.

2. If a block is shorter than 41 bytes long, it is encoded as literals. A block can only be short if it is the last block.

The latter is probably an implementation detail -- the decompressor will not reject short blocks that contain matches (right?). It's sensible because it avoids the overhead of preparing the hash tables, and it costs little in terms of compression as we are already committed to at least 32 bits. I already had something like this in Samba, but with the threshold at 15.
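A minimal Python sketch of the two observations above, under stated assumptions. The naive greedy longest-match search stands in for the hash-table match finder Windows actually uses, and `lz77_blocks`, `window` and `short_block` are hypothetical names. The 41-byte threshold and the rule that an over-long match ends the block come from the measurements in this thread, not from MS-XCA. No 64 MB cap on match length is modelled.

```python
def lz77_blocks(data: bytes, block_size: int = 65536, short_block: int = 41,
                min_match: int = 3, window: int = 8192) -> list:
    """Split `data` into blocks of LZ77 tokens (hypothetical sketch).

    Tokens are ("lit", byte) or ("match", distance, length).
    """
    blocks, pos, n = [], 0, len(data)
    while pos < n:
        if n - pos < short_block:
            # Short final block: emit literals only, skip match search.
            blocks.append([("lit", b) for b in data[pos:]])
            break
        end = min(pos + block_size, n)
        tokens, i = [], pos
        while i < end:
            best_len, best_dist = 0, 0
            # Naive longest-match search (stand-in for Windows' hash
            # tables). Matches may overlap and may run past `end`.
            for dist in range(1, min(window, i) + 1):
                length = 0
                while i + length < n and data[i + length] == data[i + length - dist]:
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, dist
            if best_len >= min_match:
                tokens.append(("match", best_dist, best_len))
                i += best_len
                # A match that overshoots the nominal end ends the block
                # right after the match.
                end = max(end, i)
            else:
                tokens.append(("lit", data[i]))
                i += 1
        blocks.append(tokens)
        pos = end
    return blocks
```

On `b"a" * 65559 + b"bb"` this gives one block holding a literal "a" plus a 65558-byte match, followed by a block of two literal "b"s, which is the shape of the Windows output quoted further down in this thread.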

These are the file sizes I see with 65559 "a"s and varying runs of "b" in the last block, from the Windows compression API:

 "b"s   size
    1    527
   20    529
   39    531
   40    531
   41    528    <- got smaller, because it uses a match.
   42    528


Also with *65535* "a"s followed by various runs of "b", so that the first block ends with a solitary literal "b":

 "b"s   size
    1    263    <- all fits in one block
    2    523
   20    525
   39    527
   40    527
   41    527
   42    524    <- got smaller, because 41 "b"s in the second block

With the compression level turned up, matches are used for the shorter blocks, which also indicates to me this is probably not a protocol level distinction.

I don't know if this is what you meant all along, but the ending of the over-long block and the 40-byte thing got entangled in my mind, and I thought it was about finishing off the stream with 40 literals in the over-long block.

cheers,
Douglas




On 15/11/22 12:00, Douglas Bagnall via cifs-protocol wrote:
> hi Jeff,
> 
> I can upload to the workspace, anonymously, I just can't ever log in.
> I'm working on getting together a public repository of test vectors.
> 
> Here's one for now, as a Python expression:
> 
>   "a" * 65559 + "bb"
> 
> This runs over the block end (65559 > 65536) and there are 2 extra 
> bytes at the end. Thus I'd expect something like this:
> 
> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00000030  20 02 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ...............|
> 00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00000080  02 00 00 00 00 00 00 20  00 00 00 00 00 00 00 00  |....... ........|
> 00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00000100  80 35 00 00 ff 00 00 13  00 01 00                 |.5.........|
> 0000010b
> 
> There the Huffman tree has length 2 codes for "a", "b", EOF, and 271 
> (match 1 back, >17 long). The word at 0x100 is 0x3580, which is
> '00-11-01-01 10...', meaning "a", match, "b", "b", EOF. Then the "ff
> 00 00 13 00 01 00" is the length, which resolves to 0x10013 + 3 = 65558.
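The table layout and bit decoding described above can be checked with a short Python sketch. It assumes my reading of MS-XCA: two 4-bit lengths per table byte (even symbol in the low nibble), canonical codes assigned in order of (bit length, symbol value), and bits consumed most-significant first. The helper names are hypothetical, and `decode_symbols` is deliberately simplified: it only works here because symbol 271 carries no extra distance bits.

```python
def pack_lengths(lengths: dict) -> bytes:
    """Pack symbol bit lengths into the 256-byte table.

    Even symbols go in the low nibble, odd symbols in the high nibble.
    """
    table = bytearray(256)
    for sym, bits in lengths.items():
        table[sym // 2] |= bits << (4 * (sym % 2))
    return bytes(table)


def canonical_codes(lengths: dict) -> dict:
    """Assign canonical codes ordered by (bit length, symbol value)."""
    codes, code, prev_len = {}, 0, 0
    for sym, bits in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= bits - prev_len
        codes[sym] = format(code, f"0{bits}b")
        code, prev_len = code + 1, bits
    return codes


def decode_symbols(word: int, codes: dict, stop: int = 256) -> list:
    """Decode a 16-bit word MSB-first until the `stop` (EOF) symbol."""
    lookup = {c: s for s, c in codes.items()}
    out, acc = [], ""
    for bit in format(word, "016b"):
        acc += bit
        if acc in lookup:
            out.append(lookup[acc])
            acc = ""
            if out[-1] == stop:
                break
    return out


lengths = {97: 2, 98: 2, 256: 2, 271: 2}   # "a", "b", EOF, match 271
print(decode_symbols(0x3580, canonical_codes(lengths)))
```

This prints `[97, 271, 98, 98, 256]`, i.e. "a", match, "b", "b", EOF, and `pack_lengths(lengths)` has 0x20 at 0x30, 0x02 at 0x31, 0x02 at 0x80 and 0x20 at 0x87, as in the dump.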
> 
> Instead with the Windows Compression API, I see:
> 
> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00000030  10 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00000080  00 00 00 00 00 00 00 10  00 00 00 00 00 00 00 00  |................|
> 00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00000100  00 40 00 00 ff 00 00 13  00 01 00 00 00 00 00 00  |.@..............|
> 00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00000130  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  |................|
> 00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00000180  00 00 00 00 00 00 00 00  00 00 00 01 00 00 00 00  |................|
> 00000190  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00000200  00 00 00 00 00 00 00 00  00 00 00 00 20 00 00     |............ ..|
> 0000020f
> 
> That's got two blocks. The first block has two codes, "a" and 271, 
> each one bit long, used once each in that order. The length is the 
> same as above. The second block starts on the 0x10b byte and has one bit codes for "b" and EOF.
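The length bytes `ff 00 00 13 00 01 00` shared by both dumps can be decoded with a small Python sketch of the length-extension steps, following my reading of the MS-XCA 2.2.4 decompression pseudocode. `decode_match` is a hypothetical name; the extra length bytes are read from the byte stream at `pos`, not from the bit stream.

```python
import struct


def decode_match(symbol: int, buf: bytes, pos: int):
    """Return (length, distance_bit_count, new_pos) for a match symbol.

    The distance would be (1 << distance_bit_count) | extra_bits; for
    symbol 271 the bit count is 0, so the distance is 1.
    """
    sym = symbol - 256
    length, distance_bits = sym % 16, sym // 16
    if length == 15:
        length = buf[pos]                 # one extra length byte
        pos += 1
        if length == 255:
            (length,) = struct.unpack_from("<H", buf, pos)
            pos += 2
            if length == 0:               # 0 escapes to a 32-bit length
                (length,) = struct.unpack_from("<I", buf, pos)
                pos += 4
            if length < 15:
                raise ValueError("invalid compressed data")
            length -= 15
        length += 15
    return length + 3, distance_bits, pos


print(decode_match(271, bytes.fromhex("ff 00 00 13 00 01 00"), 0))
```

This prints `(65558, 0, 7)`: 0x10013 + 3 = 65558, which together with the leading literal "a" accounts for all 65559 "a"s.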
> 
> My understanding of "if the match ends within 40 bytes of the Input 
> buffer end, we encode the remaining bytes as literals" is that we'd 
> use the existing Huffman table. Is that not the case?
> 
> Douglas
> 
> 
> 
> 
> On 15/11/22 09:57, Jeff McCashland (He/him) wrote:
>> Hi Douglas,
>>
>> If the match goes past the block end, we effectively extend the block 
>> by continuing (from the input buffer) to look for the end. If the 
>> match ends within 40 bytes of the Input buffer end, we encode the 
>> remaining bytes as literals.
>>
>> As to "< 40" or "<= 40", I'll need to double check how the 
>> conditional is checked.
>>
>> I take it you are still unable to upload to the workspace? If you 
>> could upload your compression (of a public source), I could run it 
>> through Windows decompression and let you know why it fails.
>>
>> Best regards,
>> Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft 
>> Protocol Open Specifications Team
>>
>> -----Original Message-----
>> From: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
>> Sent: Sunday, November 13, 2022 4:06 PM
>> To: Jeff McCashland (He/him) <jeffm at microsoft.com>; 
>> cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) 
>> <scabrero at samba.org>
>> Cc: Microsoft Support <supportmail at microsoft.com>
>> Subject: Re: [cifs-protocol] [EXTERNAL] [MS-XCA] LZ77+ Huffman: 
>> questions about blocks - TrackingID#2210140040006030
>>
>> hi Jeff,
>>
>> What I have for the LZ77 phase, in pseudocode, is this:
>>
>>     set the block size to be the minimum of 65536 or the remaining size.
>>
>>     for each byte of the input up to block size - 3:
>>         work out whether there's a match, using hashes etc
>>         no match:
>>            add a literal
>>            continue from loop start
>>         match:
>>            add the match distance and length
>>            is the end of the match past the block end?
>>            no:
>>               advance to the byte after the match
>>               continue from loop start
>>            yes:
>>               is the end of the match within 40 bytes of the end of input?
>>               no:
>>                 we've finished this block
>>               yes:
>>                 set the block end to consume all the remaining input bytes
>>                 exit this loop
>>
>>     # here we start another loop that cleans up the remaining bytes,
>>     # either because we got to block length - 3 where there's no room
>>     # for matches, or because we extended the block to the end of input
>>
>>     for each byte of the input up to block size:
>>         add a literal
>>
>>
>> But I am able to generate files that won't decompress on Windows, and 
>> I don't know why. This, I *think*, is the only part that is still unclear to me.
>>
>> In all my failing cases, there is a match that ends after the block 
>> end but within 40 bytes of the end of the input, but there is no 
>> obvious pattern around the offsets (like, it doesn't look like an off-by-one >= vs > confusion).
>>
>> Anything you can add about how Windows compresses or decompresses in 
>> this situation would be very helpful.
>>
>> thanks
>> Douglas
>>
>>
>>
>>
>> On 10/11/22 13:16, Douglas Bagnall via cifs-protocol wrote:
>>>> If it ends within 40 bytes of the end of the buffer, the remaining 
>>>> bytes in the buffer are encoded as literals.
>>>
>>> Is this the case even when the would-be-penultimate block doesn't 
>>> end in an overly long match?
>>>
>>> That is, if the compressor began a block at the point where there 
>>> were 64k + 20 bytes remaining, would it always make the next block 
>>> contain all the data, or would it wait to see if it ended in an 
>>> overshooting match, and encode the remaining data in a new block if not?
>>>
>>> Also, when you say "within 40 bytes", is that "< 40" or "<= 40"?
>>>
>>> cheers,
>>> Douglas
>>>
>>>
>>> On 10/11/22 10:23, Jeff McCashland (He/him) wrote:
>>>> Hi Douglas,
>>>>
>>>> It appears you are correct that the current block ends at the end 
>>>> of the >64k match. The maximum length of the match will go to the 
>>>> end of the input buffer.
>>>> If it ends within 40 bytes of the end of the buffer, the remaining 
>>>> bytes in the buffer are encoded as literals.
>>>>
>>>> Best regards,
>>>> Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft 
>>>> Protocol Open Specifications Team
>>>>
>>>> -----Original Message-----
>>>> From: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
>>>> Sent: Tuesday, November 8, 2022 4:30 PM
>>>> To: Jeff McCashland (He/him) <jeffm at microsoft.com>; 
>>>> cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) 
>>>> <scabrero at samba.org>
>>>> Cc: Microsoft Support <supportmail at microsoft.com>
>>>> Subject: Re: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about 
>>>> blocks -
>>>> TrackingID#2210140040006030
>>>>
>>>> Another question on this:
>>>>
>>>> When a block exceeds 64k, how can we know when the block ends?
>>>>
>>>> Does it always end immediately after the match pushing past 64k?
>>>>
>>>> Douglas
>>>>
>>>>
>>>>
>>>> On 9/11/22 12:44, Jeff McCashland (He/him) wrote:
>>>>> Hi Douglas,
>>>>>
>>>>> I've found something interesting while researching this issue that might help.
>>>>>
>>>>> In the initial LZ77 encoding phase, matches are searched for in 
>>>>> 64k blocks, as documented. However, when determining the length of 
>>>>> the match, Windows will keep searching as long as the match 
>>>>> continues, even if it continues through multiple 64k blocks, up to 
>>>>> a total of
>>>>> 64 MB. I created a file > 64MB with lowercase a-z repeated so that 
>>>>> the first match actually goes to the end of the 64 MB 'super-block'.
>>>>> The length of the match worked out to just under
>>>>> 64 MB, and the next pass started with the remainder of the file after 64MB.
>>>>>
>>>>> I hope that helps to explain some of the oddities you're seeing.
>>>>>
>>>>> Best regards,
>>>>> Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft 
>>>>> Protocol Open Specifications Team
>>>>>
>>>>> -----Original Message-----
>>>>> From: Jeff McCashland (He/him)
>>>>> Sent: Thursday, October 27, 2022 9:57 AM
>>>>> To: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>;
>>>>> cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) 
>>>>> <scabrero at samba.org>
>>>>> Cc: Microsoft Support <supportmail at microsoft.com>
>>>>> Subject: RE: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about 
>>>>> blocks
>>>>> - TrackingID#2210140040006030
>>>>>
>>>>> Hi Douglas,
>>>>>
>>>>> Thank you for the fast response. I will continue digging into this.
>>>>>
>>>>> Best regards,
>>>>> Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft 
>>>>> Protocol Open Specifications Team
>>>>>
>>>>> -----Original Message-----
>>>>> From: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
>>>>> Sent: Wednesday, October 26, 2022 2:10 PM
>>>>> To: Jeff McCashland (He/him) <jeffm at microsoft.com>; 
>>>>> cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) 
>>>>> <scabrero at samba.org>
>>>>> Cc: Microsoft Support <supportmail at microsoft.com>
>>>>> Subject: Re: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about 
>>>>> blocks
>>>>> - TrackingID#2210140040006030
>>>>>
>>>>> hi Jeff,
>>>>>
>>>>> Thanks. Yes, I think you're understanding correctly, and that is a 
>>>>> valid answer, and I *would* have happily accepted it, but in the 
>>>>> meantime I have had the misfortune of re-reading MS-XCA again and 
>>>>> again, and I now believe it contradicts this view.
>>>>>
>>>>> In 2.1.4.3 there is:
>>>>>
>>>>>       The following pseudocode demonstrates the encoding method.
>>>>>
>>>>>            Write the 256-byte table of symbol bit lengths
>>>>>            While there are more literals or matches to encode
>>>>>               [[write bits per the algorithm, not in question here]]
>>>>>            WriteBits(SymbolLength[256], SymbolCode[256])
>>>>>            FlushBits()
>>>>>
>>>>> This appears to be encoding a single block (there's one 256-byte 
>>>>> table), and it ends with the FlushBits(), which is essentially the 
>>>>> "ignore ghi..." in my example. However it also has a 
>>>>> "WriteBits(SymbolLength[256], SymbolCode[256])", which I 
>>>>> understand should only happen at the end of the last block.
>>>>>
>>>>> I think it would be accurate to say this pseudocode "demonstrates 
>>>>> the encoding method for a message of 65536 or fewer bytes", but is 
>>>>> unclear for multi-block messages.
>>>>>
>>>>>
>>>>> And in section 2.2.4 the main decompression pseudocode loop starts like:
>>>>>
>>>>>       Loop until a decompression terminating condition
>>>>>           Build the decoding table
>>>>>           CurrentPosition = 256     // start at the end of the Huffman table
>>>>>           NextBits = Read16Bits(InputBuffer + CurrentPosition)
>>>>>           CurrentPosition += 2
>>>>>           NextBits <<= 16
>>>>>           NextBits |= Read16Bits(InputBuffer + CurrentPosition)
>>>>>           CurrentPosition += 2
>>>>>           ExtraBitCount = 16
>>>>>
>>>>>
>>>>> which suggests that the bits "ghi..." are discarded because we are
>>>>> told implicitly in the text that Read16Bits shifts input into a 32
>>>>> bit register -- if we call it twice at the beginning of each
>>>>> block, whatever was in the register has to fall out the other end.
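The block-start step of the quoted decompression loop can be sketched in Python to show why leftover bits cannot survive into the next block: the 32-bit register is rebuilt from scratch. This is a transcription of the pseudocode under the assumption that Read16Bits is a little-endian 16-bit read; `start_block` is a hypothetical name.

```python
import struct


def start_block(buf: bytes, block_start: int):
    """Prime the bit register at the start of a block.

    Returns (next_bits, current_position, extra_bit_count).
    """
    pos = block_start + 256                 # skip the 256-byte length table
    (hi,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    (lo,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    # The register is assigned, not shifted into, so any bits left over
    # from the previous block are discarded.
    next_bits = (hi << 16) | lo
    return next_bits, pos, 16


first = bytes(256) + bytes.fromhex("80 35 00 00 ff 00 00 13 00 01 00")
print(hex(start_block(first, 0)[0]))
```

With the first block of the "a" * 65559 + "bb" test vector from the 15/11/22 message this prints `0x35800000`; for the Windows output, whose second block starts at 0x10b, the same call reads the word at 0x20b and ends exactly at the file length 0x20f.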
>>>>>
>>>>> cheers,
>>>>> Douglas
>>>>>
>>>>>
>>>>> On 27/10/22 05:48, Jeff McCashland (He/him) wrote:
>>>>>> Hi Douglas,
>>>>>>
>>>>>> As I understand, each 64k block is processed separately. In other 
>>>>>> words, the first 64k block is LZ77 compressed, then the Huffman 
>>>>>> codes are constructed based on symbol frequency in that 64k. If, 
>>>>>> in your example, DEF ends the 64k block, then the subsequent ghi...
>>>>>> will be processed with the second 64k block and Huffman table, 
>>>>>> and not dropped.
>>>>>>
>>>>>> Am I understanding your question correctly?
>>>>>>
>>>>>> Best regards,
>>>>>> Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft 
>>>>>> Protocol Open Specifications Team
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Jeff McCashland (He/him)
>>>>>> Sent: Friday, October 14, 2022 1:58 PM
>>>>>> To: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>;
>>>>>> cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) 
>>>>>> <scabrero at samba.org>
>>>>>> Cc: Microsoft Support <supportmail at microsoft.com>
>>>>>> Subject: RE: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about 
>>>>>> blocks
>>>>>> - TrackingID#2210140040006030
>>>>>>
>>>>>> [Tom to BCC]
>>>>>>
>>>>>> Hi Douglas,
>>>>>>
>>>>>> I'll research this question and let you know what I learn.
>>>>>>
>>>>>> Best regards,
>>>>>> Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft 
>>>>>> Protocol Open Specifications Team
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Tom Jebo <tomjebo at microsoft.com>
>>>>>> Sent: Friday, October 14, 2022 9:45 AM
>>>>>> To: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>;
>>>>>> cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) 
>>>>>> <scabrero at samba.org>
>>>>>> Cc: Microsoft Support <supportmail at microsoft.com>
>>>>>> Subject: RE: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about 
>>>>>> blocks
>>>>>> - TrackingID#2210140040006030
>>>>>>
>>>>>> [dochelp to bcc]
>>>>>> [casemail cc]
>>>>>>
>>>>>> Hi Douglas,
>>>>>>
>>>>>> Thank you for your request. One of the Open Specifications team 
>>>>>> will respond to start working with you. I have created case
>>>>>> 2210140040006030 and added the number to the subject of this email.
>>>>>> Please refer to this case number in future communications 
>>>>>> regarding this issue.
>>>>>>
>>>>>> Best regards,
>>>>>> Tom Jebo
>>>>>> Sr Escalation Engineer
>>>>>> Microsoft Open Specifications
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
>>>>>> Sent: Thursday, October 13, 2022 9:57 PM
>>>>>> To: Interoperability Documentation Help <dochelp at microsoft.com>; 
>>>>>> cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) 
>>>>>> <scabrero at samba.org>
>>>>>> Subject: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about 
>>>>>> blocks
>>>>>>
>>>>>> hi Dochelp,
>>>>>>
>>>>>>
>>>>>> Does the beginning of the second and subsequent blocks break the 
>>>>>> bitstream, starting again at a byte boundary after the new Huffman table?
>>>>>>
>>>>>> The question is best explained by analogy to the way long lengths 
>>>>>> are handled in matches. Suppose we have a match symbol in the 
>>>>>> middle of a bitstream, and the match is a long one, requiring the 
>>>>>> reading of an extra byte:
>>>>>>
>>>>>>         ijklmnop  abcDEFgh [distance] qrs...
>>>>>>                      |
>>>>>>                      [match 1, 15]
>>>>>>
>>>>>> Here abc, ghi.. are the sequence of bits in the stream around the 
>>>>>> match DEF, which is read in alternating bytes by little-endian 
>>>>>> rules, and the distance is plonked in the middle of the stream as 
>>>>>> an individual byte. The stream just flows around it, so 
>>>>>> gh-ijklmnop are interpreted after [distance].
>>>>>>
>>>>>> Now, if DEF instead ended the block:
>>>>>>
>>>>>>         ijklmnop  abcDEFgh [new Huffman table] qrs...
>>>>>>                      |
>>>>>>                      [ends the block (64k)]
>>>>>>
>>>>>>
>>>>>> would the bits gh-ijklmnop be interpreted using the new Huffman
>>>>>> table, as part of the new block, or would those bits be dropped?
>>>>>>
>>>>>> Multi-block examples would of course be helpful.
>>>>>>
>>>>>>
>>>>>> Douglas
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> cifs-protocol mailing list
>>> cifs-protocol at lists.samba.org
>>> https://lists.samba.org/mailman/listinfo/cifs-protocol
>>
> 
> 



