[cifs-protocol] [EXTERNAL] [MS-XCA] LZ77+ Huffman example 1 - TrackingID#2210140040005999

Jeff McCashland (He/him) jeffm at microsoft.com
Tue Nov 15 21:07:55 UTC 2022


Hi Douglas,

One more pass at this one, with some nuance. If we were to document this, I believe it would look something like this at the beginning of the pseudocode:

2.2 LZ77+Huffman Decompression Algorithm Details
2.2.4 Processing

    Loop until a decompression terminating condition
Add:    If remaining buffer does not have enough space for a Huffman table
            If we're at the end of the output buffer
                Decompression is complete, return with success
            The compressed data is not valid. Return with error.
        Build the decoding table
        CurrentPosition = 256    // start at the end of the Huffman table
        [...]
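
Read as code, the added check might look roughly like the Python sketch below. This is only an illustration of the proposed wording, not text from MS-XCA: the function name, exception name, and parameters are hypothetical, and the 256-byte table size is taken from the "CurrentPosition = 256" line above.

```python
HUFFMAN_TABLE_SIZE = 256  # bytes; assumed from "CurrentPosition = 256" in the pseudocode


class InvalidCompressedData(Exception):
    """Hypothetical error type for the 'Return with error' branch."""


def decompression_finished(input_len, input_pos, output_len, output_pos):
    """Run the proposed check at the top of each loop iteration.

    Returns True when decompression is complete, False when there is
    room for another Huffman table and decoding should continue.
    Raises InvalidCompressedData when the input ends too early.
    """
    remaining = input_len - input_pos
    if remaining < HUFFMAN_TABLE_SIZE:
        if output_pos == output_len:
            # All expected output has been produced; leftover input
            # bytes (such as trailing zeroes) are never decoded.
            return True
        raise InvalidCompressedData("no room for a Huffman table")
    return False
```

With a check like this, a few leftover bytes after the last symbol are tolerated as long as the output buffer is already full, which is the behaviour described for the Section 3.2 example.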

In your opinion, is this an optimization/implementation detail, or necessary to document as above? 

Best regards,
Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft Protocol Open Specifications Team 
Phone: +1 (425) 703-8300 x38300 | Hours: 9am-5pm | Time zone: (UTC-08:00) Pacific Time (US and Canada)
Local country phone number found here: http://support.microsoft.com/globalenglish | Extension 1138300

-----Original Message-----
From: Jeff McCashland (He/him) 
Sent: Monday, November 14, 2022 1:04 PM
To: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>; cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) <scabrero at samba.org>
Cc: Microsoft Support <supportmail at microsoft.com>
Subject: RE: [EXTERNAL] [MS-XCA] LZ77+ Huffman example 1 - TrackingID#2210140040005999

Ok, Thanks Douglas. I look forward to hearing what you come up with. 

Best regards,
Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft Protocol Open Specifications Team

-----Original Message-----
From: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Sent: Friday, November 11, 2022 2:42 PM
To: Jeff McCashland (He/him) <jeffm at microsoft.com>; cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) <scabrero at samba.org>
Cc: Microsoft Support <supportmail at microsoft.com>
Subject: Re: [EXTERNAL] [MS-XCA] LZ77+ Huffman example 1 - TrackingID#2210140040005999

hi Jeff,

It will do as an answer, though I don't think it is entirely correct.
According to the Huffman table, EOF is encoded as the bits 0100, which appear in the 048d word before the zeroes. 0000 would encode 'w', I think. But, yes, the size tells us we're finished.
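
For anyone following along, the bits can be inspected with a short Python sketch. It assumes the trailing bytes are read as little-endian 16-bit words, which is what the "048d word" reading implies; the variable names are illustrative, and the sketch makes no claim about where in the word the decoder's bit position actually sits.

```python
import struct

# Last six bytes of the Section 3.2 example output, as quoted in this thread.
tail = bytes.fromhex("8d 04 00 00 00 00")

# Interpret them as three little-endian 16-bit words (an assumption).
words = struct.unpack("<3H", tail)
bits = [format(w, "016b") for w in words]

print([hex(w) for w in words])  # ['0x48d', '0x0', '0x0']
print(bits[0])                  # 0000010010001101
```

The pattern 0100 is visible inside the first word, followed by two all-zero words.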

While it's OK to have an arbitrary number of extra bytes at the end of the message, the situation is a whole lot more precarious at the end of a block inside a larger message. There the decoder needs to know exactly how many zeroes to step over, or the next Huffman block will be read from the wrong place (I know that's not quite the same question).

But don't spend any more time on this one. I am closing in on the answer via trial and error and may get back to you with suggestions for MS-XCA.

thanks,
Douglas


On 12/11/22 11:04, Jeff McCashland (He/him) wrote:
> Hi Douglas,
> 
> The 4 zero bytes are the encoding of the EOF. We don't actually decode these bytes. Once we decode the final literal 'z', we realize we've reached the original uncompressed size, and stop processing. This is why it's essential to pass in the correct size of the original uncompressed buffer.
> 
> Let me know if that doesn't fully answer your question.
> 
> Best regards,
> Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft 
> Protocol Open Specifications Team
> 
> -----Original Message-----
> From: Jeff McCashland (He/him)
> Sent: Thursday, October 27, 2022 1:18 PM
> To: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>;
> cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) 
> <scabrero at samba.org>
> Cc: Microsoft Support <supportmail at microsoft.com>
> Subject: RE: [EXTERNAL] [MS-XCA] LZ77+ Huffman example 1 -
> TrackingID#2210140040005999
> 
> Hi Douglas,
> 
> Thank you for the update. I'll look into that aspect.
> 
> Best regards,
> Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft 
> Protocol Open Specifications Team
> 
> -----Original Message-----
> From: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
> Sent: Thursday, October 27, 2022 1:09 PM
> To: Jeff McCashland (He/him) <jeffm at microsoft.com>; 
> cifs-protocol at lists.samba.org; Samuel Cabrero (Samba) 
> <scabrero at samba.org>
> Cc: Microsoft Support <supportmail at microsoft.com>
> Subject: Re: [EXTERNAL] [MS-XCA] LZ77+ Huffman example 1 -
> TrackingID#2210140040005999
> 
> hi Jeff,
> 
> I would like to amend this question slightly if I may.
> 
>> In the first example in Section 3.2, where 
>> "abcdefghijklmnopqrstuvwxyz" is "compressed" into a ~282 byte 
>> sequence ending with
>>
>> d8 52 3e d7 94 11 5b e9 19 5f f9 d6 7c df 8d 04 00 00 00 00
>>
>> where do all the trailing zeros come from?
>>
>> They do not encode characters, and from the decoding description in 
>> 2.2.4, we don't read 32 bits at a time except at the start of the 
>> first block, so processing should be well finished before we get to 
>> read these. It seems to violate the "input buffer is finished" termination rule.
>>
> 
> I now think I understand how the zeroes are produced, based on the interactions between OutputPosition1, OutputPosition2, and OutputPosition in the 2.1.4.3 pseudocode. I can reproduce the result.
> 
> But I still can't understand how they are consumed, based on the terminating conditions in the decoding phase (2.2.4). So the "where do they come from" part of the question is answered, but the implicit "how are they read" part is not.
> 
> Douglas



