[cifs-protocol] [REG:119070521001876] SMB3 LZ77 decompression issues

Edgar Olougouna edgaro at microsoft.com
Fri Jul 5 19:02:01 UTC 2019


Aurélien,
I will take a look at this and follow up. If you apply the change Metze suggested to the pseudo-code, does it allow you to decompress the payload?

Thanks,
Edgar

-----Original Message-----
From: Bryan Burgin <bburgin at microsoft.com> 
Sent: Friday, July 5, 2019 1:23 PM
To: Aurélien Aptel <aaptel at suse.com>; Interoperability Documentation Help <dochelp at microsoft.com>; cifs-protocol at lists.samba.org
Cc: support <support at mail.support.microsoft.com>
Subject: [REG:119070521001876] SMB3 LZ77 decompression issues

Hi Aurélien,

Thank you for your question.  We created SR 119070521001876 to track your issue.  An engineer will contact you soon.

Bryan

-----Original Message-----
From: Aurélien Aptel <aaptel at suse.com>
Sent: Friday, July 5, 2019 8:03 AM
To: Interoperability Documentation Help <dochelp at microsoft.com>; cifs-protocol at lists.samba.org
Subject: SMB3 LZ77 decompression issues

Hello,

I'm posting again with dochelp in CC.

I've been able to trigger an LZ77-compressed SMB3 Read response against the latest Windows Server 2019, but I am unable to decompress it.

Request
=======

SMB2 (Server Message Block Protocol version 2)
    [....]
    Read Request (0x08)
        StructureSize: 0x0031
            0000 0000 0011 000. = Fixed Part Length: 24
            .... .... .... ...1 = Dynamic Part: True
        Padding: 0x00
        Flags: 0x02, Compressed
            .... ...0 = Unbuffered: Client is NOT asking for UNBUFFERED read
            .... ..1. = Compressed: Client is asking for COMPRESSED data
        Read Length: 131072
        File Offset: 0
        GUID handle File: a
            File Id: 00000012-0004-0000-0100-000004000000
            [Frame handle opened: 52]
        Min Count: 0
        Channel: None (0x00000000)
        Remaining Bytes: 0
        Blob Offset: 0x00000000
        Blob Length: 0
        Channel Info Blob: NO DATA

Response
========

0000  fc 53 4d 42 00 00 02 00  02 00 00 00 50 00 00 00   .SMB.... ....P...
0010  fe 53 4d 42 40 00 02 00  00 00 00 00 08 00 0a 00   .SMB@... ........
0020  01 00 00 00 00 00 00 00  07 00 00 00 00 00 00 00   ........ ........
0030  ff fe 00 00 01 00 00 00  35 00 00 00 00 10 00 00   ........ 5.......
0040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ........ ........
0050  11 00 50 00 00 00 02 00  00 00 00 00 00 00 00 00   ..P..... ........
0060  ff ff ff 7f ff 07 00 0f  ff 00 00 fc ff 01 00      ........ .......

NetBIOS Session Service
    Message Type: Session message (0x00)
    Length: 111
SMB2 (Server Message Block Protocol version 2)
    SMB2 Compression Transform Header
    ProtocolId: fc534d42
    OriginalSize: 131072
    CompressionAlgorithm: LZ77 (0x0002)
    Reserved: 0000
    Offset: 0x00000050

Let's look again and annotate...

0000  fc 53 4d 42 00 00 02 00  02 00 00 00 50 00 00 00   .SMB.... ....P...
      ^^^^^^^^^^^                          ^^^^^^^^^^^
 compression transform header            compressed data offset = 0x50

   SMB2 header follows                     READ
      vvvvvvvvvvv                          vvvvv
0010  fe 53 4d 42 40 00 02 00  00 00 00 00 08 00 0a 00   .SMB@... ........
0020  01 00 00 00 00 00 00 00  07 00 00 00 00 00 00 00   ........ ........
0030  ff fe 00 00 01 00 00 00  35 00 00 00 00 10 00 00   ........ 5.......
0040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ........ ........
0050  11 00 50 00 00 00 02 00  00 00 00 00 00 00 00 00   ..P..... ........
            ^^
  read data offset from SMB2 header is 0x50 again

0060  ff ff ff 7f ff 07 00 0f  ff 00 00 fc ff 01 00      ........ .......
      ^^
    compressed data starts here (0x10 + 0x50 = 0x60)

So the LZ77 compressed data is

    ff ff ff 7f ff 07 00 0f ff 00 00 fc ff 01 00
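
The offset arithmetic above can be double-checked with a few lines of Python. This is only a sketch: the function name split_compressed_message is made up for illustration, and the field layout (ProtocolId, OriginalSize, CompressionAlgorithm, Reserved, Offset, all little-endian) is simply read off the dissection shown above.

```python
import struct

# The 111 bytes of the response, copied from the hex dump above.
RESPONSE = bytes.fromhex(
    "fc 53 4d 42 00 00 02 00 02 00 00 00 50 00 00 00 "
    "fe 53 4d 42 40 00 02 00 00 00 00 00 08 00 0a 00 "
    "01 00 00 00 00 00 00 00 07 00 00 00 00 00 00 00 "
    "ff fe 00 00 01 00 00 00 35 00 00 00 00 10 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "11 00 50 00 00 00 02 00 00 00 00 00 00 00 00 00 "
    "ff ff ff 7f ff 07 00 0f ff 00 00 fc ff 01 00"
)

# ProtocolId(4) OriginalSize(4) CompressionAlgorithm(2) Reserved(2) Offset(4)
HEADER_FMT = "<4sIHHI"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 16 bytes


def split_compressed_message(msg):
    """Split a message into (original_size, algorithm, plain part, compressed part)."""
    protocol_id, original_size, algorithm, _reserved, offset = struct.unpack_from(
        HEADER_FMT, msg, 0
    )
    if protocol_id != b"\xfcSMB":
        raise ValueError("not a compression transform header")
    # Offset counts from the end of the transform header: 0x10 + 0x50 = 0x60 here.
    plain = msg[HEADER_LEN:HEADER_LEN + offset]
    compressed = msg[HEADER_LEN + offset:]
    return original_size, algorithm, plain, compressed


original_size, algorithm, plain, compressed = split_compressed_message(RESPONSE)
print(original_size, hex(algorithm), len(plain), compressed.hex(" "))
```

The uncompressed part is the SMB2 header plus the fixed part of the READ response; everything after it is the LZ77 stream.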

I've tried to decode it using [MS-XCA] 2.4.4 "Plain LZ77 Decompression"
[1] which has pseudo code that is easily runnable in python. I can decode the examples on that page fine:

  >>> decode(bytes.fromhex(" ff ff ff 1f 61 62 63 17 00 0f ff 26 01"))
  bytearray(b'abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc'+
    b'abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc'+
    b'abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc'+
    b'abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc'+
    b'abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc')

But if I try to decode my compressed payload it is invalid:

  >>> decode(bytes.fromhex(" ff ff ff 7f ff 07 00 0f  ff 00 00 fc ff 01 00"))
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "lz.py", line 54, in decode
      raise Exception("error")

This corresponds to this line in the pseudo-code:

     If MatchLength < 15 + 7
         Return error.

And it fails at the very beginning, after outputting only 1 byte (ff). The uncompressed payload should be all 0xFF.

Stefan Metzmacher found that there is a bug in the pseudo-code when dealing with long matches:

"Stefan Metzmacher" <metze at samba.org> writes:
> It seems the compression algorithm has a bug regarding matches longer 
> than UINT16_MAX + 3.
>
> In your example we have an original payload of 131072 bytes with 0xff.
>
> 1. The first byte is encoded directly.
>
> 2. We find a match with offset 1 and length 131071
>
> 3. We do offset -= 1 and length -= 3 (we have offset=0, length =
> 131068)
>
> 4. Length is >= 7, we do length -= 7 and encode it (=> length =
> 131061)
>
> 5. length is >= 15, we do length -=15 and encode it (=> length =
> 131046)
>
> 6. length is >= 255, we do length += (15 + 7)
>    (=> length = 131068 (0x1FFFC) again)
>    Encoding this into just 2 bytes doesn't work.
>
>    Ah! It seems the 0x0000 length means the length is encoded in the
>    following 3 bytes! fc ff 01 is just 131068

It is actually the following 4 bytes.
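
The arithmetic in the quoted steps can be replayed with a short Python sketch of the length encoding. The helper name encode_match_length is hypothetical, and the 32-bit escape after a 0x0000 word is only what this capture suggests, not something the documentation states.

```python
import struct


def encode_match_length(length):
    """Return (low 3 bits, half-byte or None, extra bytes) for a match length.

    Follows the steps quoted above. The 4-byte form after 0x0000 is an
    assumption inferred from the capture in this thread.
    """
    length -= 3
    if length < 7:
        return length, None, b""          # fits in the 3 low bits
    length -= 7
    if length < 15:
        return 7, length, b""             # fits in the shared half-byte
    length -= 15
    if length < 255:
        return 7, 15, bytes([length])     # fits in one extra byte
    length += 15 + 7                      # back to (original length - 3)
    if length <= 0xFFFF:
        return 7, 15, b"\xff" + struct.pack("<H", length)
    # Too long for 16 bits: emit 0x0000, then the length as 32 bits.
    return 7, 15, b"\xff\x00\x00" + struct.pack("<I", length)


low3, half_byte, extra = encode_match_length(131071)
print(low3, half_byte, extra.hex(" "))
```

For the 131071-byte match this gives low bits 7, half-byte 15 and the extra bytes ff 00 00 fc ff 01 00, which is exactly the tail of the captured payload. For the 297-byte match of the documented example it gives ff 26 01.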

So this change was needed in the pseudo-code from MS-XCA:

   --- lz77decompress-example1a.py 2019-07-05 15:08:16.145761364 +0200
   +++ lz77decompress-example1b.py 2019-07-05 15:40:20.824646872 +0200
   @@ -81,6 +81,10 @@ def decode(ibuf):
                    # read 2 bytes from InputPosition
                    MatchLength = struct.unpack_from('<H', ibuf, InputPosition)[0]
                    InputPosition += 2
   +                if MatchLength == 0:
   +                    # read 4 bytes from InputPosition
   +                    MatchLength = struct.unpack_from('<I', ibuf, InputPosition)[0]
   +                    InputPosition += 4
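
For anyone who wants to reproduce this without lz.py, a minimal self-contained sketch of the decoder with the change applied could look like the following. It is a fresh transcription of the [MS-XCA] 2.4.4 pseudo-code, not the script used above, and the 4-byte length after a 0x0000 word is still the unconfirmed assumption from this thread.

```python
import struct


def decode(ibuf):
    """Plain LZ77 decompression sketch, with the assumed 32-bit length escape."""
    out = bytearray()
    pos = 0
    flags = 0
    flag_count = 0
    last_half = None  # position of a length byte whose high half is unused
    while True:
        if flag_count == 0:
            flags = struct.unpack_from("<I", ibuf, pos)[0]
            pos += 4
            flag_count = 32
        flag_count -= 1
        if (flags & (1 << flag_count)) == 0:
            out.append(ibuf[pos])  # literal byte
            pos += 1
            continue
        if pos == len(ibuf):
            return out  # end of input: decompression is complete
        match_bytes = struct.unpack_from("<H", ibuf, pos)[0]
        pos += 2
        length = match_bytes % 8
        offset = match_bytes // 8 + 1
        if length == 7:
            if last_half is None:
                length = ibuf[pos] % 16
                last_half = pos
                pos += 1
            else:
                length = ibuf[last_half] // 16
                last_half = None
            if length == 15:
                length = ibuf[pos]
                pos += 1
                if length == 255:
                    length = struct.unpack_from("<H", ibuf, pos)[0]
                    pos += 2
                    if length == 0:
                        # Assumed extension: the length follows as 4 bytes.
                        length = struct.unpack_from("<I", ibuf, pos)[0]
                        pos += 4
                    if length < 15 + 7:
                        raise ValueError("invalid match length")
                    length -= 15 + 7
                length += 15
            length += 7
        length += 3
        if offset > len(out):
            raise ValueError("match offset points before start of output")
        for _ in range(length):
            out.append(out[-offset])  # byte-wise so overlapping copies work


payload = bytes.fromhex("ff ff ff 7f ff 07 00 0f ff 00 00 fc ff 01 00")
data = decode(payload)
print(len(data), set(data))
```

With the extra 4-byte read, the captured payload decodes to 131072 bytes that are all 0xff, and the documented "abc" example still decodes as before.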


Can Microsoft confirm the pseudo-code is now complete?

Cheers,
--
Aurélien Aptel / SUSE Labs Samba Team
GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
SUSE Linux GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)



