SMB2 Performance Regression for Huge Numbers of Small Files: Excessive Find and Other Requests

awl1 awl1 at mnet-online.de
Tue Oct 31 16:11:51 UTC 2017


Hello Ralph, Jeremy, Andrew, hello fellow Samba experts to whom it may 
concern,

first of all, I'm sorry that it took me such a long time to prepare a 
reproducible test scenario, record packet traces and analyze them. 
Unfortunately, I have been fighting ongoing health issues which kept me 
from making progress as intended...

This is the follow-up for the previous thread "Windows SMB2 client doing 
excessive, inefficient SMB2 Find (and other) requests" (from 
mid-September) here on samba-technical (and a number of earlier threads 
on the samba-user list since late July):
https://lists.samba.org/archive/samba-technical/2017-September/123046.html
https://lists.samba.org/archive/samba-technical/2017-September/123082.html


But finally, here it is, and the ABSTRACT ("management summary"...) of 
my issue report is:

There is a SEVERE PERFORMANCE REGRESSION between SMB1 (SMB 1.5) and SMB2 
(SMB 3.11) performance when looking at a scenario where a Windows 10 
client copies a HUGE NUMBER OF SMALL FILES from or to a SMB2 share 
drive, regardless of whether this share is provided through Samba (share 
being hosted by any SMB2 capable version, tested with Samba 4.7.0) or 
Windows 10 itself (share being hosted by Win 10 Pro).

Writing 2000 files to a share slows down by a factor of between 2 and 8 
depending on the particular scenario (best SMB1 result: 20 sec, typical 
SMB2 result: 90 sec, worst SMB2 result: 164 sec), and reading slows down 
by a factor between 1.5 and 4 (best SMB1 result: 15 sec, typical/worst 
SMB2 result: 60 seconds).
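
For reference, the slowdown factors implied by the timings quoted above can be recomputed with a small Python sketch. The function name and the dictionary layout are purely illustrative; only the timings themselves come from my measurements.

```python
# Timings in seconds, exactly as quoted in the summary above.
WRITE_SECONDS = {"smb1_best": 20, "smb2_typical": 90, "smb2_worst": 164}
READ_SECONDS = {"smb1_best": 15, "smb2_typical_worst": 60}


def slowdown(baseline_s: float, measured_s: float) -> float:
    """Return how many times slower measured_s is than baseline_s."""
    return measured_s / baseline_s


if __name__ == "__main__":
    base = WRITE_SECONDS["smb1_best"]
    for label in ("smb2_typical", "smb2_worst"):
        print(f"write {label}: x{slowdown(base, WRITE_SECONDS[label]):.1f}")
    factor = slowdown(READ_SECONDS["smb1_best"], READ_SECONDS["smb2_typical_worst"])
    print(f"read smb2_typical_worst: x{factor:.1f}")
```

The lower ends of the ranges given above (2 for write, 1.5 for read) come from the faster individual scenarios in the spreadsheet, not from these summary timings.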

Also, results vary hugely depending on the type of client used to 
initiate the copy process: For the write to share scenario, Windows 
Explorer using SMB2 gives notably faster results than "xcopy /s", but 
both these SMB2-based results are hugely slower than the results of the 
respective tools using SMB1 to access the same share! For the read 
scenario, it rather is the opposite: Here, "xcopy /s" using SMB2 even 
turns out to be the fastest scenario (even slightly faster than "xcopy 
/s" using SMB1), but reading 2000 files through Windows Explorer in the 
exact same scenario is about three times slower using SMB2 than it used 
to be when using SMB1...

My Wireshark packet tracing has uncovered that the root cause for this 
seems NOT to be a server-side issue (neither in Samba nor the Windows 
SMB2 service), but rather widely varying and hugely inefficient 
communication by the Windows SMB2 client implementation (when compared 
to both the Windows SMB1 client or a Linux SMB2 client doing the same 
thing). "Hugely inefficient" refers to two main issues here (please find 
more details further down):

a) Excessive, inefficient communication consisting of calls that are 
repeated multiple times (seemingly without plausible reason/need): To 
copy 2000 files, I would expect ~ 2000 calls of each SMB2 operation type 
(like Find, GetInfo, Create, Close), but we see up to 10000 such 
requests of many SMB2 operation types (i.e. up to 5 times as many as 
needed) without reason. This wastes both time (to execute the operation 
on the SMB2 server, whether Samba or Windows) and network bandwidth (to 
communicate SMB2 requests/responses).

b) Inefficient use of the FIND_ID_BOTH_DIRECTORY_INFO operation 
(smb2.find.infolevel == 37) (in the Write scenario only!), which is 
repeatedly being used with its "Pattern" parameter set to "*" without 
plausible reason. The issue here is that with every subsequent file that 
has been successfully copied to the share, the 
FIND_ID_BOTH_DIRECTORY_INFO response grows in size, as it re-lists all 
files that have been successfully copied during previous iterations. 
Again, this wastes both network bandwidth and server-side execution time.


TEST CASE / REPRODUCER SETUP:

As proposed by Ralph, my reproducer test scenario is as simple as 
possible and consists of a single directory "TestDir" containing 2000 
empty files (of length 0 - so the smallest files possible...) named 
"emptyfile0000.000" to "emptyfile2000.000".
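
A minimal Python sketch of how such a test directory could be generated is given here. This is not the script I actually used; the function name and its parameters are hypothetical. The index range is left as a parameter because the naming range above, read inclusively, yields 2001 names rather than 2000.

```python
from pathlib import Path


def make_test_dir(root: Path, first: int = 0, last: int = 2000) -> int:
    """Create root/TestDir with empty files emptyfileNNNN.000.

    Indices run from first to last inclusive; returns the number of names used.
    """
    test_dir = root / "TestDir"
    test_dir.mkdir(parents=True, exist_ok=True)
    for i in range(first, last + 1):
        # touch() creates a zero-length file (or leaves an existing one alone)
        (test_dir / f"emptyfile{i:04d}.000").touch()
    return last - first + 1


if __name__ == "__main__":
    print(make_test_dir(Path("."), first=1, last=2000))
```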

The client machine is always my home office workstation, running either 
Windows 10 Pro 64-bit (with the most recent "Fall Creators Update", fully 
patched to current level) or Ubuntu Linux 16.04 LTS (fully patched to 
current level).

The SMB1/SMB2 share is served either by my home office NAS (Thecus 
N4200Pro), running either the Thecus-provided Samba 3.6.15 (SMB1 version 
1.5) or a self-compiled Samba 4.7.0 (SMB2 version 3.11; including all 
Samba compile dependencies in their most recent versions), or by another 
workstation PC in my home office running Windows 10 Pro (SMB2 version 
3.11; using the most recent "Fall Creators Update", fully patched to 
current level).

Global section of my Samba smb.conf on the Thecus NAS has been hugely 
stripped down and is identical for Samba 3.6.15 and 4.7.0 (note that I 
have already switched on Samba's case sensitivity option in order to 
speed up handling of many files):

    [global]
    log file = /var/log/samba/samba.%m
    max log size = 50
    log level = 1
    lock directory = /var/samba
    case sensitive = true
    default case = lower
    preserve case = yes
    short preserve case = yes
    security = user
    guest account = nobody
    map to guest = Bad User
    workgroup = WORKGROUP
    netbios name = N4200PRO


I have recorded Wireshark packet traces in pcapng format for the 
following scenarios (AFAICT, not disclosing any really private details 
from my home office network any more):

WRITE TO SHARE:
W1) copy TestDir with 2000 files from local file system on a Win 10 SMB1 
client (using both Explorer and command-line "xcopy /s") to SMB1 3.6.15 
server share
W2) copy from Win 10 SMB2 client (Explorer, "xcopy /s") to SMB2 4.7.0 
server share
W3) copy from Win 10 SMB2 client (Explorer, "xcopy /s") to Win 10 SMB2 
server share
W4) copy from Linux SMB1 client (using both krusader and command-line 
"cp -r") to SMB1 3.6.15 server share
W5) copy from Linux SMB2 client (krusader, "cp -r") to SMB2 4.7.0 server 
share
W6) copy from Linux SMB2 client (krusader, "cp -r") to Win 10 SMB2 
server share

READ FROM SHARE:
R1) copy TestDir with 2000 files from SMB1 3.6.15 server share to local 
file system on a Win 10 SMB1 client (Explorer, "xcopy /s")
R2) copy from SMB2 4.7.0 server share to Win 10 SMB2 client (Explorer, 
"xcopy /s")
R3) copy from Win 10 SMB2 server share to Win 10 SMB2 client (Explorer, 
"xcopy /s")
R4) copy from SMB1 3.6.15 server share to Linux SMB1 client (krusader, 
"cp -r")
R5) copy from SMB2 4.7.0 server share to Linux SMB2 client (krusader, 
"cp -r")
R6) copy from Win 10 SMB2 server share to Linux SMB2 client (krusader, 
"cp -r")

The above packet trace files are provided here:
http://home.mnet-online.de/awl1/write_to_share.zip (containing W1 to W6)
http://home.mnet-online.de/awl1/read_from_share.zip (containing R1 to R6)


As I am unable to attach binary files to this mail, I have provided the 
detailed results of my trace file analysis using an Excel sheet here:
http://home.mnet-online.de/awl1/Inefficient%20Windows%20SMB2%20Client.xls
http://home.mnet-online.de/awl1/Inefficient%20Windows%20SMB2%20Client.pdf

Basically, background color "yellow" in this sheet means 
"average/acceptable", "red" means "poor" (very inefficient), and "green" 
means "good", i.e. represents target performance that is proven to be 
possible when SMB2 communication is efficient (rather than hugely 
suboptimal).


DETAILS about above mentioned ISSUES a) and b):

a) Multiple, identical, repeated calls to SMB2 operations per file for 
(seemingly) no reason:

Please have a look at the fourth and fifth columns of the Excel sheet, 
where I have listed the numbers and SMB/SMB2 operation types from the 
packet traces. It turns out that in almost all of the really slow 
scenarios, we see a huge overhead of multiple, repeated calls to SMB2 
operations for no reason that would be plausible (at least to me): When 
copying a single directory with 2000 empty files, why in the world 
should this require e.g. (as in the W2 scenario with an "xcopy /s" client)

* ~ 5500 SMB2 Find operations, of which ~ 500 
FIND_ID_BOTH_DIRECTORY_INFO and ~ 5000 FIND_NAME_INFO
* ~ 6000 SMB2 SetInfo operations
* ~ 15500 SMB2 Create operations and
* ~ 15500 SMB2 Close operations

summing up to an execution time of ~ 165 seconds (when the same thing 
can be done against the exact same SMB2 server from a Linux SMB2 client 
without redundant operations in ~ 21 seconds)? I don't know what might 
cause this huge level of redundancy in the Windows SMB2 client 
implementation; I can only see its detrimental influence on performance...
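
To make such counts reproducible from the trace files, a sketch along the following lines could be used. It assumes (my assumption, not something the traces prescribe) that the request command codes were exported beforehand with something like `tshark -r trace.pcapng -Y "smb2.flags.response == 0" -T fields -e smb2.cmd`, where `trace.pcapng` is a placeholder file name and compounded requests appear as comma-separated codes on one line. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# SMB2 command codes for the operation types discussed here
# (Find is the QUERY_DIRECTORY command, 0x0e).
SMB2_COMMANDS = {5: "Create", 6: "Close", 14: "Find", 16: "GetInfo", 17: "SetInfo"}


def count_requests(lines: Iterable[str]) -> Counter:
    """Count SMB2 requests per operation type from exported command codes."""
    counts: Counter = Counter()
    for line in lines:
        # One line per frame; several codes on a line mean a compound request.
        for token in line.strip().split(","):
            if not token:
                continue
            try:
                code = int(token, 0)  # accepts "14" as well as "0x000e"
            except ValueError:
                continue  # skip anything that is not a number
            counts[SMB2_COMMANDS.get(code, f"other({code})")] += 1
    return counts


def requests_per_file(counts: Dict[str, int], n_files: int) -> Dict[str, float]:
    """Average number of requests of each type issued per copied file."""
    return {op: n / n_files for op, n in counts.items()}


if __name__ == "__main__":
    # Approximate W2 "xcopy /s" counts as listed above.
    w2 = {"Find": 5500, "SetInfo": 6000, "Create": 15500, "Close": 15500}
    print(requests_per_file(w2, 2000))
```

With the approximate W2 numbers, this gives between roughly 3 and 8 requests of each type per file, where I would expect about one.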


b) Inefficient use of the FIND_ID_BOTH_DIRECTORY_INFO operation (in 
mainstream "Write" scenarios W2, W3 only):

Why does the SMB2 Find request of type SMB2_FIND_ID_BOTH_DIRECTORY_INFO 
(smb2.find.infolevel == 37) always use a wildcard Pattern "*"? This 
seems completely unnecessary. While it might be needed to check that a 
file with the same name (case [in]sensitive depending on Samba 
parameters) does not already exist, it clearly is not necessary to 
enumerate all files in the current directory, which is what Pattern "*" 
causes in my testing:

From each iteration to the next, having copied one more file to the 
target share, the Find Response grows in size, i.e.

Find Response (0x0e)
     [Info Level: SMB2_FIND_ID_BOTH_DIRECTORY_INFO (37)]
     StructureSize: 0x0009
     Info: 7000000000000000a0f5d991fe1bd3010070d1ff911bd301...
         Offset: 0x00000048
         Length: 1120
         FileIdBothDirectoryInfo: .
         FileIdBothDirectoryInfo: ..
         FileIdBothDirectoryInfo: emptyfile0001.000
         FileIdBothDirectoryInfo: emptyfile0002.000
(...)
         FileIdBothDirectoryInfo: emptyfile<n>.000

i.e. the size of the Find Response grows with every single file 
successfully copied onto the share, and the current Find Response always 
contains the names of all the n (with n running between 0 and 2000) 
files that have been successfully copied to the share so far.
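
The cost of this pattern can be illustrated with a simple model. The sketch below assumes, as a simplification, exactly one full "*" listing per copied file (the actual number of listings per file differs between scenarios in my traces) and compares it with a lookup by exact file name. Both function names are hypothetical.

```python
def entries_with_wildcard(n_files: int) -> int:
    """Total directory entries returned when the whole directory is
    re-listed once before each file is copied.

    Before file number k (0-based) is copied, the listing contains
    '.', '..' and the k files copied so far, i.e. k + 2 entries.
    """
    return sum(k + 2 for k in range(n_files))


def entries_with_exact_name(n_files: int) -> int:
    """Upper bound on entries returned when each lookup names one file:
    at most one match per lookup."""
    return n_files


if __name__ == "__main__":
    n = 2000
    print(entries_with_wildcard(n), entries_with_exact_name(n))
```

Under this assumption the number of listed entries grows quadratically with the number of files (n * (n + 3) / 2 in total), while a lookup by exact name grows only linearly.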

This results in a much larger trace file: In my testing, the trace file 
size for copying 2000 files from a Win10 machine with the buggy client 
to a Samba server is ~ 30 MB and contains no less than ~ 2500 Find 
requests, while using a Linux client in the exact same scenario to copy 
those 2000 files onto the same share, the session trace file is less 
than 5 MB in size and contains at most four (4) Find requests (!!!).

Needless to say, this use of the "*" pattern is also detrimental to 
both performance and network throughput...


Finally, my request to you:

Can the Samba team please look into the spreadsheet (and trace data) as 
provided, confirm that you are able to reproduce the poor performance 
and my analysis results, and finally make your peers at Microsoft aware 
of these Windows SMB2 client issues? I'll be happy to provide any 
further information, packet traces or other things you would like to see 
for your assessment.

I would hope that when you address these issues with the Microsoft SMB 
team, the chance of seeing these inefficiencies fixed in subsequent 
Windows Updates is much better than when I try to do the same as a 
single home office user...


Thanks a million one more time for your kind help with this & best regards
Andreas



