[Samba] Third Try: Huge number of small files performance regression from 3.5.16 to 4.6.5 with identical smb.conf

awl1 awl1 at mnet-online.de
Fri Jul 14 15:37:11 UTC 2017

Hello again, Jeremy, hello again, Samba experts/developers,

as "all good things come in threes" and "third time's the charm", and 
following kind advice from Björn Jacke, I am trying once more on this 
list to arouse your interest, this time with an even shorter summary 
of the issue, and after testing a number of older Samba versions 
between 3.5.x and 4.6.x to pinpoint exactly when the issue first 
appeared.

As I am 99.99% confident that this is not a configuration issue on my 
side, I would really appreciate it if somebody from the Samba team would be 
interested in tracking down why - for the specific scenario with a huge 
number of small files - performance is (so) much worse with Samba 
4.x/SMB2 than it used to be with Samba 3.x/SMB1.

(Please note that, for a small number of larger or even huge files, as 
expected, I can also confirm from my observations that Samba 4.x/SMB2 is 
typically faster than Samba 3.x/SMB1, sometimes even considerably, so 
the issue is NOT with Samba 4.x/SMB2 in general, but seems to be specific 
to the scenario of a huge number of small files.)


    * Win10 client using TotalCommander 9.0a to copy files
    * Copying files from/to a Samba share running on my Home Office
      Thecus NAS
    * Thecus N4200pro NAS (Intel(R) Atom(TM) CPU D525, 2 cores/4 HT
      threads @ 1.80GHz, Linux kernel 2.6.33, 3 GB RAM) and either
      Thecus original Samba 3.5.16 or several self-compiled
      (using gcc-5.2) Samba versions:
        - Samba 4.6.5, SMB2 dialect 3.1.1
        - Samba 4.2.14, SMB2 dialect 3.0
        - Samba 4.0.26, SMB2 dialect 3.0
        - Samba 3.6.25, SMB2 dialect 2.0.2 (single line
          "min protocol = SMB2" added to smb.conf)
        - Samba 3.6.25, SMB1 dialect 1.5
        - Thecus original Samba 3.5.16, SMB1 dialect 1.5
    * Exact same hardware, network, complete software stack for all
      cases (except varying Samba version on Thecus NAS)
    * Exact same smb.conf for all versions (see below)
    * Definitely no other load on/access to the NAS during my testing
    * Recorded Wireshark captures in pcapng format for both Write/Read
      scenarios in all above Samba versions
    * Looking at Grand Total Sum of Wireshark "Service Response Time
      Statistics" (SRT) in seconds for all captures to compare
      performance below

A) "Write" Scenario:
Write ~ 1000 Small Files (between <1kB and ~ 20kB) to Samba share on 
Thecus NAS, copying from a directory of ~ 5000 files stored on Win10 
local NTFS

      Samba version   SMB/SMB2 dialect   Total SRT (sec)
      3.5.16          1.5                25
      3.6.25          1.5                21
      3.6.25          2.0.2              341 (!!!)
      4.0.26          3.0                387 (!!!)
      4.2.14          3.0                355 (!!!)
      4.6.5           3.1.1              346 (!!!)

B) "Read" Scenario:
Read ~ 2000 Small Files (between <1kB and ~ 20kB) from a directory of ~ 
5000 files from Samba share on Thecus NAS, copy to local NTFS on Win10

      Samba version   SMB/SMB2 dialect   Total SRT (sec)
      3.5.16          1.5                101
      3.6.25          1.5                100
      3.6.25          2.0.2              139 (!)
      4.0.26          3.0                152 (!)
      4.2.14          3.0                140 (!)
      4.6.5           3.1.1              144 (!)

(Note that the read scenario spends most of its time, even with 3.x/SMB 
1.5, enumerating all ~ 5000 files in this directory before Total 
Commander even starts copying the ~ 2000 files.)

Summary of findings:

    * For both Write and Read scenario and a huge number of small files,
      performance with SMB2/dialect 2.0/3.0/3.1.1 in all Samba versions
      >= 3.6 up to most recent 4.6 is (much) worse than SMB performance
      with SMB/dialect 1.5 in Samba 3.6 and before.

    * While in the Read scenario, performance is "only" about 40% worse
      (which might possibly at least partly be explained by additional
      complexity in SMB2), in the Write scenario the total SRT is about
      *fourteen times* higher (e.g. 346 s vs. 25 s), a finding which
      definitely cannot be explained as "working as designed".

    * While SMB/1.5 performance is still fine in the latest 3.6.25, *all
      SMB2-capable releases of Samba from the very first SMB2/2.x
      implementation in Samba 3.6 onwards* seem to be affected by the
      performance regression.
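
As a rough cross-check of the factors quoted above, the slowdowns can be recomputed from the two tables. This is only a minimal Python sketch: the dictionary and function names are illustrative, and taking the stock Thecus Samba 3.5.16 / SMB 1.5 run as the baseline is my assumption.

```python
# Total SRT in seconds, copied from the Write and Read tables above.
# Keys are (Samba version, SMB dialect); all names here are illustrative.
WRITE_SRT = {
    ("3.5.16", "1.5"): 25,
    ("3.6.25", "1.5"): 21,
    ("3.6.25", "2.0.2"): 341,
    ("4.0.26", "3.0"): 387,
    ("4.2.14", "3.0"): 355,
    ("4.6.5", "3.1.1"): 346,
}
READ_SRT = {
    ("3.5.16", "1.5"): 101,
    ("3.6.25", "1.5"): 100,
    ("3.6.25", "2.0.2"): 139,
    ("4.0.26", "3.0"): 152,
    ("4.2.14", "3.0"): 140,
    ("4.6.5", "3.1.1"): 144,
}

# Assumption: the stock Thecus Samba 3.5.16 (SMB 1.5) run is the reference.
BASELINE = ("3.5.16", "1.5")


def slowdown(srt_by_config, baseline=BASELINE):
    """Return each configuration's total SRT divided by the baseline's."""
    base = srt_by_config[baseline]
    return {cfg: total / base for cfg, total in srt_by_config.items()}


if __name__ == "__main__":
    for label, table in (("Write", WRITE_SRT), ("Read", READ_SRT)):
        print(label)
        for (version, dialect), factor in slowdown(table).items():
            print(f"  {version:8} {dialect:6} {factor:5.2f}x")
```

Relative to that baseline, the SMB2 write runs come out between roughly 13.6x and 15.5x, and the SMB2 read runs between roughly 1.4x and 1.5x.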

As it seems prohibited to attach Excel or PDF documents when posting to 
this list, I am providing my (anonymized) smb.conf (global section and 
particular share definition) as well as an Excel sheet and a PDF with 
the detailed Wireshark Service Response Time Statistics for Write and 
Read scenario over here:


On 13.06.2017 at 18:36, Jeremy Allison wrote:
> Can you get comparative wireshark traces for the two cases ?
> That would help discover what the bottleneck is.

As requested by Jeremy, the Wireshark "pcapng" packet traces/recordings 
are available for all Samba versions mentioned above in both Read and 
Write scenario. Unfortunately, these recordings do indeed contain 
confidential data both from my machine and the share, so please get back 
to me directly and request access: I will then send you a download link 
and password to the capture files ZIP via private mail.

I also hereby promise that I will do everything I can in order to 
support your analysis, including running follow-up tests on my 
platform/scenario, digging deeper into packet traces or even investigating 
the source code based on your instructions.

I truly hope we will be able to improve general Samba 4.x / SMB2 
performance for the "huge number of small files" scenario as a result of 
this exercise...

Many thanks one more time for your kind help with this!

Best regards,
