[Samba] Benchmarking Linux 5.10 SMB3 client namespace performance

Case van Rij case.vanrij at gmail.com
Wed Feb 10 18:40:25 UTC 2021


I've recently started looking at using Linux clients as SMB3 workload
generators with the spec.org SpecSFS 2014 benchmark.
For the initial performance comparison I'm using 4 Windows 2012R2
clients and 4 Linux 5.10.13-1.el7.elrepo.x86_64 clients.
Both sets of clients use E5-2637 v4 @ 3.50GHz CPUs, each with 40GbE,
connecting to 8 40GbE NICs on a high-performance NAS array.

The first workload I looked at is SWBUILD, where each client runs a
netmist userspace process with a mostly-namespace workload.
With 4 Windows clients running 80 business metrics, this means 400
threads, each attempting to perform 100 operations per second.
Over SMB3 this workload targets 40,000 operations per second, and
achieves it with 1.368 ms average latency as measured from userspace.

 Business    Requested     Achieved     Avg Lat
   Metric      Op Rate      Op Rate        (ms)
       80     40000.00    40000.420       1.368

For each thread, the workload mix looks like this (latencies in seconds):
        Write_file            ops =       1605  Avg Latency:   0.001544
        Read_file             ops =       1447  Avg Latency:   0.001521
        Mkdir                 ops =        246  Avg Latency:   0.004653
        Unlink                ops =        717  Avg Latency:   0.001635
        Create                ops =        256  Avg Latency:   0.003522
        Stat                  ops =      16556  Avg Latency:   0.001536
        Access                ops =       1401  Avg Latency:   0.001534
        Chmod                 ops =       1226  Avg Latency:   0.003059
        Readdir               ops =        481  Avg Latency:   0.002131

For the initial Linux run I scaled it way down to 4 business metrics,
i.e. 20 threads each running 100 operations per second.
The first Linux client runs 20 threads, mounting with vers=3.02,
actimeo=120, across 4 SMB3 mounts, one per target IP (an example
mount invocation is sketched after the stats below):

  Business    Requested     Achieved     Avg Lat
    Metric      Op Rate      Op Rate        (ms)
         4      2000.00      427.417      46.612

        Write_file            ops =        442  Avg Latency:   0.057040
        Read_file             ops =        348  Avg Latency:   0.053756
        Mkdir                 ops =         58  Avg Latency:   0.125199
        Unlink                ops =        178  Avg Latency:   0.045923
        Create                ops =         55  Avg Latency:   0.107244
        Stat                  ops =       4069  Avg Latency:   0.047403
        Access                ops =        308  Avg Latency:   0.048040
        Chmod                 ops =        294  Avg Latency:   0.046134
        Readdir               ops =        133  Avg Latency:   0.038968
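
(For reference, each mount looks roughly like the following; the
server IP, share name and credentials are placeholders, not the
actual test configuration:)

    # one of the 4 SMB3 mounts, one per target IP on the array;
    # the 20 netmist threads are spread across these mounts
    mount -t cifs //10.0.0.11/share1 /mnt/smb1 \
        -o vers=3.02,actimeo=120,username=specuser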

That result is pretty surprising, especially since server-side latency
averages sub-1 ms! A PCAP analysis of SMB round-trip times confirms
the same low latency on the wire:
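
(The statistics below come from tshark's SMB2 service response time
tap; the capture file name is just a placeholder:)

    # summarize SMB2 round-trip times from a client-side capture
    tshark -q -r client.pcap -z smb2,srt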

SMB2 SRT Statistics:
Filter: smb2.cmd
Index  Commands               Calls    Min SRT    Max SRT    Avg SRT    Sum SRT
    5  Create                  17763   0.000193   0.034437   0.001351  23.998490
    6  Close                   16477   0.000090   0.034437   0.001301  21.433632
    7  Flush                       9   0.000616   0.004260   0.001175   0.010574
    8  Read                      648   0.000111   0.007905   0.000855   0.554251
    9  Write                    1077   0.000121   0.006913   0.001817   1.956459
   14  Find                      209   0.000625   0.012334   0.001705   0.356392
   16  GetInfo                 15180   0.000193   0.034437   0.001291  19.601276
   17  SetInfo                   148   0.000843   0.010075   0.002392   0.354038
==================================================================

I scaled the per-thread operation rate down to 10 operations per
second and scaled the business metric count back up:

  Business    Requested     Achieved     Avg Lat
    Metric      Op Rate      Op Rate        (ms)
        4       200.00      200.089       2.144
       32      1600.00     1600.605       2.035
       40      2000.00     2000.748       4.705
       48      2400.00     2400.926       5.323
       56      2800.00     2801.046       6.428
       64      3200.00     3199.791      10.469
       72      3600.00     3222.795      52.887

On the Linux side, each netmist load-generating thread averages 1%
CPU, and the cifsd threads are under 1% CPU.
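
(For anyone reproducing this, the cifs client on my 5.10 test clients
also exposes request counters under /proc/fs/cifs; writing to the
Stats file resets the counters, so one can bracket a load interval:)

    # reset cifs counters, run a load interval, then dump them
    echo 0 > /proc/fs/cifs/Stats
    cat /proc/fs/cifs/Stats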

Long story short: before I start profiling the Linux kernel side, I
would be curious whether anyone has performed similar tests and can
point to solutions or known issues.
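
(The profiling I have in mind is a system-wide sample along these
lines; the 30-second window is arbitrary:)

    # sample all CPUs with call graphs while the benchmark runs
    perf record -a -g -- sleep 30
    perf report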

Thanks,
Case


