SMB2 performance is worse than SMB1 while iometer 512byte transfer

Jones jones.kstw at gmail.com
Fri Sep 13 16:57:22 CEST 2013


Hi Volker,

I repeated the Iometer 512-byte sequential read and write test against an 8GB file,
in the same test environment as my samba-3.6.18 test,
this time with the samba4 series.
Here are the detailed results:

A. I/O per second and CPU usage
========================
Although both SMB1 and SMB2 show HIGH CPU load,
SMB2 spends more CPU time in user space than in kernel space.

samba-4.0.9
Protocol  R/W    IOps   %CPU  %us    %sy    %id   %si
------------------------------------------------------
SMB1      Read   57253  100%  56.0%  38.0%  0.0%  6.0%
SMB1      Write  52241  100%  51.0%  42.0%  0.0%  7.0%
SMB2      Read   23174  100%  72.0%  23.0%  0.0%  5.0%
SMB2      Write  18772  100%  68.0%  27.0%  0.0%  5.0%

samba-4.1.0rc3
Protocol  R/W    IOps   %CPU  %us    %sy    %id   %si
------------------------------------------------------
SMB1      Read   55288  100%  54.0%  40.0%  0.0%  6.0%
SMB1      Write  52529  100%  52.0%  42.0%  0.0%  6.0%
SMB2      Read   22457  100%  70.3%  24.8%  0.0%  5.0%
SMB2      Write  18860  100%  72.3%  21.8%  0.0%  5.9%

samba-4.2.0pre1-GIT-UNKNOWUN (samba-master on 2013/Sep/12)
Protocol  R/W    IOps   %CPU  %us    %sy    %id   %si
------------------------------------------------------
SMB1      Read   54582  100%  53.5%  40.4%  0.0%  6.1%
SMB1      Write  52151  100%  52.5%  41.6%  0.0%  5.0%
SMB2      Read   21410  100%  73.3%  20.8%  0.0%  5.9%
SMB2      Write  20628  100%  73.0%  20.0%  0.0%  7.0%
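
To put the gap in the tables above into one number per test, the SMB2 IOps can be expressed as a fraction of the SMB1 IOps. The following is only a minimal sketch: the figures are copied from the three tables, while the dictionary layout and the helper name `smb2_ratio` are made up for illustration.

```python
# IOps figures copied from the three tables in section A.
# Layout (hypothetical, for illustration): version -> protocol -> (read, write)
iops = {
    "samba-4.0.9": {"SMB1": (57253, 52241), "SMB2": (23174, 18772)},
    "samba-4.1.0rc3": {"SMB1": (55288, 52529), "SMB2": (22457, 18860)},
    "samba-master": {"SMB1": (54582, 52151), "SMB2": (21410, 20628)},
}


def smb2_ratio(version):
    """Return (read, write) SMB2 IOps as a fraction of the SMB1 IOps."""
    smb1_read, smb1_write = iops[version]["SMB1"]
    smb2_read, smb2_write = iops[version]["SMB2"]
    return smb2_read / smb1_read, smb2_write / smb1_write


for version in iops:
    read_ratio, write_ratio = smb2_ratio(version)
    print(f"{version}: read {read_ratio:.0%}, write {write_ratio:.0%}")
```

With these figures, SMB2 delivers roughly 36% to 41% of the SMB1 IOps in every tested version.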

B. perf top on samba-4.0.9
========================
With SMB2, three user-space symbols (all in libtalloc) rank above the
kernel-space ia32_syscall.
With SMB1, the kernel-space ia32_syscall ranks first.

"perf top -p <smbd_pid>" SMB2 enabled shows: (samba-4.0.9)
     7.66%  libtalloc.so.2.0.7   [.] talloc_chunk_from_ptr
     4.02%  libtalloc.so.2.0.7   [.] __talloc
     3.87%  libtalloc.so.2.0.7   [.] _talloc_free_internal
     2.43%  [kernel]             [k] ia32_syscall
     1.84%  libc-2.6.1.so        [.] strcmp
     1.62%  libc-2.6.1.so        [.] memset
     1.48%  libtalloc.so.2.0.7   [.] __i686.get_pc_thunk.bx
     1.47%  libtalloc.so.2.0.7   [.] talloc_alloc_pool
     1.44%  libtalloc.so.2.0.7   [.] _talloc_get_type_abort
     1.38%  libtalloc.so.2.0.7   [.] _talloc_free_poolmem
     1.29%  libtalloc.so.2.0.7   [.] talloc_get_name
     1.26%  libtalloc.so.2.0.7   [.] _talloc_free
     1.23%  libc-2.6.1.so        [.] _int_malloc
     1.10%  [e1000e]             [k] e1000_xmit_frame
     0.98%  libc-2.6.1.so        [.] __gettimeofday
     0.85%  libtevent.so.0.9.18  [.] _tevent_req_create
     0.79%  [kernel]             [k] __ticket_spin_lock
     0.77%  libtalloc.so.2.0.7   [.] _talloc_named_const
     0.74%  libc-2.6.1.so        [.] _int_free
     0.74%  libtalloc.so.2.0.7   [.] _talloc_zero
     0.64%  [kernel]             [k] tcp_sendmsg
 ... ...

"perf top -p <smbd_pid>" SMB1 enabled shows: (samba-4.0.9)
     7.32%  [kernel]             [k] ia32_syscall
     2.44%  libc-2.6.1.so        [.] __gettimeofday
     2.23%  libtalloc.so.2.0.7   [.] talloc_chunk_from_ptr
     1.67%  libc-2.6.1.so        [.] __poll
     1.62%  libpthread-2.6.1.so  [.] __read_nocancel
     1.46%  [kernel]             [k] __ticket_spin_lock
     1.33%  [e1000e]             [k] e1000_xmit_frame
     1.12%  libsmbconf.so.0      [.] event_add_to_poll_args
     1.11%  [kernel]             [k] copy_user_generic_string
     1.07%  libtalloc.so.2.0.7   [.] _talloc_free_internal
     0.93%  libc-2.6.1.so        [.] sendfile64
     0.91%  libpthread-2.6.1.so  [.] __libc_send
     0.89%  libc-2.6.1.so        [.] __fxstat64@@GLIBC_2.2
     0.84%  libsmbconf.so.0      [.] run_events_poll
     0.79%  [kernel]             [k] tcp_recvmsg
     0.79%  libsmbd_base.so      [.] send_file_readX
     0.78%  [kernel]             [k] fget_light
     0.76%  libc-2.6.1.so        [.] _int_malloc
     0.73%  libtalloc.so.2.0.7   [.] __talloc
     0.73%  [kernel]             [k] tcp_sendmsg
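
To compare the two listings above at a glance, the percentages can be summed per shared object. The following is only a rough sketch: the lines are copied from the samba-4.0.9 listings, the helper names `share_by_object` and `talloc_share` are made up for illustration, and the sums cover only the entries shown, not the whole profile.

```python
from collections import defaultdict

# "perf top" lines copied from the samba-4.0.9 listings in this section.
SMB2_PERF = """
     7.66%  libtalloc.so.2.0.7   [.] talloc_chunk_from_ptr
     4.02%  libtalloc.so.2.0.7   [.] __talloc
     3.87%  libtalloc.so.2.0.7   [.] _talloc_free_internal
     2.43%  [kernel]             [k] ia32_syscall
     1.84%  libc-2.6.1.so        [.] strcmp
     1.62%  libc-2.6.1.so        [.] memset
     1.48%  libtalloc.so.2.0.7   [.] __i686.get_pc_thunk.bx
     1.47%  libtalloc.so.2.0.7   [.] talloc_alloc_pool
     1.44%  libtalloc.so.2.0.7   [.] _talloc_get_type_abort
     1.38%  libtalloc.so.2.0.7   [.] _talloc_free_poolmem
     1.29%  libtalloc.so.2.0.7   [.] talloc_get_name
     1.26%  libtalloc.so.2.0.7   [.] _talloc_free
     1.23%  libc-2.6.1.so        [.] _int_malloc
     1.10%  [e1000e]             [k] e1000_xmit_frame
     0.98%  libc-2.6.1.so        [.] __gettimeofday
     0.85%  libtevent.so.0.9.18  [.] _tevent_req_create
     0.79%  [kernel]             [k] __ticket_spin_lock
     0.77%  libtalloc.so.2.0.7   [.] _talloc_named_const
     0.74%  libc-2.6.1.so        [.] _int_free
     0.74%  libtalloc.so.2.0.7   [.] _talloc_zero
     0.64%  [kernel]             [k] tcp_sendmsg
"""

SMB1_PERF = """
     7.32%  [kernel]             [k] ia32_syscall
     2.44%  libc-2.6.1.so        [.] __gettimeofday
     2.23%  libtalloc.so.2.0.7   [.] talloc_chunk_from_ptr
     1.67%  libc-2.6.1.so        [.] __poll
     1.62%  libpthread-2.6.1.so  [.] __read_nocancel
     1.46%  [kernel]             [k] __ticket_spin_lock
     1.33%  [e1000e]             [k] e1000_xmit_frame
     1.12%  libsmbconf.so.0      [.] event_add_to_poll_args
     1.11%  [kernel]             [k] copy_user_generic_string
     1.07%  libtalloc.so.2.0.7   [.] _talloc_free_internal
     0.93%  libc-2.6.1.so        [.] sendfile64
     0.91%  libpthread-2.6.1.so  [.] __libc_send
     0.89%  libc-2.6.1.so        [.] __fxstat64@@GLIBC_2.2
     0.84%  libsmbconf.so.0      [.] run_events_poll
     0.79%  [kernel]             [k] tcp_recvmsg
     0.79%  libsmbd_base.so      [.] send_file_readX
     0.78%  [kernel]             [k] fget_light
     0.76%  libc-2.6.1.so        [.] _int_malloc
     0.73%  libtalloc.so.2.0.7   [.] __talloc
     0.73%  [kernel]             [k] tcp_sendmsg
"""


def share_by_object(perf_text):
    """Sum the listed percentages per shared object (second column)."""
    totals = defaultdict(float)
    for line in perf_text.splitlines():
        fields = line.split()
        # Expected fields: percentage, shared object, [.] or [k], symbol.
        if len(fields) < 4 or not fields[0].endswith("%"):
            continue
        totals[fields[1]] += float(fields[0].rstrip("%"))
    return dict(totals)


def talloc_share(perf_text):
    """Total listed percentage attributed to any libtalloc shared object."""
    shares = share_by_object(perf_text)
    return sum(pct for obj, pct in shares.items() if obj.startswith("libtalloc"))


print(f"SMB2 libtalloc share: {talloc_share(SMB2_PERF):.2f}%")
print(f"SMB1 libtalloc share: {talloc_share(SMB1_PERF):.2f}%")
```

Among the listed entries, libtalloc accounts for about 25% of the samples with SMB2 and about 4% with SMB1.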


C. perf top on samba-4.1.0rc3
========================
With SMB2, three user-space symbols (all in libtalloc) rank above the
kernel-space ia32_syscall.
With SMB1, the kernel-space ia32_syscall ranks first.

"perf top -p <smbd_pid>" SMB2 enabled shows: (samba-4.1.0rc3)
     7.86%  libtalloc.so.2.0.8   [.] talloc_chunk_from_ptr
     3.98%  libtalloc.so.2.0.8   [.] _talloc_free_internal
     3.89%  libtalloc.so.2.0.8   [.] __talloc
     2.36%  [kernel]             [k] ia32_syscall
     1.79%  libc-2.6.1.so        [.] strcmp
     1.61%  libtalloc.so.2.0.8   [.] talloc_alloc_pool
     1.49%  libc-2.6.1.so        [.] memset
     1.46%  libtalloc.so.2.0.8   [.] talloc_get_name
     1.38%  libc-2.6.1.so        [.] _int_malloc
     1.35%  libtalloc.so.2.0.8   [.] _talloc_get_type_abort
     1.32%  libtalloc.so.2.0.8   [.] __i686.get_pc_thunk.bx
     1.24%  [e1000e]             [k] e1000_xmit_frame
     1.17%  libtalloc.so.2.0.8   [.] _talloc_free
     1.16%  libtalloc.so.2.0.8   [.] _talloc_free_poolmem
     1.02%  [kernel]             [k] __ticket_spin_lock
     0.97%  libc-2.6.1.so        [.] __gettimeofday
     0.95%  libtevent.so.0.9.18  [.] _tevent_req_create
     0.76%  libc-2.6.1.so        [.] _int_free
     0.74%  [kernel]             [k] tcp_sendmsg

"perf top -p <smbd_pid>" SMB1 enabled shows: (samba-4.1.0rc3)
     7.38%  [kernel]                [k] ia32_syscall
     2.36%  libc-2.6.1.so           [.] __gettimeofday
     2.13%  libtalloc.so.2.0.8      [.] talloc_chunk_from_ptr
     1.64%  libpthread-2.6.1.so     [.] __read_nocancel
     1.63%  [kernel]                [k] __ticket_spin_lock
     1.63%  libc-2.6.1.so           [.] __poll
     1.45%  [e1000e]                [k] e1000_xmit_frame
     1.13%  libsmbconf.so.0         [.] event_add_to_poll_args
     1.06%  [kernel]                [k] copy_user_generic_string
     0.99%  libc-2.6.1.so           [.] sendfile64
     0.97%  libsmbd_base.so         [.] send_file_readX
     0.90%  libtalloc.so.2.0.8      [.] _talloc_free_internal
     0.89%  libc-2.6.1.so           [.] __fxstat64@@GLIBC_2.2
     0.85%  [kernel]                [k] tcp_sendmsg
     0.85%  libpthread-2.6.1.so     [.] __libc_send
     0.85%  libtalloc.so.2.0.8      [.] __talloc
     0.80%  [kernel]                [k] do_sys_poll
     0.78%  [kernel]                [k] tcp_recvmsg


D. perf top on samba-master
========================
With SMB2, three user-space symbols (all in libtalloc) rank above the
kernel-space ia32_syscall.
With SMB1, the kernel-space ia32_syscall ranks first.

"perf top -p <smbd_pid>" SMB2 enabled shows: (samba-master)
     7.15%  libtalloc.so.2.1.0   [.] talloc_chunk_from_ptr
     4.25%  libtalloc.so.2.1.0   [.] __talloc_with_prefix
     3.73%  libtalloc.so.2.1.0   [.] _talloc_free_internal
     2.07%  [kernel]             [k] ia32_syscall
     1.94%  libc-2.6.1.so        [.] _int_malloc
     1.75%  libc-2.6.1.so        [.] strcmp
     1.42%  libtalloc.so.2.1.0   [.] __i686.get_pc_thunk.bx
     1.41%  libtalloc.so.2.1.0   [.] talloc_alloc_pool
     1.23%  libtalloc.so.2.1.0   [.] _talloc_free_poolmem
     1.22%  libtalloc.so.2.1.0   [.] _talloc_free
     1.16%  libtalloc.so.2.1.0   [.] talloc_get_name
     1.14%  libtalloc.so.2.1.0   [.] _talloc_get_type_abort
     1.13%  [e1000e]             [k] e1000_xmit_frame
     1.04%  libtevent.so.0.9.19  [.] _tevent_req_create
     1.00%  libc-2.6.1.so        [.] memset
     0.88%  [kernel]             [k] __ticket_spin_lock
     0.83%  libc-2.6.1.so        [.] __gettimeofday
     0.80%  libc-2.6.1.so        [.] _int_free
     0.78%  libtalloc.so.2.1.0   [.] _talloc_pooled_object
     0.77%  libsmbconf.so.0      [.] run_events_poll
 ... ...

"perf top -p <smbd_pid>" SMB1 enabled shows: (samba-master)
     7.44%  [kernel]             [k] ia32_syscall
     2.56%  libc-2.6.1.so        [.] __gettimeofday
     2.01%  libtalloc.so.2.1.0   [.] talloc_chunk_from_ptr
     1.71%  [kernel]             [k] __ticket_spin_lock
     1.68%  libpthread-2.6.1.so  [.] __read_nocancel
     1.61%  libc-2.6.1.so        [.] __poll
     1.55%  [e1000e]             [k] e1000_xmit_frame
     1.09%  libsmbconf.so.0      [.] event_add_to_poll_args
     1.00%  libsmbconf.so.0      [.] run_events_poll
     0.99%  libc-2.6.1.so        [.] sendfile64
     0.97%  [kernel]             [k] copy_user_generic_string
     0.96%  libtalloc.so.2.1.0   [.] _talloc_free_internal
     0.90%  libc-2.6.1.so        [.] __fxstat64@@GLIBC_2.2
     0.90%  libtalloc.so.2.1.0   [.] __talloc_with_prefix
     0.82%  [kernel]             [k] tcp_recvmsg
     0.82%  libsmbd_base.so      [.] send_file_readX
     0.81%  libc-2.6.1.so        [.] _int_malloc
     0.81%  libpthread-2.6.1.so  [.] __libc_send
     0.81%  [kernel]             [k] tcp_sendmsg
 ... ...

E. Short summary for the samba4 series
========================
Comparing the samba4 series:
although the talloc and tevent versions differ,
the test results are much the same across releases.

F. Questions
========================
Comparing "perf top" output between 3.6 and 4.0,
the three items ranked above ia32_syscall are different.
Does this reflect the difference in SMB2 packet handling between 3.6 and 4.0?

Next I would like to test on a Linux distribution such as Ubuntu.
Any suggestions are appreciated,
thanks.

Regards,
Jones



2013/9/12 Jones <jones.kstw at gmail.com>

> Hi Volker,
>
> Thanks for the kind and quick feedback!
>
>
>>  Yep, that's good.
>
>
> Thanks for the review. I would like to keep it.
>
>
>> That's not necessarily a good idea. We pretty much depend on
>> this to work.
>
>
> Thanks for the review.
> But I still do not understand why the allocate/free APIs are called inside
> the loop.
> It seems that every iteration involves memory allocation and reclaim.
> Should this behavior be considered overhead? Or is this just my
> paranoia?
>
>
>> You got it pretty much right. SMB2 is much more
>> asynchronous. I would like to ask you to work with 4.0 or
>> even better with Samba master. 3.6 is pretty much dead with
>> regards to performance improvements. I have a Raspberry Pi
>> on my desktop now, I just need to take the time to tune SMB2
>> read/write for this platform. I think that will show the
>> bottlenecks in user space pretty spectacularly.
>
>
> I would like to test with the latest Samba versions,
> for example samba-4.0.9, samba-4.1.0rc3, and git from samba-master.
> Raspberry Pi sounds COOL!
> I found one in an online shop and plan to order it.
>
>
>>
>> 3.6 and 4.0 differ in SMB2 packet handling, so this might
>> have some influence.
>>
>
> Nice to know that, I would check again.
>
>
>> Are you in the position to consider a move to Samba 4.1
>> (about to be released soon), or do you have to tune 3.6?
>
>
> I'm not sure I can wait for Samba 4.1 (planned for release on Sep/27),
> but I would like to test samba-4.1.0rc3 to see if anything has improved.
> Hence, I have to tune 3.6 for the moment.
>
>
>> BTW, I am actively working on SMB2 leases at this very
>> moment. This will lift the need for small-block transfers
>> significantly. Unfortunately, in the initial run this will
>> be a 4.2 only thing, 4.1 is already closed for that kind of
>> significant architecture change. But it will be interesting
>> for you.
>>
>
> Great work! Very interesting!
>
> As I understand it,
> Microsoft states that leases are supported in SMB 2.1 and later,
> and the max protocol in samba-3.6 is SMB 2.0.
> If I really want to tune the performance of small-block transfers,
> is working with samba-3.6 a good choice?
>
> Another thought has been growing in my mind these days:
> is SMB2 in Samba not designed for, or not suitable for, small-block transfers,
> so that I am splitting hairs by tuning SMB2 for small-block transfers?
>
> Anyway, I would like to report the test results ASAP.
> I also found this document, which I would like to read:
> http://ftp.samba.org/pub/samba/slides/samba-smb2.pdf
>
> any suggestion is appreciated,
> thanks.
>
> My test environment:
> CPU: Intel(R) Celeron(R) CPU G540 @ 2.50GHz
> RAM: 4GB
>
> Regards,
> Jones
>
>

