SMB2 performance is worse than SMB1 with Iometer 512-byte transfers
Jones
jones.kstw at gmail.com
Fri Sep 13 16:57:22 CEST 2013
Hi Volker,
I ran Iometer 512-byte sequential reads and writes against an 8GB file,
in the same test environment as my earlier samba-3.6.18 test,
this time with the samba4 series.
Here are the details:
A. I/O per second and CPU usage
========================
Although both SMB1 and SMB2 drive the CPU to 100% load,
SMB2 spends a larger share of its cycles in user space (68-73% us)
than SMB1 does (51-56% us).
samba-4.0.9
Protocol R/W IOps %CPU %us %sy %id %si
-----------------------------------------------------------
SMB1 Read 57253 100% 56.0% 38.0% 0.0% 6.0%
SMB1 Write 52241 100% 51.0% 42.0% 0.0% 7.0%
SMB2 Read 23174 100% 72.0% 23.0% 0.0% 5.0%
SMB2 Write 18772 100% 68.0% 27.0% 0.0% 5.0%
samba-4.1.0rc3
Protocol R/W IOps %CPU %us %sy %id %si
-----------------------------------------------------------
SMB1 Read 55288 100% 54.0% 40.0% 0.0% 6.0%
SMB1 Write 52529 100% 52.0% 42.0% 0.0% 6.0%
SMB2 Read 22457 100% 70.3% 24.8% 0.0% 5.0%
SMB2 Write 18860 100% 72.3% 21.8% 0.0% 5.9%
samba-4.2.0pre1-GIT-UNKNOWN (samba-master as of 2013/Sep/12)
Protocol R/W IOps %CPU %us %sy %id %si
-----------------------------------------------------------
SMB1 Read 54582 100% 53.5% 40.4% 0.0% 6.1%
SMB1 Write 52151 100% 52.5% 41.6% 0.0% 5.0%
SMB2 Read 21410 100% 73.3% 20.8% 0.0% 5.9%
SMB2 Write 20628 100% 73.0% 20.0% 0.0% 7.0%
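
To make the gap easier to compare across versions, the IOps columns above can be reduced to an SMB2/SMB1 ratio. This is only a small Python sketch: the dictionary layout and the helper name smb2_to_smb1_ratio are chosen here for illustration, while the numbers are copied from the three tables.

```python
# IOps figures copied from the tables above.
# Key: (version, protocol, operation) -> I/O operations per second.
IOPS = {
    ("samba-4.0.9", "SMB1", "Read"): 57253,
    ("samba-4.0.9", "SMB1", "Write"): 52241,
    ("samba-4.0.9", "SMB2", "Read"): 23174,
    ("samba-4.0.9", "SMB2", "Write"): 18772,
    ("samba-4.1.0rc3", "SMB1", "Read"): 55288,
    ("samba-4.1.0rc3", "SMB1", "Write"): 52529,
    ("samba-4.1.0rc3", "SMB2", "Read"): 22457,
    ("samba-4.1.0rc3", "SMB2", "Write"): 18860,
    ("samba-master", "SMB1", "Read"): 54582,
    ("samba-master", "SMB1", "Write"): 52151,
    ("samba-master", "SMB2", "Read"): 21410,
    ("samba-master", "SMB2", "Write"): 20628,
}

VERSIONS = ("samba-4.0.9", "samba-4.1.0rc3", "samba-master")
OPERATIONS = ("Read", "Write")


def smb2_to_smb1_ratio(version, operation):
    """Return SMB2 IOps divided by SMB1 IOps for one version and operation."""
    smb2 = IOPS[(version, "SMB2", operation)]
    smb1 = IOPS[(version, "SMB1", operation)]
    return smb2 / smb1


for version in VERSIONS:
    for operation in OPERATIONS:
        ratio = smb2_to_smb1_ratio(version, operation)
        print(f"{version:16s} {operation:5s} SMB2/SMB1 = {ratio:.2f}")
```

On these figures SMB2 reaches roughly 36% to 41% of the SMB1 IOps in every version tested.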
B. perf top on samba-4.0.9
========================
With SMB2, three user-space symbols, all in libtalloc, rank above the
kernel-space ia32_syscall.
With SMB1, the kernel-space ia32_syscall is at the top.
"perf top -p <smbd_pid>" SMB2 enabled shows: (samba-4.0.9)
7.66% libtalloc.so.2.0.7 [.] talloc_chunk_from_ptr
4.02% libtalloc.so.2.0.7 [.] __talloc
3.87% libtalloc.so.2.0.7 [.] _talloc_free_internal
2.43% [kernel] [k] ia32_syscall
1.84% libc-2.6.1.so [.] strcmp
1.62% libc-2.6.1.so [.] memset
1.48% libtalloc.so.2.0.7 [.] __i686.get_pc_thunk.bx
1.47% libtalloc.so.2.0.7 [.] talloc_alloc_pool
1.44% libtalloc.so.2.0.7 [.] _talloc_get_type_abort
1.38% libtalloc.so.2.0.7 [.] _talloc_free_poolmem
1.29% libtalloc.so.2.0.7 [.] talloc_get_name
1.26% libtalloc.so.2.0.7 [.] _talloc_free
1.23% libc-2.6.1.so [.] _int_malloc
1.10% [e1000e] [k] e1000_xmit_frame
0.98% libc-2.6.1.so [.] __gettimeofday
0.85% libtevent.so.0.9.18 [.] _tevent_req_create
0.79% [kernel] [k] __ticket_spin_lock
0.77% libtalloc.so.2.0.7 [.] _talloc_named_const
0.74% libc-2.6.1.so [.] _int_free
0.74% libtalloc.so.2.0.7 [.] _talloc_zero
0.64% [kernel] [k] tcp_sendmsg
... ...
"perf top -p <smbd_pid>" SMB1 enabled shows: (samba-4.0.9)
7.32% [kernel] [k] ia32_syscall
2.44% libc-2.6.1.so [.] __gettimeofday
2.23% libtalloc.so.2.0.7 [.] talloc_chunk_from_ptr
1.67% libc-2.6.1.so [.] __poll
1.62% libpthread-2.6.1.so [.] __read_nocancel
1.46% [kernel] [k] __ticket_spin_lock
1.33% [e1000e] [k] e1000_xmit_frame
1.12% libsmbconf.so.0 [.] event_add_to_poll_args
1.11% [kernel] [k] copy_user_generic_string
1.07% libtalloc.so.2.0.7 [.] _talloc_free_internal
0.93% libc-2.6.1.so [.] sendfile64
0.91% libpthread-2.6.1.so [.] __libc_send
0.89% libc-2.6.1.so [.] __fxstat64@@GLIBC_2.2
0.84% libsmbconf.so.0 [.] run_events_poll
0.79% [kernel] [k] tcp_recvmsg
0.79% libsmbd_base.so [.] send_file_readX
0.78% [kernel] [k] fget_light
0.76% libc-2.6.1.so [.] _int_malloc
0.73% libtalloc.so.2.0.7 [.] __talloc
0.73% [kernel] [k] tcp_sendmsg
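
The two samba-4.0.9 listings above can also be summed per shared object, which shows how much of the visible profile falls into libtalloc. The following Python sketch assumes the four-column layout shown above (percentage, object, [.] or [k], symbol); the helper name share_by_dso is made up for illustration. Only the symbols visible in the truncated listings are counted, so the totals are lower bounds.

```python
from collections import defaultdict

# "perf top" listings for samba-4.0.9, copied from above.
SMB2_SAMPLE = """\
7.66% libtalloc.so.2.0.7 [.] talloc_chunk_from_ptr
4.02% libtalloc.so.2.0.7 [.] __talloc
3.87% libtalloc.so.2.0.7 [.] _talloc_free_internal
2.43% [kernel] [k] ia32_syscall
1.84% libc-2.6.1.so [.] strcmp
1.62% libc-2.6.1.so [.] memset
1.48% libtalloc.so.2.0.7 [.] __i686.get_pc_thunk.bx
1.47% libtalloc.so.2.0.7 [.] talloc_alloc_pool
1.44% libtalloc.so.2.0.7 [.] _talloc_get_type_abort
1.38% libtalloc.so.2.0.7 [.] _talloc_free_poolmem
1.29% libtalloc.so.2.0.7 [.] talloc_get_name
1.26% libtalloc.so.2.0.7 [.] _talloc_free
1.23% libc-2.6.1.so [.] _int_malloc
1.10% [e1000e] [k] e1000_xmit_frame
0.98% libc-2.6.1.so [.] __gettimeofday
0.85% libtevent.so.0.9.18 [.] _tevent_req_create
0.79% [kernel] [k] __ticket_spin_lock
0.77% libtalloc.so.2.0.7 [.] _talloc_named_const
0.74% libc-2.6.1.so [.] _int_free
0.74% libtalloc.so.2.0.7 [.] _talloc_zero
0.64% [kernel] [k] tcp_sendmsg
... ...
"""

SMB1_SAMPLE = """\
7.32% [kernel] [k] ia32_syscall
2.44% libc-2.6.1.so [.] __gettimeofday
2.23% libtalloc.so.2.0.7 [.] talloc_chunk_from_ptr
1.67% libc-2.6.1.so [.] __poll
1.62% libpthread-2.6.1.so [.] __read_nocancel
1.46% [kernel] [k] __ticket_spin_lock
1.33% [e1000e] [k] e1000_xmit_frame
1.12% libsmbconf.so.0 [.] event_add_to_poll_args
1.11% [kernel] [k] copy_user_generic_string
1.07% libtalloc.so.2.0.7 [.] _talloc_free_internal
0.93% libc-2.6.1.so [.] sendfile64
0.91% libpthread-2.6.1.so [.] __libc_send
0.89% libc-2.6.1.so [.] __fxstat64@@GLIBC_2.2
0.84% libsmbconf.so.0 [.] run_events_poll
0.79% [kernel] [k] tcp_recvmsg
0.79% libsmbd_base.so [.] send_file_readX
0.78% [kernel] [k] fget_light
0.76% libc-2.6.1.so [.] _int_malloc
0.73% libtalloc.so.2.0.7 [.] __talloc
0.73% [kernel] [k] tcp_sendmsg
"""


def share_by_dso(perf_text):
    """Sum the percentage column per object, keyed by the name before '.so'."""
    totals = defaultdict(float)
    for line in perf_text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[0].endswith("%"):
            continue  # skip blank or elided ("... ...") lines
        dso = fields[1].split(".so")[0]
        totals[dso] += float(fields[0].rstrip("%"))
    return dict(totals)


smb2 = share_by_dso(SMB2_SAMPLE)
smb1 = share_by_dso(SMB1_SAMPLE)
print(f"libtalloc share with SMB2: {smb2['libtalloc']:.2f}%")
print(f"libtalloc share with SMB1: {smb1['libtalloc']:.2f}%")
```

On these listings libtalloc accounts for about 25% of the samples with SMB2, against about 4% with SMB1.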
C. perf top on samba-4.1.0rc3
========================
With SMB2, three user-space symbols, all in libtalloc, rank above the
kernel-space ia32_syscall.
With SMB1, the kernel-space ia32_syscall is at the top.
"perf top -p <smbd_pid>" SMB2 enabled shows: (samba-4.1.0rc3)
7.86% libtalloc.so.2.0.8 [.] talloc_chunk_from_ptr
3.98% libtalloc.so.2.0.8 [.] _talloc_free_internal
3.89% libtalloc.so.2.0.8 [.] __talloc
2.36% [kernel] [k] ia32_syscall
1.79% libc-2.6.1.so [.] strcmp
1.61% libtalloc.so.2.0.8 [.] talloc_alloc_pool
1.49% libc-2.6.1.so [.] memset
1.46% libtalloc.so.2.0.8 [.] talloc_get_name
1.38% libc-2.6.1.so [.] _int_malloc
1.35% libtalloc.so.2.0.8 [.] _talloc_get_type_abort
1.32% libtalloc.so.2.0.8 [.] __i686.get_pc_thunk.bx
1.24% [e1000e] [k] e1000_xmit_frame
1.17% libtalloc.so.2.0.8 [.] _talloc_free
1.16% libtalloc.so.2.0.8 [.] _talloc_free_poolmem
1.02% [kernel] [k] __ticket_spin_lock
0.97% libc-2.6.1.so [.] __gettimeofday
0.95% libtevent.so.0.9.18 [.] _tevent_req_create
0.76% libc-2.6.1.so [.] _int_free
0.74% [kernel] [k] tcp_sendmsg
"perf top -p <smbd_pid>" SMB1 enabled shows: (samba-4.1.0rc3)
7.38% [kernel] [k] ia32_syscall
2.36% libc-2.6.1.so [.] __gettimeofday
2.13% libtalloc.so.2.0.8 [.] talloc_chunk_from_ptr
1.64% libpthread-2.6.1.so [.] __read_nocancel
1.63% [kernel] [k] __ticket_spin_lock
1.63% libc-2.6.1.so [.] __poll
1.45% [e1000e] [k] e1000_xmit_frame
1.13% libsmbconf.so.0 [.] event_add_to_poll_args
1.06% [kernel] [k] copy_user_generic_string
0.99% libc-2.6.1.so [.] sendfile64
0.97% libsmbd_base.so [.] send_file_readX
0.90% libtalloc.so.2.0.8 [.] _talloc_free_internal
0.89% libc-2.6.1.so [.] __fxstat64@@GLIBC_2.2
0.85% [kernel] [k] tcp_sendmsg
0.85% libpthread-2.6.1.so [.] __libc_send
0.85% libtalloc.so.2.0.8 [.] __talloc
0.80% [kernel] [k] do_sys_poll
0.78% [kernel] [k] tcp_recvmsg
D. perf top on samba-master
========================
With SMB2, three user-space symbols, all in libtalloc, rank above the
kernel-space ia32_syscall.
With SMB1, the kernel-space ia32_syscall is at the top.
"perf top -p <smbd_pid>" SMB2 enabled shows: (samba-master)
7.15% libtalloc.so.2.1.0 [.] talloc_chunk_from_ptr
4.25% libtalloc.so.2.1.0 [.] __talloc_with_prefix
3.73% libtalloc.so.2.1.0 [.] _talloc_free_internal
2.07% [kernel] [k] ia32_syscall
1.94% libc-2.6.1.so [.] _int_malloc
1.75% libc-2.6.1.so [.] strcmp
1.42% libtalloc.so.2.1.0 [.] __i686.get_pc_thunk.bx
1.41% libtalloc.so.2.1.0 [.] talloc_alloc_pool
1.23% libtalloc.so.2.1.0 [.] _talloc_free_poolmem
1.22% libtalloc.so.2.1.0 [.] _talloc_free
1.16% libtalloc.so.2.1.0 [.] talloc_get_name
1.14% libtalloc.so.2.1.0 [.] _talloc_get_type_abort
1.13% [e1000e] [k] e1000_xmit_frame
1.04% libtevent.so.0.9.19 [.] _tevent_req_create
1.00% libc-2.6.1.so [.] memset
0.88% [kernel] [k] __ticket_spin_lock
0.83% libc-2.6.1.so [.] __gettimeofday
0.80% libc-2.6.1.so [.] _int_free
0.78% libtalloc.so.2.1.0 [.] _talloc_pooled_object
0.77% libsmbconf.so.0 [.] run_events_poll
... ...
"perf top -p <smbd_pid>" SMB1 enabled shows: (samba-master)
7.44% [kernel] [k] ia32_syscall
2.56% libc-2.6.1.so [.] __gettimeofday
2.01% libtalloc.so.2.1.0 [.] talloc_chunk_from_ptr
1.71% [kernel] [k] __ticket_spin_lock
1.68% libpthread-2.6.1.so [.] __read_nocancel
1.61% libc-2.6.1.so [.] __poll
1.55% [e1000e] [k] e1000_xmit_frame
1.09% libsmbconf.so.0 [.] event_add_to_poll_args
1.00% libsmbconf.so.0 [.] run_events_poll
0.99% libc-2.6.1.so [.] sendfile64
0.97% [kernel] [k] copy_user_generic_string
0.96% libtalloc.so.2.1.0 [.] _talloc_free_internal
0.90% libc-2.6.1.so [.] __fxstat64@@GLIBC_2.2
0.90% libtalloc.so.2.1.0 [.] __talloc_with_prefix
0.82% [kernel] [k] tcp_recvmsg
0.82% libsmbd_base.so [.] send_file_readX
0.81% libc-2.6.1.so [.] _int_malloc
0.81% libpthread-2.6.1.so [.] __libc_send
0.81% [kernel] [k] tcp_sendmsg
... ...
E. Short summary for the samba4 series
========================
Across the samba4 series,
although the talloc and tevent versions differ,
the test results are much the same.
F. Questions
========================
Comparing "perf top" between 3.6 and 4.0,
the top three symbols above ia32_syscall are different.
Does this reflect the difference in SMB2 packet handling between 3.6 and 4.0?
Next I would like to test on a Linux distribution such as Ubuntu.
Any suggestions are appreciated,
thanks.
Regards,
Jones
2013/9/12 Jones <jones.kstw at gmail.com>
> Hi Volker,
>
> Thanks for the kind and quick feedback!
>
>
>> Yep, that's good.
>
>
> Thanks for the review. I would like to keep it.
>
>
>> That's not necessarily a good idea. We pretty much depend on
>> this to work.
>
>
> Thanks for the review.
> But I still do not understand why the allocate/free calls are placed inside
> the loop.
> It seems that every iteration involves a memory allocation and release;
> should this be considered overhead? Or am I just being
> paranoid?
>
>
>> You got it pretty much right. SMB2 is much more
>> asynchronous. I would like to ask you to work with 4.0 or
>> even better with Samba master. 3.6 is pretty much dead with
>> regards to performance improvements. I have a Raspberry Pi
>> on my desktop now, I just need to take the time to tune SMB2
>> read/write for this platform. I think that will show the
>> bottlenecks in user space pretty spectacularly.
>
>
> I would like to test with the latest Samba versions,
> for example samba-4.0.9, samba-4.1.0rc3, and git from samba-master.
> Raspberry Pi sounds COOL!
> I found one in an online shop and plan to order it.
>
>
>>
>> 3.6 and 4.0 differ in SMB2 packet handling, so this might
>> have some influence.
>>
>
> Good to know, I will check again.
>
>
>> Are you in the position to consider a move to Samba 4.1
>> (about to be released soon), or do you have to tune 3.6?
>
>
> I'm not sure I can wait for Samba 4.1 (planned for release on Sep/27),
> but I would like to test Samba-4.1.0rc3 to see whether anything has improved.
> For now, I have to tune 3.6.
>
>
>> BTW, I am actively working on SMB2 leases at this very
>> moment. This will lift the need for small-block transfers
>> significantly. Unfortunately, in the initial run this will
>> be a 4.2 only thing, 4.1 is already closed for that kind of
>> significant architecture change. But it will be interesting
>> for you.
>>
>
> Great work! Very interesting!
>
> As I understand it,
> Microsoft states that leases are supported from SMB 2.1 onward,
> and the max protocol of samba-3.6 is SMB 2.0.
> If I really want to tune performance for small-block transfers,
> is working with samba-3.6 a good choice?
>
> Another thought has been growing in my mind these days:
> is SMB2 in samba not designed for, or not suited to, small-block transfers,
> so that I'm splitting hairs by tuning SMB2 for small-block transfers?
>
> Anyway, I would like to report the test results ASAP.
> I also found this document, which I would like to read:
> http://ftp.samba.org/pub/samba/slides/samba-smb2.pdf
>
> Any suggestions are appreciated,
> thanks.
>
> My test environment:
> CPU: Intel(R) Celeron(R) CPU G540 @ 2.50GHz
> RAM: 4GB
>
> Regards,
> Jones
>
>