[Samba] Samba4 consumes more CPU
Thiago Fernandes Crepaldi
tognado at gmail.com
Tue Oct 1 15:04:57 MDT 2013
That is funny. Now that I have replaced Samba 4 and libc-2.13.so with builds
that include debug symbols, the perf profile seems to have changed a bit after
running the same tests!
Events: 54K cycles
- 3.06% smbd [kernel.kallsyms] [k] copy_user_generic_unrolled
- copy_user_generic_unrolled
52.63% __read_nocancel
36.20% __write_nocancel
2.70% __getdents64
2.44% __libc_readv
+ 2.00% do_fcntl
0.87% __GI___libc_read
+ 0.77% __fxstat64
- 2.02% smbd libc-2.13.so [.] _int_malloc
+ _int_malloc
- 1.62% smbd [kernel.kallsyms] [k] kmem_cache_alloc
+ kmem_cache_alloc
- 1.22% smbd libtalloc.so.2.0.7 [.] _talloc_free
+ _talloc_free
- 0.99% smbd libtalloc.so.2.0.7 [.] _talloc_free_children_internal.isra.4
+ _talloc_free_children_internal.isra.4
- 0.86% smbd libc-2.13.so [.] __memcpy_ssse3
+ __memcpy_ssse3
+ 0.81% smbd [kernel.kallsyms] [k] kmem_cache_free
+ 0.81% smbd libc-2.13.so [.] _int_free
+ 0.79% smbd [kernel.kallsyms] [k] __kmalloc
+ 0.66% smbd libtalloc.so.2.0.7 [.] _talloc_zero
+ 0.63% smbd [kernel.kallsyms] [k] link_path_walk
+ 0.63% smbd [kernel.kallsyms] [k] ext4_htree_store_dirent
+ 0.55% smbd libtalloc.so.2.0.7 [.] talloc_alloc_pool
+ 0.55% smbd libc-2.13.so [.] __memset_sse2
+ 0.53% smbd libc-2.13.so [.] malloc
+ 0.53% smbd [kernel.kallsyms] [k] fcntl_setlk
+ 0.52% smbd [kernel.kallsyms] [k] get_page_from_freelist
+ 0.50% smbd libtalloc.so.2.0.7 [.] talloc_get_name
+ 0.50% smbd [kernel.kallsyms] [k] tg3_start_xmit
+ 0.48% smbd [kernel.kallsyms] [k] memset
+ 0.47% smbd libc-2.13.so [.] free
+ 0.47% smbd [kernel.kallsyms] [k] _raw_spin_lock
+ 0.45% smbd [kernel.kallsyms] [k] __d_lookup_rcu
+ 0.45% smbd libc-2.13.so [.] __GI___strcmp_ssse3
+ 0.44% smbd libtalloc.so.2.0.7 [.] _talloc_get_type_abort
+ 0.43% smbd [kernel.kallsyms] [k] system_call_after_swapgs
+ 0.43% smbd [kernel.kallsyms] [k] ext4_mark_iloc_dirty
+ 0.42% smbd libtalloc.so.2.0.7 [.] talloc_is_parent
+ 0.41% smbd [kernel.kallsyms] [k] __alloc_skb
+ 0.41% smbd [kernel.kallsyms] [k] __posix_lock_file
+ 0.40% smbd [kernel.kallsyms] [k] __ext4_get_inode_loc
+ 0.39% smbd libc-2.13.so [.] __strlen_sse2
+ 0.39% smbd [kernel.kallsyms] [k] kfree
+ 0.39% smbd [kernel.kallsyms] [k] tcp_recvmsg
+ 0.38% smbd libtalloc.so.2.0.7 [.] talloc_named_const
+ 0.37% smbd libtalloc.so.2.0.7 [.] _talloc_array
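
To compare how much of the profile lands in libc, libtalloc and the kernel, the top-level rows above can be summed per shared object. The following is only a rough sketch: the helper name overhead_by_dso, the regular expression and the five-row SAMPLE are my own illustration, not part of perf or Samba. It assumes each row is self-overhead in the "percent, command, object, [.] or [k], symbol" layout shown above, and it skips the indented call-graph lines.

```python
import re
from collections import defaultdict

# Matches top-level `perf report` rows in the layout pasted above, e.g.
#   "- 2.02% smbd libc-2.13.so [.] _int_malloc"
# Call-graph child lines ("52.63% __read_nocancel", "+ 2.00% do_fcntl")
# lack the command/object/[.] columns and therefore do not match.
ROW = re.compile(r"^[+-]\s+(\d+\.\d+)%\s+(\S+)\s+(\S+)\s+\[([.k])\]\s*(\S*)")


def overhead_by_dso(report_text):
    """Sum the self-overhead percentages of top-level rows per shared object."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        m = ROW.match(line.strip())
        if m:
            totals[m.group(3)] += float(m.group(1))
    return dict(totals)


# A few rows copied from the profile above (hypothetical subset for the demo).
SAMPLE = """\
- 3.06% smbd [kernel.kallsyms] [k] copy_user_generic_unrolled
- 2.02% smbd libc-2.13.so [.] _int_malloc
- 1.62% smbd [kernel.kallsyms] [k] kmem_cache_alloc
- 1.22% smbd libtalloc.so.2.0.7 [.] _talloc_free
+ 0.81% smbd libc-2.13.so [.] _int_free
"""

if __name__ == "__main__":
    ranked = sorted(overhead_by_dso(SAMPLE).items(), key=lambda kv: -kv[1])
    for dso, pct in ranked:
        print(f"{pct:6.2f}%  {dso}")
```

Feeding it the whole listing instead of SAMPLE gives a per-object total for the rows that were pasted. Since the listing stops at 0.37%, those totals are a lower bound, not the full picture.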
On Mon, Sep 30, 2013 at 6:19 PM, Thiago Fernandes Crepaldi <
tognado at gmail.com> wrote:
> Agreed. For some strange reason I thought perf would "follow" the newly
> forked smbd processes and account for their data too =)
>
> Unfortunately, I don't have the libc symbols (at least for today) to see
> what is going on there, but here is what I got from the child smbd process on
> the server side. The client side is a Windows 7 virtual machine running
> NASPT.
>
> Could this result mean that most of the performance drop I am
> experiencing is due to libc?
> I've never worked with perf before, but I will still try to resolve those
> crazy addresses.
>
> Events: 45K cycles
> - 7.37% smbd libc-2.13.so [.] 0x11e465
> - 0x7ffab9f2043c
> 41.73% 0
> 5.32% 0x1b3fbe0
> 5.29% 0x2c4dab0
> 3.60% 0x1b0b130
> 3.37% 0x1b0b2a0
> 2.94% 0x1b5af80
> 2.70% 0x1b0d850
> 2.64% 0x2825fb0
> 1.86% 0x28e06d0
> 1.83% 0x2afcc80
> 1.71% 0x1b2ccb0
> 1.64% 0x2a4deb0
> 1.63% 0x1b56e00
> 1.51% 0x1b6bd00
> 1.16% 0x1b49eb0
> 1.15% 0x1b506e0
> 1.13% 0x1b4da00
> 1.07% 0x1b35100
> 0.93% 0x1af9050
> 0.92% 0x2b03680
> 0.91% 0x2ae21f0
> 0.90% 0x1b21210
> 0.89% 0x1b5de80
> 0.89% 0x1b5aa80
> 0.89% 0x1b2e0e0
> 0.88% 0x1b59be0
> 0.87% 0x1b4c600
> 0.86% 0x1b2aa20
> 0.85% 0x1b4a940
> 0.85% 0x1b45f50
> 0.84% 0x1b4a6d0
> 0.84% 0x1b23940
> 0.82% 0x1b37210
> 0.82% 0x1b2cf30
> 0.82% 0x1b33320
> 0.77% 0x2c96d50
> 0.76% 0x202f380
> 0.75% 0x2bd0bd0
> 0.66% 0x1b5e1d0
> - 0x7ffab9f27e10
> 37.72% 0x2f62696c2f3365
> + 23.78% 0
> + 11.24% 0x7fffc9f76d40
> + 6.25% set_unix_security_ctx
> 3.13% 0x645f6e656b6f74
> 2.46% 0x1000900000000
> + 2.17% 0x11b9f22aac
> 2.16% 0x1b53000
> + 2.12% 0x2a29850
> 2.08% 0xbe70f000004c4c
> 2.01% 0x1b0af00
> 1.94% 0x1b07390
> 1.51% 0x1b49b00
> 1.41% 0x2010
> - 0x7ffab9fc6c10
> + 18.08% 0
> + 13.63% 0x2c5fc20
> + 11.62% 0x2be7b10
> + 7.90% 0x2be8560
> + 6.61% 0x2a29850
> + 6.30% 0x2b3d6c0
> 5.67% 0x4e6f5479706f43
> + 5.64% 0x29d7110
> + 5.54% 0x2467130
> + 5.53% 0x2b3d5e0
> + 5.31% 0x28c81a0
> + 4.20% 0x2c5fa30
> + 3.98% 0x2a98990
> + 0x7ffab9f20438
> + 0x7ffab9f2045c
> 0x7ffab9fc8e03
> + 0x7ffab9fc425e
> + 0x7ffab9f2a715
> + 0x7ffab9f2a6d0
> 0x7ffab9f1f851
> 0x7ffab9f1f2ac
> + 0x7ffab9f27e25
> + 0x7ffab9f2a648
> + 0x7ffab9fc4240
> 0x7ffab9fc8654
> 0x7ffab9f206bf
> + 0x7ffab9f20548
> + 0x7ffab9f20bc2
> + 0x7ffab9f1f130
> + 0x7ffab9f26310
> + 0x7ffab9f20422
> 0x7ffab9f1e0db
> 0x7ffab9f1f179
> + 0x7ffab9f2a6f2
> + 0x7ffab9f20572
> + 0x7ffab9f2054c
> + 0x7ffab9fc42c5
> - 1.72% smbd [kernel.kallsyms] [k] kmem_cache_alloc
> + kmem_cache_alloc
> - 1.30% smbd libtalloc.so.2.0.7 [.] _talloc_free
> + _talloc_free
> - 1.10% smbd libtalloc.so.2.0.7 [.] _talloc_free_children_internal.i
> + _talloc_free_children_internal.isra.4
> - 1.07% smbd [kernel.kallsyms] [k] copy_user_generic_unrolled
> + copy_user_generic_unrolled
> - 0.95% smbd [kernel.kallsyms] [k] __kmalloc
> + __kmalloc
> - 0.78% smbd [kernel.kallsyms] [k] ext4_htree_store_dirent
> + ext4_htree_store_dirent
> + 0x7ffab9f4f2f5
> - 0.73% smbd [kernel.kallsyms] [k] kmem_cache_free
> + kmem_cache_free
> - 0.73% smbd [kernel.kallsyms] [k] link_path_walk
> + link_path_walk
> - 0.69% smbd libc-2.13.so [.] malloc
> + malloc
> - 0.69% smbd libtalloc.so.2.0.7 [.] _talloc_zero
> + _talloc_zero
> - 0.62% smbd [kernel.kallsyms] [k] fcntl_setlk
> + fcntl_setlk
> + 0x7ffabcf93238
> - 0.59% smbd [kernel.kallsyms] [k] __d_lookup_rcu
> + __d_lookup_rcu
> - 0.57% smbd libtalloc.so.2.0.7 [.] talloc_alloc_pool
> + talloc_alloc_pool
> - 0.55% smbd libtalloc.so.2.0.7 [.] talloc_get_name
> + talloc_get_name
> - 0.55% smbd [kernel.kallsyms] [k] __posix_lock_file
> + __posix_lock_file
> + 0x7ffabcf93238
> - 0.50% smbd [kernel.kallsyms] [k] _raw_spin_lock
> + _raw_spin_lock
> + 0.49% smbd [kernel.kallsyms] [k] tg3_start_xmit
> + 0.48% smbd [kernel.kallsyms] [k] system_call_after_swapgs
> + 0.46% smbd libtalloc.so.2.0.7 [.] talloc_named_const
> + 0.46% smbd [kernel.kallsyms] [k] memset
> + 0.46% smbd libtalloc.so.2.0.7 [.] _talloc_get_type_abort
> + 0.45% smbd [kernel.kallsyms] [k] str2hashbuf_signed
> + 0.45% smbd [kernel.kallsyms] [k] kfree
> + 0.45% smbd libc-2.13.so [.] free
> + 0.44% smbd [kernel.kallsyms] [k] __alloc_skb
> + 0.42% smbd libtalloc.so.2.0.7 [.] talloc_is_parent
> + 0.41% smbd libtalloc.so.2.0.7 [.] _talloc_array
>
> On Mon, Sep 30, 2013 at 5:39 PM, Jeremy Allison <jra at samba.org> wrote:
>
>> On Mon, Sep 30, 2013 at 05:21:44PM -0300, Thiago Fernandes Crepaldi wrote:
>> > Andrew, in my company we are also experiencing higher CPU usage with
>> > Samba 4 (smbd) compared to Samba 3.
>> >
>> > In fact, it almost reaches 100% CPU and uses all the memory during
>> > *dir copies* (individual file copy is as good as Samba 3's). I strongly
>> > believe that this CPU usage is responsible for Samba 4's worse
>> > throughput compared to Samba 3 in our tests.
>> >
>> > Given that, I would like to contribute to this investigation and
>> > share my perf profiling data for smbd (parent process):
>> >
>> > Events: 7 cycles
>> > - 90.01% smbd [kernel.kallsyms] [k] copy_pte_range
>> > copy_pte_range
>> > __libc_fork
>> > smbd_accept_connection
>> > - 9.77% smbd [kernel.kallsyms] [k] handle_edge_irq
>> > handle_edge_irq
>> > smbd_accept_connection
>> > - 0.22% smbd [kernel.kallsyms] [k] perf_pmu_rotate_start.isra.57
>> > perf_pmu_rotate_start.isra.57
>> > __poll
>> > - 0.00% smbd [kernel.kallsyms] [k] native_write_msr_safe
>> > native_write_msr_safe
>> > __poll
>>
>> It's the client process that should have the interesting
>> profile data, the parent is just going to sit there doing
>> accept().
>>
>> Jeremy.
>>
>
>
>
> --
> Thiago Crepaldi
>
--
Thiago Crepaldi