[Samba] Samba4 consumes more CPU

Thiago Fernandes Crepaldi tognado at gmail.com
Tue Oct 1 15:04:57 MDT 2013


That is funny. Now that I replaced samba 4 and libc-2.13.so with debug
symbols, the perf profile seems to be have changed a bit after the same
tests !

Events: 54K cycles
-   3.06%  smbd  [kernel.kallsyms]         [k] copy_user_generic_unrolled
   - copy_user_generic_unrolled
        52.63% __read_nocancel
        36.20% __write_nocancel
        2.70% __getdents64
        2.44% __libc_readv
      + 2.00% do_fcntl
        0.87% __GI___libc_read
      + 0.77% __fxstat64
-   2.02%  smbd  libc-2.13.so              [.] _int_malloc
   + _int_malloc
-   1.62%  smbd  [kernel.kallsyms]         [k] kmem_cache_alloc
   + kmem_cache_alloc
-   1.22%  smbd  libtalloc.so.2.0.7        [.] _talloc_free
   + _talloc_free
-   0.99%  smbd  libtalloc.so.2.0.7        [.]
_talloc_free_children_internal.isra.4
   + _talloc_free_children_internal.isra.4
-   0.86%  smbd  libc-2.13.so              [.] __memcpy_ssse3
   + __memcpy_ssse3
+   0.81%  smbd  [kernel.kallsyms]         [k] kmem_cache_free
+   0.81%  smbd  libc-2.13.so              [.] _int_free
+   0.79%  smbd  [kernel.kallsyms]         [k] __kmalloc
+   0.66%  smbd  libtalloc.so.2.0.7        [.] _talloc_zero
+   0.63%  smbd  [kernel.kallsyms]         [k] link_path_walk
+   0.63%  smbd  [kernel.kallsyms]         [k] ext4_htree_store_dirent
+   0.55%  smbd  libtalloc.so.2.0.7        [.] talloc_alloc_pool
+   0.55%  smbd  libc-2.13.so              [.] __memset_sse2
+   0.53%  smbd  libc-2.13.so              [.] malloc
+   0.53%  smbd  [kernel.kallsyms]         [k] fcntl_setlk
+   0.52%  smbd  [kernel.kallsyms]         [k] get_page_from_freelist
+   0.50%  smbd  libtalloc.so.2.0.7        [.] talloc_get_name
+   0.50%  smbd  [kernel.kallsyms]         [k] tg3_start_xmit
+   0.48%  smbd  [kernel.kallsyms]         [k] memset
+   0.47%  smbd  libc-2.13.so              [.] free
+   0.47%  smbd  [kernel.kallsyms]         [k] _raw_spin_lock
+   0.45%  smbd  [kernel.kallsyms]         [k] __d_lookup_rcu
+   0.45%  smbd  libc-2.13.so              [.] __GI___strcmp_ssse3
+   0.44%  smbd  libtalloc.so.2.0.7        [.] _talloc_get_type_abort
+   0.43%  smbd  [kernel.kallsyms]         [k] system_call_after_swapgs
+   0.43%  smbd  [kernel.kallsyms]         [k] ext4_mark_iloc_dirty
+   0.42%  smbd  libtalloc.so.2.0.7        [.] talloc_is_parent
+   0.41%  smbd  [kernel.kallsyms]         [k] __alloc_skb
+   0.41%  smbd  [kernel.kallsyms]         [k] __posix_lock_file
+   0.40%  smbd  [kernel.kallsyms]         [k] __ext4_get_inode_loc
+   0.39%  smbd  libc-2.13.so              [.] __strlen_sse2
+   0.39%  smbd  [kernel.kallsyms]         [k] kfree
+   0.39%  smbd  [kernel.kallsyms]         [k] tcp_recvmsg
+   0.38%  smbd  libtalloc.so.2.0.7        [.] talloc_named_const
+   0.37%  smbd  libtalloc.so.2.0.7        [.] _talloc_array


On Mon, Sep 30, 2013 at 6:19 PM, Thiago Fernandes Crepaldi <
tognado at gmail.com> wrote:

> Agreed. For some strange reason I though perf would "follow" the new smbd
> forked and account their data too =)
>
> Unfortunately, I don't have the libc symbols (at least for today) to see
> what is going on there, but here is what I got in the child smbd process on
> the server side. The client side is a Windows 7 Virtual machine running
> NASPT
>
> Could this result mean that most of the time the performance drop I am
> experiencing is due to libc ?
> I've never worked with perf before, but I will still try to resolve those
> crazy addresses
>
> Events: 45K cycles
> -   7.37%  smbd  libc-2.13.so              [.] 0x11e465
>    - 0x7ffab9f2043c
>         41.73% 0
>         5.32% 0x1b3fbe0
>         5.29% 0x2c4dab0
>         3.60% 0x1b0b130
>         3.37% 0x1b0b2a0
>         2.94% 0x1b5af80
>         2.70% 0x1b0d850
>         2.64% 0x2825fb0
>         1.86% 0x28e06d0
>         1.83% 0x2afcc80
>         1.71% 0x1b2ccb0
>         1.64% 0x2a4deb0
>         1.63% 0x1b56e00
>         1.51% 0x1b6bd00
>         1.16% 0x1b49eb0
>         1.15% 0x1b506e0
>         1.13% 0x1b4da00
>         1.07% 0x1b35100
>         0.93% 0x1af9050
>         0.92% 0x2b03680
>         0.91% 0x2ae21f0
>         0.90% 0x1b21210
>         0.89% 0x1b5de80
>         0.89% 0x1b5aa80
>         0.89% 0x1b2e0e0
>         0.88% 0x1b59be0
>         0.87% 0x1b4c600
>         0.86% 0x1b2aa20
>         0.85% 0x1b4a940
>         0.85% 0x1b45f50
>         0.84% 0x1b4a6d0
>         0.84% 0x1b23940
>         0.82% 0x1b37210
>         0.82% 0x1b2cf30
>         0.82% 0x1b33320
>         0.77% 0x2c96d50
>         0.76% 0x202f380
>         0.75% 0x2bd0bd0
> 0.66% 0x1b5e1d0
>    - 0x7ffab9f27e10
>         37.72% 0x2f62696c2f3365
>       + 23.78% 0
>       + 11.24% 0x7fffc9f76d40
>       + 6.25% set_unix_security_ctx
>         3.13% 0x645f6e656b6f74
>         2.46% 0x1000900000000
>       + 2.17% 0x11b9f22aac
>         2.16% 0x1b53000
>       + 2.12% 0x2a29850
>         2.08% 0xbe70f000004c4c
>         2.01% 0x1b0af00
>         1.94% 0x1b07390
>         1.51% 0x1b49b00
>         1.41% 0x2010
>    - 0x7ffab9fc6c10
>       + 18.08% 0
>       + 13.63% 0x2c5fc20
>       + 11.62% 0x2be7b10
>       + 7.90% 0x2be8560
>       + 6.61% 0x2a29850
>       + 6.30% 0x2b3d6c0
>         5.67% 0x4e6f5479706f43
>       + 5.64% 0x29d7110
>       + 5.54% 0x2467130
>       + 5.53% 0x2b3d5e0
>       + 5.31% 0x28c81a0
>       + 4.20% 0x2c5fa30
>       + 3.98% 0x2a98990
>    + 0x7ffab9f20438
>    + 0x7ffab9f2045c
>      0x7ffab9fc8e03
>    + 0x7ffab9fc425e
>    + 0x7ffab9f2a715
>    + 0x7ffab9f2a6d0
>      0x7ffab9f1f851
>      0x7ffab9f1f2ac
>    + 0x7ffab9f27e25
>    + 0x7ffab9f2a648
>    + 0x7ffab9fc4240
>      0x7ffab9fc8654
>      0x7ffab9f206bf
>    + 0x7ffab9f20548
>    + 0x7ffab9f20bc2
>    + 0x7ffab9f1f130
>    + 0x7ffab9f26310
>    + 0x7ffab9f20422
>      0x7ffab9f1e0db
>      0x7ffab9f1f179
>    + 0x7ffab9f2a6f2
>    + 0x7ffab9f20572
>    + 0x7ffab9f2054c
>    + 0x7ffab9fc42c5
> -   1.72%  smbd  [kernel.kallsyms]         [k] kmem_cache_alloc
>    + kmem_cache_alloc
> -   1.30%  smbd  libtalloc.so.2.0.7        [.] _talloc_free
>    + _talloc_free
> -   1.10%  smbd  libtalloc.so.2.0.7        [.]
> _talloc_free_children_internal.i
>     + _talloc_free_children_internal.isra.4
> -   1.07%  smbd  [kernel.kallsyms]         [k] copy_user_generic_unrolled
>    + copy_user_generic_unrolled
> -   0.95%  smbd  [kernel.kallsyms]         [k] __kmalloc
>    + __kmalloc
> -   0.78%  smbd  [kernel.kallsyms]         [k] ext4_htree_store_dirent
>    + ext4_htree_store_dirent
>    + 0x7ffab9f4f2f5
> -   0.73%  smbd  [kernel.kallsyms]         [k] kmem_cache_free
>    + kmem_cache_free
> -   0.73%  smbd  [kernel.kallsyms]         [k] link_path_walk
>    + link_path_walk
> -   0.69%  smbd  libc-2.13.so              [.] malloc
>    + malloc
> -   0.69%  smbd  libtalloc.so.2.0.7        [.] _talloc_zero
>    + _talloc_zero
> -   0.62%  smbd  [kernel.kallsyms]         [k] fcntl_setlk
>    + fcntl_setlk
>    + 0x7ffabcf93238
> -   0.59%  smbd  [kernel.kallsyms]         [k] __d_lookup_rcu
>    + __d_lookup_rcu
> -   0.57%  smbd  libtalloc.so.2.0.7        [.] talloc_alloc_pool
>    + talloc_alloc_pool
> -   0.55%  smbd  libtalloc.so.2.0.7        [.] talloc_get_name
>    + talloc_get_name
> -   0.55%  smbd  [kernel.kallsyms]         [k] __posix_lock_file
>    + __posix_lock_file
>    + 0x7ffabcf93238
> -   0.50%  smbd  [kernel.kallsyms]         [k] _raw_spin_lock
>    + _raw_spin_lock
> +   0.49%  smbd  [kernel.kallsyms]         [k] tg3_start_xmit
> +   0.48%  smbd  [kernel.kallsyms]         [k] system_call_after_swapgs
> +   0.46%  smbd  libtalloc.so.2.0.7        [.] talloc_named_const
> +   0.46%  smbd  [kernel.kallsyms]         [k] memset
> +   0.46%  smbd  libtalloc.so.2.0.7        [.] _talloc_get_type_abort
>  +   0.45%  smbd  [kernel.kallsyms]         [k] str2hashbuf_signed
> +   0.45%  smbd  [kernel.kallsyms]         [k] kfree
> +   0.45%  smbd  libc-2.13.so              [.] free
> +   0.44%  smbd  [kernel.kallsyms]         [k] __alloc_skb
> +   0.42%  smbd  libtalloc.so.2.0.7        [.] talloc_is_parent
> +   0.41%  smbd  libtalloc.so.2.0.7        [.] _talloc_array
>
>
>
>
>
> On Mon, Sep 30, 2013 at 5:39 PM, Jeremy Allison <jra at samba.org> wrote:
>
>> On Mon, Sep 30, 2013 at 05:21:44PM -0300, Thiago Fernandes Crepaldi wrote:
>> > Andrew, in my company we are also experiencing a higher CPU usage of
>> Samba
>> > 4 (smbd) if compared to Samba 3.
>> >
>> > In fact, it almost reaches 100% of CPU and uses all the memory during
>> *dir
>> > copies* (individual file copy is as good as samba 3's). I strongly
>> believe
>> > that this CPU usage is the responsible for a worse samba 4's throughput
>> if
>> > compared to Samba 3 tests.
>> >
>> > Giving that, I would like to contribute with this investigation and
>> share
>> > my data regarding perf profiling on smbd (parent process)
>> >
>> > Events: 7  cycles
>> > -  90.01%  smbd  [kernel.kallsyms]  [k] copy_pte_range
>> >      copy_pte_range
>> >      __libc_fork
>> >      smbd_accept_connection
>> > -   9.77%  smbd  [kernel.kallsyms]  [k] handle_edge_irq
>> >      handle_edge_irq
>> >      smbd_accept_connection
>> > -   0.22%  smbd  [kernel.kallsyms]  [k] perf_pmu_rotate_start.isra.57
>> >      perf_pmu_rotate_start.isra.57
>> >      __poll
>> > -   0.00%  smbd  [kernel.kallsyms]  [k] native_write_msr_safe
>> >      native_write_msr_safe
>> >      __poll
>>
>> It's the client process that should have the interesting
>> profile data, the parent is just going to sit there doing
>> accept().
>>
>> Jeremy.
>>
>
>
>
> --
> Thiago Crepaldi
>



-- 
Thiago Crepaldi


More information about the samba mailing list