Winbind using 100% CPU

Dylan Klomparens dylan.klomparens at gmail.com
Thu Apr 18 10:16:39 MDT 2013


The patch has exposed the problem. I've attached the output from winbind.
Please let me know if you need additional information!

-- Dylan


On Wed, Apr 17, 2013 at 8:26 AM, Jim McDonough <jmcd at samba.org> wrote:

> On Wed, Apr 17, 2013 at 2:02 AM, Andreas Schneider <asn at samba.org> wrote:
> > On Tuesday 16 April 2013 15:53:35 Jim McDonough wrote:
> >> On Tue, Apr 16, 2013 at 2:45 PM, Jeremy Allison <jra at samba.org> wrote:
> >> > On Tue, Apr 16, 2013 at 01:46:14PM -0400, Jim McDonough wrote:
> >> >> On Tue, Apr 16, 2013 at 10:00 AM, Jim McDonough <jmcd at samba.org>
> wrote:
> >> >> > On Thu, Apr 11, 2013 at 6:59 PM, Jeremy Allison <jra at samba.org>
> wrote:
> >> >> >> On Thu, Apr 11, 2013 at 09:58:33AM -0400, Dylan Klomparens wrote:
> >> >> >>> (Re-posting on this email list per Jeremy Allison's request.)
> >> >> >>>
> >> >> >>> I am trying to figure out why winbind is using 100% CPU on my
> file
> >> >> >>> server.
> >> >> >>> I am using Samba version 4.0.4. Everything is fine for a few
> minutes
> >> >> >>> when I
> >> >> >>> start winbind, however after a while it begins using 100% CPU. I
> >> >> >>> haven't
> >> >> >>> been able to narrow down what triggers this CPU usage spike, but
> I
> >> >> >>> did
> >> >> >>> attach the GNU debugger to find out what's going on in the
> process.
> >> >> >>> The
> >> >> >>> backtrace revealed this information:
> >> >> >>>
> >> >> >>> #0  0x000000000041cf30 in _talloc_free at plt ()
> >> >> >>> #1  0x0000000000452320 in winbindd_reinit_after_fork ()
> >> >> >>> #2  0x00000000004524e6 in fork_domain_child ()
> >> >> >>> #3  0x0000000000453585 in wb_child_request_trigger ()
> >> >> >>> #4  0x000000381d2048e2 in tevent_common_loop_immediate () from
> >> >> >>> /lib64/libtevent.so.0
> >> >> >>> #5  0x00007fbed6b98e17 in run_events_poll () from
> >> >> >>> /lib64/libsmbconf.so.0
> >> >> >>> #6  0x00007fbed6b9922e in s3_event_loop_once () from
> >> >> >>> /lib64/libsmbconf.so.0
> >> >> >>> #7  0x000000381d204060 in _tevent_loop_once () from
> >> >> >>> /lib64/libtevent.so.0
> >> >> >>> #8  0x000000000042049a in main ()
> >> >> >>>
> >> >> >>> Apparently it's stuck in the winbindd_reinit_after_fork (and more
> >> >> >>> specifically the _talloc_free function). This code resides in
> >> >> >>> $SOURCE_HOME\source3\winbindd\winbindd_dual.c.
> >> >> >>
> >> >> >> That looks like corrupted memory - probably a loop
> >> >> >> in the talloc tree.
> >> >> >
> >> >> > I've got a user who sees this and we're adding the same dlinklist
> >> >> > element twice, creating a loop in the winbind child list.
> >> >> >
> >> >> > I've got a broken wrist so responses take a while, but that's my
> >> >> > current hint.  On 3.6.3 and 3.6.13.
> >> >>
> >> >> I see this in the parent winbind log.  The last 3 entries are changes
> >> >> I made to the dlist macros within winbind only.  you can see a 2
> >> >> second delay and then a second add of the same item to the child
> >> >> winbind list.  No entries in between (and a production system so
> there
> >> >> is reluctance to increase the debug level).
> >> >>
> >> >> [2013/03/14 09:08:57.795585,  3]
> >> >> winbindd/winbindd_getpwnam.c:56(winbindd_getpwnam_send)
> >> >>
> >> >>   getpwnam xx+xxxxxxxxx-112$
> >> >>
> >> >> [2013/03/14 09:08:57.796185,  3]
> >> >> winbindd/winbindd_getpwnam.c:56(winbindd_getpwnam_send)
> >> >>
> >> >>   getpwnam xx+yyyyyyy
> >> >>
> >> >> [2013/03/14 09:09:00.077660,  0]
> >> >> winbindd/winbindd_dual.c:1399(fork_domain_child)
> >> >>
> >> >>   adding 0x7ff91105a000 to list at 0x7ff911053510
> >> >>
> >> >> [2013/03/14 09:09:02.435367,  0]
> >> >> winbindd/winbindd_dual.c:1399(fork_domain_child)
> >> >>
> >> >>   adding 0x7ff91105a000 to list at 0x7ff91105a000
> >> >>
> >> >> [2013/03/14 09:09:02.435510,  0] lib/util.c:1117(smb_panic)
> >> >>
> >> >>   PANIC (pid 28628): duplicate
> >> >
> >> > Jim, do your changes lead to a smb_panic on an
> >> > DLIST_ADD of a duplicate ? If so, can you post
> >> > a complete backtrace ?
> >> >
> >> > Jeremy.
> >>
> >>  [2013/03/14 09:09:02.437394,  0] lib/util.c:1221(log_stack_trace)
> >>    BACKTRACE: 11 stack frames:
> >>     #0 /usr/sbin/winbindd(log_stack_trace+0x1a) [0x7f02379883ba]
> >>     #1 /usr/sbin/winbindd(smb_panic+0x2b) [0x7f023798848b]
> >>     #2 /usr/sbin/winbindd(+0x10806e) [0x7f02378d206e]
> >>     #3 /usr/sbin/winbindd(+0x1084b5) [0x7f02378d24b5]
> >>     #4 /usr/sbin/winbindd(tevent_common_loop_immediate+0xea)
> >> [0x7f023799917a] #5 /usr/sbin/winbindd(run_events_poll+0x3b)
> >> [0x7f02379973db]
> >>     #6 /usr/sbin/winbindd(+0x1cdb12) [0x7f0237997b12]
> >>     #7 /usr/sbin/winbindd(_tevent_loop_once+0x90) [0x7f0237997f10]
> >>     #8 /usr/sbin/winbindd(main+0x7db) [0x7f02378aa13b]
> >>     #9 /lib64/libc.so.6(__libc_start_main+0xe6) [0x7f0234a2ec36]
> >>     #10 /usr/sbin/winbindd(+0xdd839) [0x7f02378a7839]
> >>
> >>
> >> Unfortunately their core file doesn't seem quite right...this was out
> >> of their logs and I'm having trouble with the core, but this fscking
> >> splint keeps me moving at a snail's pace
> >
> > Jim, can you port this patch to master and ask Dylan to run with the
> patch so
> > we get a full backtrace?
> Here it is on master...
>
>
>
>
> --
> Jim McDonough
> Samba Team
> SUSE labs
> jmcd at samba dot org
> jmcd at themcdonoughs dot org
>
-------------- next part --------------
[ 2514]: list trusted domains
ads: trusted_domains
[ 2514]: list trusted domains
ads: trusted_domains
adding 0x2642360 to list at 0x2638bf0
PANIC (pid 2514): duplicate!
BACKTRACE: 13 stack frames:
 #0 /usr/local/samba/lib/libsmbconf.so.0(log_stack_trace+0x1f) [0x7fe3d3e607a4]
 #1 /usr/local/samba/lib/libsmbconf.so.0(smb_panic_s3+0x6c) [0x7fe3d3e605f7]
 #2 /usr/local/samba/lib/libsamba-util.so.0(smb_panic+0x28) [0x7fe3d92bbc65]
 #3 ./winbindd() [0x45da2e]
 #4 ./winbindd() [0x459c78]
 #5 /usr/local/samba/lib/private/libtevent.so.0(+0x5259) [0x7fe3d8a52259]
 #6 /usr/local/samba/lib/private/libtevent.so.0(tevent_common_loop_immediate+0x1f5) [0x7fe3d8a51ef8]
 #7 /usr/local/samba/lib/libsmbconf.so.0(run_events_poll+0x56) [0x7fe3d3e7c235]
 #8 /usr/local/samba/lib/libsmbconf.so.0(+0x418f9) [0x7fe3d3e7c8f9]
 #9 /usr/local/samba/lib/private/libtevent.so.0(_tevent_loop_once+0xf4) [0x7fe3d8a51032]
 #10 ./winbindd(main+0xbd9) [0x42b6bb]
 #11 /lib64/libc.so.6(__libc_start_main+0xf5) [0x345ac21a05]
 #12 ./winbindd() [0x41e329]
dumping core in /var/log/samba/cores/winbindd
Got sig[15] terminate (is_parent=0)
Got sig[15] terminate (is_parent=0)
Got sig[15] terminate (is_parent=0)
Got sig[15] terminate (is_parent=0)
Got sig[15] terminate (is_parent=0)
Got sig[15] terminate (is_parent=0)
Got sig[15] terminate (is_parent=0)
Got sig[15] terminate (is_parent=0)
Aborted (core dumped)


More information about the samba-technical mailing list