Winbind using 100% CPU

Jim McDonough jmcd at samba.org
Tue Apr 16 11:46:14 MDT 2013


On Tue, Apr 16, 2013 at 10:00 AM, Jim McDonough <jmcd at samba.org> wrote:
> On Thu, Apr 11, 2013 at 6:59 PM, Jeremy Allison <jra at samba.org> wrote:
>> On Thu, Apr 11, 2013 at 09:58:33AM -0400, Dylan Klomparens wrote:
>>> (Re-posting on this email list per Jeremy Allison's request.)
>>>
>>> I am trying to figure out why winbind is using 100% CPU on my file server.
>>> I am using Samba version 4.0.4. Everything is fine for a few minutes when I
>>> start winbind, however after a while it begins using 100% CPU. I haven't
>>> been able to narrow down what triggers this CPU usage spike, but I did
>>> attach the GNU debugger to find out what's going on in the process. The
>>> backtrace revealed this information:
>>>
>>> #0  0x000000000041cf30 in _talloc_free@plt ()
>>> #1  0x0000000000452320 in winbindd_reinit_after_fork ()
>>> #2  0x00000000004524e6 in fork_domain_child ()
>>> #3  0x0000000000453585 in wb_child_request_trigger ()
>>> #4  0x000000381d2048e2 in tevent_common_loop_immediate () from /lib64/libtevent.so.0
>>> #5  0x00007fbed6b98e17 in run_events_poll () from /lib64/libsmbconf.so.0
>>> #6  0x00007fbed6b9922e in s3_event_loop_once () from /lib64/libsmbconf.so.0
>>> #7  0x000000381d204060 in _tevent_loop_once () from /lib64/libtevent.so.0
>>> #8  0x000000000042049a in main ()
>>>
>>> Apparently it's stuck in the winbindd_reinit_after_fork (and more
>>> specifically the _talloc_free function). This code resides in
>>> $SOURCE_HOME\source3\winbindd\winbindd_dual.c.
>>
>> That looks like corrupted memory - probably a loop
>> in the talloc tree.
> I've got a user who sees this and we're adding the same dlinklist
> element twice, creating a loop in the winbind child list.
>
> I've got a broken wrist so responses take a while, but that's my
> current hint.  On 3.6.3 and 3.6.13.
>

I see this in the parent winbind log.  The last 3 entries come from debug
changes I made to the dlist macros within winbind only.  You can see a 2
second delay and then a second add of the same item to the child
winbind list.  There are no entries in between (and this is a production
system, so there is reluctance to increase the debug level).

[2013/03/14 09:08:57.795585,  3] winbindd/winbindd_getpwnam.c:56(winbindd_getpwnam_send)
  getpwnam xx+xxxxxxxxx-112$
[2013/03/14 09:08:57.796185,  3] winbindd/winbindd_getpwnam.c:56(winbindd_getpwnam_send)
  getpwnam xx+yyyyyyy
[2013/03/14 09:09:00.077660,  0] winbindd/winbindd_dual.c:1399(fork_domain_child)
  adding 0x7ff91105a000 to list at 0x7ff91105a000
[2013/03/14 09:09:02.435367,  0] winbindd/winbindd_dual.c:1399(fork_domain_child)
  adding 0x7ff91105a000 to list at 0x7ff91105a000
[2013/03/14 09:09:02.435510,  0] lib/util.c:1117(smb_panic)
  PANIC (pid 28628): duplicate
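
To illustrate why a second add of the same element matters, here is a
minimal standalone sketch.  It is not the Samba dlinklist code: struct
child, list_add, list_terminates, list_contains and list_add_checked are
all hypothetical names, and the head insert is a simplified
NULL-terminated variant that I am assuming behaves like the real macros
in this one respect.  When the element being added is already the list
head, its next pointer ends up pointing at itself, so any walk of the
list never reaches NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a winbind child entry; not the real struct. */
struct child {
    struct child *prev;
    struct child *next;
};

/* Simplified head insert (assumption: plain NULL-terminated list). */
static void list_add(struct child **list, struct child *p)
{
    p->prev = NULL;
    p->next = *list;          /* if *list == p, this makes p->next == p */
    if (*list != NULL) {
        (*list)->prev = p;
    }
    *list = p;
}

/* Bounded walk: returns false if NULL is not reached within max_steps. */
static bool list_terminates(const struct child *list, size_t max_steps)
{
    size_t steps = 0;
    for (const struct child *c = list; c != NULL; c = c->next) {
        if (++steps > max_steps) {
            return false;     /* almost certainly a loop */
        }
    }
    return true;
}

/* Linear membership test; only safe while the list is still intact. */
static bool list_contains(const struct child *list, const struct child *p)
{
    for (const struct child *c = list; c != NULL; c = c->next) {
        if (c == p) {
            return true;
        }
    }
    return false;
}

/* Guarded add: refuses a duplicate instead of corrupting the list. */
static bool list_add_checked(struct child **list, struct child *p)
{
    if (list_contains(*list, p)) {
        return false;         /* caller can log or panic here */
    }
    list_add(list, p);
    return true;
}
```

Adding a, then b, then b again with list_add leaves b.next == &b, and
list_terminates reports false; the same sequence through
list_add_checked rejects the third add and leaves the list intact.  An
unbounded walk over the corrupted list would spin forever, which would
be consistent with the 100% CPU reported at the top of the thread.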
--
Jim McDonough
Samba Team
SUSE labs
jmcd at samba dot org
jmcd at themcdonoughs dot org


More information about the samba-technical mailing list