Winbind using 100% CPU
Jim McDonough
jmcd at samba.org
Wed Apr 17 06:26:11 MDT 2013
On Wed, Apr 17, 2013 at 2:02 AM, Andreas Schneider <asn at samba.org> wrote:
> On Tuesday 16 April 2013 15:53:35 Jim McDonough wrote:
>> On Tue, Apr 16, 2013 at 2:45 PM, Jeremy Allison <jra at samba.org> wrote:
>> > On Tue, Apr 16, 2013 at 01:46:14PM -0400, Jim McDonough wrote:
>> >> On Tue, Apr 16, 2013 at 10:00 AM, Jim McDonough <jmcd at samba.org> wrote:
>> >> > On Thu, Apr 11, 2013 at 6:59 PM, Jeremy Allison <jra at samba.org> wrote:
>> >> >> On Thu, Apr 11, 2013 at 09:58:33AM -0400, Dylan Klomparens wrote:
>> >> >>> (Re-posting on this email list per Jeremy Allison's request.)
>> >> >>>
>> >> >>> I am trying to figure out why winbind is using 100% CPU on my file
>> >> >>> server.
>> >> >>> I am using Samba version 4.0.4. Everything is fine for a few minutes
>> >> >>> when I
>> >> >>> start winbind, however after a while it begins using 100% CPU. I
>> >> >>> haven't
>> >> >>> been able to narrow down what triggers this CPU usage spike, but I
>> >> >>> did
>> >> >>> attach the GNU debugger to find out what's going on in the process.
>> >> >>> The
>> >> >>> backtrace revealed this information:
>> >> >>>
>> >> >>> #0 0x000000000041cf30 in _talloc_free at plt ()
>> >> >>> #1 0x0000000000452320 in winbindd_reinit_after_fork ()
>> >> >>> #2 0x00000000004524e6 in fork_domain_child ()
>> >> >>> #3 0x0000000000453585 in wb_child_request_trigger ()
>> >> >>> #4 0x000000381d2048e2 in tevent_common_loop_immediate () from
>> >> >>> /lib64/libtevent.so.0
>> >> >>> #5 0x00007fbed6b98e17 in run_events_poll () from
>> >> >>> /lib64/libsmbconf.so.0
>> >> >>> #6 0x00007fbed6b9922e in s3_event_loop_once () from
>> >> >>> /lib64/libsmbconf.so.0
>> >> >>> #7 0x000000381d204060 in _tevent_loop_once () from
>> >> >>> /lib64/libtevent.so.0
>> >> >>> #8 0x000000000042049a in main ()
>> >> >>>
>> >> >>> Apparently it's stuck in the winbindd_reinit_after_fork (and more
>> >> >>> specifically the _talloc_free function). This code resides in
>> >> >>> $SOURCE_HOME\source3\winbindd\winbindd_dual.c.
>> >> >>
>> >> >> That looks like corrupted memory - probably a loop
>> >> >> in the talloc tree.
>> >> >
>> >> > I've got a user who sees this and we're adding the same dlinklist
>> >> > element twice, creating a loop in the winbind child list.
>> >> >
>> >> > I've got a broken wrist so responses take a while, but that's my
>> >> > current hint. On 3.6.3 and 3.6.13.
>> >>
>> >> I see this in the parent winbind log. The last 3 entries are changes
>> >> I made to the dlist macros within winbind only. you can see a 2
>> >> second delay and then a second add of the same item to the child
>> >> winbind list. No entries in between (and a production system so there
>> >> is reluctance to increase the debug level).
>> >>
>> >> [2013/03/14 09:08:57.795585, 3]
>> >> winbindd/winbindd_getpwnam.c:56(winbindd_getpwnam_send)
>> >>
>> >> getpwnam xx+xxxxxxxxx-112$
>> >>
>> >> [2013/03/14 09:08:57.796185, 3]
>> >> winbindd/winbindd_getpwnam.c:56(winbindd_getpwnam_send)
>> >>
>> >> getpwnam xx+yyyyyyy
>> >>
>> >> [2013/03/14 09:09:00.077660, 0]
>> >> winbindd/winbindd_dual.c:1399(fork_domain_child)
>> >>
>> >> adding 0x7ff91105a000 to list at 0x7ff911053510
>> >>
>> >> [2013/03/14 09:09:02.435367, 0]
>> >> winbindd/winbindd_dual.c:1399(fork_domain_child)
>> >>
>> >> adding 0x7ff91105a000 to list at 0x7ff91105a000
>> >>
>> >> [2013/03/14 09:09:02.435510, 0] lib/util.c:1117(smb_panic)
>> >>
>> >> PANIC (pid 28628): duplicate
>> >
>> > Jim, do your changes lead to a smb_panic on an
>> > DLIST_ADD of a duplicate ? If so, can you post
>> > a complete backtrace ?
>> >
>> > Jeremy.
>>
>> [2013/03/14 09:09:02.437394, 0] lib/util.c:1221(log_stack_trace)
>> BACKTRACE: 11 stack frames:
>> #0 /usr/sbin/winbindd(log_stack_trace+0x1a) [0x7f02379883ba]
>> #1 /usr/sbin/winbindd(smb_panic+0x2b) [0x7f023798848b]
>> #2 /usr/sbin/winbindd(+0x10806e) [0x7f02378d206e]
>> #3 /usr/sbin/winbindd(+0x1084b5) [0x7f02378d24b5]
>> #4 /usr/sbin/winbindd(tevent_common_loop_immediate+0xea)
>> [0x7f023799917a] #5 /usr/sbin/winbindd(run_events_poll+0x3b)
>> [0x7f02379973db]
>> #6 /usr/sbin/winbindd(+0x1cdb12) [0x7f0237997b12]
>> #7 /usr/sbin/winbindd(_tevent_loop_once+0x90) [0x7f0237997f10]
>> #8 /usr/sbin/winbindd(main+0x7db) [0x7f02378aa13b]
>> #9 /lib64/libc.so.6(__libc_start_main+0xe6) [0x7f0234a2ec36]
>> #10 /usr/sbin/winbindd(+0xdd839) [0x7f02378a7839]
>>
>>
>> Unfortunately their core file doesn't seem quite right...this was out
>> of their logs and I'm having trouble with the core, but this fscking
>> splint keeps me moving at a snail's pace
>
> Jim, can you port this patch to master and ask Dylan to run with the patch so
> we get a full backtrace?
Here it is on master...
--
Jim McDonough
Samba Team
SUSE labs
jmcd at samba dot org
jmcd at themcdonoughs dot org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Panic-on-adding-a-duplicate-dlist-item-in-winbind.patch
Type: application/octet-stream
Size: 2510 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20130417/d3fb32b2/attachment.obj>
More information about the samba-technical
mailing list