Winbindd using 100% of CPU. Any solution?

Richard Sharpe realrichardsharpe at gmail.com
Wed Dec 4 13:06:40 MST 2013


On Wed, Dec 4, 2013 at 12:00 PM, Jeremy Allison <jra at samba.org> wrote:
> On Wed, Dec 04, 2013 at 11:49:49AM -0800, Richard Sharpe wrote:
>> On Wed, Dec 4, 2013 at 11:27 AM, Jeremy Allison <jra at samba.org> wrote:
>> > On Wed, Dec 04, 2013 at 11:11:46AM -0800, Richard Sharpe wrote:
>> >>
>> >> Those line numbers seem messed up. I caught it in talloc_free, so it
>> >> looks like we are in a loop here:
>> >>
>> >>         for (domain = domain_list(); domain; domain = domain->next) {
>> >>                 TALLOC_FREE(domain->check_online_event);
>> >>         }
>> >>
>> >> Here is what the list of domains looks like, and the prev pointers are
>> >> seriously messed up:
>> >
>> > Yes, we know that. The problem is finding out *HOW*
>> > the pointers got messed up :-(.
>>
>> OK, after fixing my line numbers, I now find that we are looping here:
>>
>>         /* Destroy all possible events in child list. */
>>         for (cl = winbindd_children; cl != NULL; cl = cl->next) {
>>                 TALLOC_FREE(cl->lockout_policy_event);
>>                 TALLOC_FREE(cl->machine_password_change_event);
>>
>>                 /* Children should never be able to send
>>                  * each other messages, all messages must
>>                  * go through the parent.
>>                  */
>>                 cl->pid = (pid_t)0;
>>
>>                 /*
>>                  * Close service sockets to all other children
>>                  */
>>                 if ((cl != myself) && (cl->sock != -1)) {
>>                         close(cl->sock);
>>                         cl->sock = -1;
>>                 }
>>         }
>>
>> and the winbindd_children list is seriously screwed in a couple of ways:
>>
>> (gdb) p winbindd_children
>> $22 = (struct winbindd_child *) 0x803358940
>> (gdb) p *winbindd_children
>> $23 = {next = 0xeac360, prev = 0x8033589a0, pid = 0, domain = 0x803345400,
>>   logfilename = 0x8033d7c80 "/var/log/samba/log.wb-XCHANGE", sock = -1,
>>   queue = 0x8033d7c50, binding_handle = 0x8033d7d50, lockout_policy_event = 0x0,
>>   machine_password_change_event = 0x0, table = 0xe09580}
>> (gdb) p *(winbindd_children->next)
>> $24 = {next = 0x803358940, prev = 0x803358940, pid = 0, domain = 0x0,
>>   logfilename = 0x803330300 "/var/log/samba/log.winbindd-idmap", sock = -1,
>>   queue = 0x8033302d0, binding_handle = 0x8033303d0, lockout_policy_event = 0x0,
>>   machine_password_change_event = 0x0, table = 0xe09680}
>>
>> The second element's next pointer leads back to the head of the list
>> (0x803358940), which is the cause of the infinite loop, but the first
>> element has a weird value in its next pointer ...
>
> Yes, it's almost certainly a memory overwrite problem,
> but as it's writing onto valid memory it's really
> difficult to find. Valgrind wouldn't flag it :-(.
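
For anyone following along, here is a minimal sketch of how a cycle like the
one in the gdb output above could be detected before walking the list. It
uses Floyd's tortoise-and-hare technique on a stand-in struct. The names
`struct node` and `list_has_cycle` are hypothetical and not part of the
winbindd source; only the next/prev links are modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct winbindd_child; only the links matter. */
struct node {
	struct node *next;
	struct node *prev;
};

/*
 * Floyd's tortoise-and-hare cycle check.
 * Returns true if following ->next from head eventually revisits a node,
 * which is the condition under which a loop of the form
 * "for (p = head; p != NULL; p = p->next)" never terminates.
 */
bool list_has_cycle(const struct node *head)
{
	const struct node *slow = head;
	const struct node *fast = head;

	while (fast != NULL && fast->next != NULL) {
		slow = slow->next;        /* advances one step */
		fast = fast->next->next;  /* advances two steps */
		if (slow == fast) {
			return true;      /* the pointers met, so there is a cycle */
		}
	}
	return false;                     /* reached NULL, the list terminates */
}
```

This only tells you that the list is already corrupt. It says nothing about
what overwrote the pointers, which is the hard part.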

The build I got onto the customer system did not have the damn patch
to dump core when we hit that problem.

Trying again with a new build.
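
The actual patch is not shown in this thread. Purely as an illustration of
the idea, a guard that dumps core instead of spinning might look something
like the sketch below. `struct child`, `MAX_CHILD_WALK`,
`count_children_bounded` and `abort_if_child_list_corrupt` are all made-up
names, and the limit is an arbitrary assumption.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct winbindd_child. */
struct child {
	struct child *next;
};

/* Assumed upper bound; far more children than a sane list would hold. */
#define MAX_CHILD_WALK 10000L

/*
 * Walk the list, giving up after 'limit' nodes.
 * Returns the node count, or -1 if the limit was exceeded
 * (which suggests a cycle or otherwise corrupt list).
 */
long count_children_bounded(const struct child *head, long limit)
{
	long n = 0;
	const struct child *cl;

	for (cl = head; cl != NULL; cl = cl->next) {
		if (++n > limit) {
			return -1;
		}
	}
	return n;
}

/* Call before the cleanup loop: abort() raises SIGABRT so a core is left. */
void abort_if_child_list_corrupt(const struct child *head)
{
	if (count_children_bounded(head, MAX_CHILD_WALK) < 0) {
		fprintf(stderr, "child list looks corrupt, aborting\n");
		abort();
	}
}
```

Whether a core file is actually written still depends on the system's core
dump settings (ulimit and friends).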

-- 
Regards,
Richard Sharpe
(What can relieve sorrow? Only Du Kang. --Cao Cao)

