Winbindd using 100% of CPU. Any solution?
Richard Sharpe
realrichardsharpe at gmail.com
Wed Dec 4 13:06:40 MST 2013
On Wed, Dec 4, 2013 at 12:00 PM, Jeremy Allison <jra at samba.org> wrote:
> On Wed, Dec 04, 2013 at 11:49:49AM -0800, Richard Sharpe wrote:
>> On Wed, Dec 4, 2013 at 11:27 AM, Jeremy Allison <jra at samba.org> wrote:
>> > On Wed, Dec 04, 2013 at 11:11:46AM -0800, Richard Sharpe wrote:
>> >>
>> >> Those line numbers seem messed up. I caught it in talloc_free, so it
>> >> looks like we are in a loop here:
>> >>
>> >> for (domain = domain_list(); domain; domain = domain->next) {
>> >>     TALLOC_FREE(domain->check_online_event);
>> >> }
>> >>
>> >> Here is what the list of domain looks like and the prev pointers are
>> >> seriously messed up:
>> >
>> > Yes, we know that. The problem is finding out *HOW*
>> > the pointers got messed up :-(.
>>
>> OK, after fixing my line numbers, I now find that we are looping here:
>>
>> /* Destroy all possible events in child list. */
>> for (cl = winbindd_children; cl != NULL; cl = cl->next) {
>>     TALLOC_FREE(cl->lockout_policy_event);
>>     TALLOC_FREE(cl->machine_password_change_event);
>>
>>     /* Children should never be able to send
>>      * each other messages, all messages must
>>      * go through the parent.
>>      */
>>     cl->pid = (pid_t)0;
>>
>>     /*
>>      * Close service sockets to all other children
>>      */
>>     if ((cl != myself) && (cl->sock != -1)) {
>>         close(cl->sock);
>>         cl->sock = -1;
>>     }
>> }
>>
>> and the winbindd_children list is seriously screwed in a couple of ways:
>>
>> (gdb) p winbindd_children
>> $22 = (struct winbindd_child *) 0x803358940
>> (gdb) p *winbindd_children
>> $23 = {next = 0xeac360, prev = 0x8033589a0, pid = 0, domain = 0x803345400,
>> logfilename = 0x8033d7c80 "/var/log/samba/log.wb-XCHANGE", sock = -1,
>> queue = 0x8033d7c50, binding_handle = 0x8033d7d50, lockout_policy_event = 0x0,
>> machine_password_change_event = 0x0, table = 0xe09580}
>> (gdb) p *(winbindd_children->next)
>> $24 = {next = 0x803358940, prev = 0x803358940, pid = 0, domain = 0x0,
>> logfilename = 0x803330300 "/var/log/samba/log.winbindd-idmap", sock = -1,
>> queue = 0x8033302d0, binding_handle = 0x8033303d0, lockout_policy_event = 0x0,
>> machine_password_change_event = 0x0, table = 0xe09680}
>>
>> The second element's next pointer points back to the head of the
>> list, which is the cause of the infinite loop, but the first element
>> also has a weird value in its next pointer ...
>
> Yes, it's almost certainly a memory overwrite problem,
> but as it's writing onto valid memory it's really
> difficult to find. Valgrind wouldn't flag it :-(.

The build I got onto the customer system did not have the damn patch
to dump core when we hit that problem.

Trying again with a new build.
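
For anyone following along, here is a minimal sketch of the kind of
sanity check a "dump core when we hit that problem" patch could make
before walking the list. It is not the actual Samba patch: struct node
is a hypothetical stand-in for the next/prev fields of struct
winbindd_child, and list_has_cycle, list_links_consistent and
check_list_or_abort are names made up for illustration. It assumes a
NULL-terminated next chain, and only checks the back-link of nodes that
have a successor, so it does not care what the head's prev points at.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the link fields of struct winbindd_child. */
struct node {
    struct node *next;
    struct node *prev;
};

/*
 * Floyd's tortoise-and-hare: returns true if following ->next from
 * head never reaches NULL because the chain loops back on itself.
 */
bool list_has_cycle(const struct node *head)
{
    const struct node *slow = head;
    const struct node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) {
            return true;
        }
    }
    return false;
}

/*
 * Returns true if every node with a successor satisfies
 * n->next->prev == n. Only call this on a list known to be
 * cycle-free, otherwise the walk may not terminate.
 */
bool list_links_consistent(const struct node *head)
{
    for (const struct node *n = head; n != NULL && n->next != NULL; n = n->next) {
        if (n->next->prev != n) {
            return false;
        }
    }
    return true;
}

/* Abort (raising SIGABRT) as soon as the list looks corrupted. */
void check_list_or_abort(const struct node *head)
{
    if (list_has_cycle(head) || !list_links_consistent(head)) {
        abort();
    }
}
```

With the two-element list from the gdb output above (first->next is
the second element, second->next is the first again), list_has_cycle
returns true on the second iteration. Whether abort() actually leaves
a core file depends on the core limits of the process. Note the check
still dereferences the pointers it is checking, so a wild pointer can
fault inside the check itself.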
--
Regards,
Richard Sharpe
(What can dispel sorrow? Only Du Kang [wine]. --Cao Cao)