Winbindd using 100% of CPU. Any solution?
Jeremy Allison
jra at samba.org
Wed Dec 4 13:00:14 MST 2013
On Wed, Dec 04, 2013 at 11:49:49AM -0800, Richard Sharpe wrote:
> On Wed, Dec 4, 2013 at 11:27 AM, Jeremy Allison <jra at samba.org> wrote:
> > On Wed, Dec 04, 2013 at 11:11:46AM -0800, Richard Sharpe wrote:
> >>
> >> Those line numbers seem messed up. I caught it in talloc_free, so it
> >> looks like we are in a loop here:
> >>
> >> for (domain = domain_list(); domain; domain = domain->next) {
> >>     TALLOC_FREE(domain->check_online_event);
> >> }
> >>
> >> Here is what the list of domains looks like; the prev pointers are
> >> seriously messed up:
> >
> > Yes, we know that. The problem is finding out *HOW*
> > the pointers got messed up :-(.
>
> OK, after fixing my line numbers, I now find that we are looping here:
>
> /* Destroy all possible events in child list. */
> for (cl = winbindd_children; cl != NULL; cl = cl->next) {
>     TALLOC_FREE(cl->lockout_policy_event);
>     TALLOC_FREE(cl->machine_password_change_event);
>
>     /* Children should never be able to send
>      * each other messages, all messages must
>      * go through the parent.
>      */
>     cl->pid = (pid_t)0;
>
>     /*
>      * Close service sockets to all other children
>      */
>     if ((cl != myself) && (cl->sock != -1)) {
>         close(cl->sock);
>         cl->sock = -1;
>     }
> }
>
> and the winbindd_children list is seriously corrupted in a couple of ways:
>
> (gdb) p winbindd_children
> $22 = (struct winbindd_child *) 0x803358940
> (gdb) p *winbindd_children
> $23 = {next = 0xeac360, prev = 0x8033589a0, pid = 0, domain = 0x803345400,
> logfilename = 0x8033d7c80 "/var/log/samba/log.wb-XCHANGE", sock = -1,
> queue = 0x8033d7c50, binding_handle = 0x8033d7d50, lockout_policy_event = 0x0,
> machine_password_change_event = 0x0, table = 0xe09580}
> (gdb) p *(winbindd_children->next)
> $24 = {next = 0x803358940, prev = 0x803358940, pid = 0, domain = 0x0,
> logfilename = 0x803330300 "/var/log/samba/log.winbindd-idmap", sock = -1,
> queue = 0x8033302d0, binding_handle = 0x8033303d0, lockout_policy_event = 0x0,
> machine_password_change_event = 0x0, table = 0xe09680}
>
> The last element's next pointer points back to the first element
> (0x803358940) instead of being NULL, which is the cause of the infinite
> loop, but the first element also has a weird value in its next
> pointer ...

Yes, it's almost certainly a memory overwrite problem,
but as the stray write lands on valid memory it's really
difficult to find. Valgrind wouldn't flag it :-(.
Jeremy.
More information about the samba-technical mailing list