Winbindd using 100% of CPU. Any solution?

Richard Sharpe realrichardsharpe at gmail.com
Wed Dec 4 12:11:46 MST 2013


On Wed, Dec 4, 2013 at 10:49 AM, Richard Sharpe
<realrichardsharpe at gmail.com> wrote:
> On Thu, Nov 21, 2013 at 12:09 AM, Andreas Schneider <asn at samba.org> wrote:
>> On Tuesday 19 November 2013 16:55:43 Jeremy Allison wrote:
>>> On Tue, Nov 19, 2013 at 04:50:16PM -0800, Richard Sharpe wrote:
>>> > On Tue, Nov 19, 2013 at 3:35 PM, Jeremy Allison <jra at samba.org> wrote:
>>> > > On Tue, Nov 19, 2013 at 03:31:55PM -0800, Richard Sharpe wrote:
>>> > >> Hi folks,
>>> > >>
>>> > >> We are seeing the same problem as in
>>> > >> http://samba.2283325.n4.nabble.com/Winbind-using-100-CPU-td4646572.html
>>> > >>
>>> > >> This is with Samba 3.6.12+ under FreeBSD.
>>> > >>
>>> > >> Does anyone have a solution?
>>> > >
>>> > > Ask Jim for his DLIST_ macro changes.
>>> >
>>> > There were some changes there already.
>>> >
>>> > Rather than panic, would it be reasonable to simply refuse to add the
>>> > duplicate entry and log more info and dump the stack?
>>>
>>> Whatever you need to help track it down. But IMHO panic == dump the stack.
>>
>> Maybe also get a
>>
>> talloc_report_full(0, fopen("/tmp/talloc_report.log","w"))
>>
>> which often gives you a hint about what's going on.
>
> Well, it actually seems to be a different problem because my core-dump
> patch did not hit, as far as I can see.
>
> Here is the traceback:
>
> (gdb) where
> #0  0x00000000004bc8c7 in winbindd_reinit_after_fork (myself=0x80334cdc0, logfilename=0x8033d7b80 "/var/log/samba/log.wb-XCHANGE")
>     at winbindd/winbindd_dual.c:1244
> #1  0x00000000004bce5a in fork_domain_child (child=0x80334cdc0) at winbindd/winbindd_dual.c:1362
> #2  0x00000000004b9273 in wb_child_request_trigger (req=0x803387d50, private_data=0x0) at winbindd/winbindd_dual.c:145
> #3  0x00000000005bd8c9 in tevent_queue_immediate_trigger (ev=0x80331e110, im=0x803387b10, private_data=0x8033d7b50) at ../lib/tevent/tevent_queue.c:144
> #4  0x00000000005bbcbf in tevent_common_loop_immediate (ev=0x80331e110) at ../lib/tevent/tevent_immediate.c:139
> #5  0x00000000005b8852 in run_events_poll (ev=0x80331e110, pollrtn=0, pfds=0x0, num_pfds=0) at lib/events.c:197
> #6  0x00000000005b90ab in s3_event_loop_once (ev=0x80331e110, location=0xb51e84 "winbindd/winbindd.c:1456") at lib/events.c:331
> #7  0x00000000005ba31f in _tevent_loop_once (ev=0x80331e110, location=0xb51e84 "winbindd/winbindd.c:1456") at ../lib/tevent/tevent.c:494
> #8  0x000000000048a585 in main (argc=3, argv=0x7fffffffecb0, envp=0x7fffffffecd0) at winbindd/winbindd.c:1456

Those line numbers seem messed up. I caught it in talloc_free, so it
looks like we are in a loop here:

        for (domain = domain_list(); domain; domain = domain->next) {
                TALLOC_FREE(domain->check_online_event);
        }
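That loop only terminates if following `next` from the head eventually reaches NULL; if the `next` pointers form a cycle it spins forever, which would fit a winbindd pinned at 100% CPU. Below is a minimal sketch of a list sanity check that could run before such a walk. It assumes a hypothetical `struct dom_node` carrying only the link fields of `struct winbindd_domain`, and `check_list()` is likewise a hypothetical helper, not a Samba function. It uses Floyd's tortoise-and-hare to detect a `next` cycle, then verifies that each node's `prev` points back at its predecessor. The head's `prev` is deliberately skipped, on the assumption (worth confirming against `lib/util/dlinklist.h` in the 3.6 tree) that the list head's `prev` points at the tail rather than being NULL.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct winbindd_domain: only the list links
 * (and a name for logging) are modelled here. */
struct dom_node {
	const char *name;
	struct dom_node *prev, *next;
};

enum list_state { LIST_OK, LIST_CYCLE, LIST_BAD_PREV };

/* Floyd's tortoise-and-hare: if the next pointers loop back on themselves,
 * the fast pointer (two steps) eventually meets the slow one (one step). */
static int has_next_cycle(const struct dom_node *head)
{
	const struct dom_node *slow = head;
	const struct dom_node *fast = head;

	while (fast != NULL && fast->next != NULL) {
		slow = slow->next;
		fast = fast->next->next;
		if (slow == fast) {
			return 1;
		}
	}
	return 0;
}

/* LIST_OK if next reaches NULL and every non-head node's prev points at
 * its predecessor. The head's prev is not checked: assumed to point at
 * the tail, per the dlinklist.h convention. */
static enum list_state check_list(const struct dom_node *head)
{
	const struct dom_node *d;

	if (has_next_cycle(head)) {
		/* for (d = head; d; d = d->next) would never end on this list */
		return LIST_CYCLE;
	}
	for (d = head; d != NULL && d->next != NULL; d = d->next) {
		if (d->next->prev != d) {
			fprintf(stderr, "bad prev on %s: %p, expected %p\n",
				d->next->name, (const void *)d->next->prev,
				(const void *)d);
			return LIST_BAD_PREV;
		}
	}
	return LIST_OK;
}
```

Adapted to the real struct, a call like this just before the `TALLOC_FREE` loop could log and abort instead of spinning. It only reads the list, so it is safe on a corrupted one as long as every pointer it reaches is still valid memory.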

Here is what the domain list looks like; the prev pointers are seriously
messed up:

(gdb) p *domain_list()
$3 = {name = "BUILTIN", '\000' <repeats 248 times>,
  alt_name = '\000' <repeats 255 times>, forest_name = '\000' <repeats 255 times>,
  sid = {sid_rev_num = 1 '\001', num_auths = 1 '\001',
    id_auth = "\000\000\000\000\000\005", sub_auths = {32, 0 <repeats 14 times>}},
  domain_flags = 0, domain_type = 0, domain_trust_attribs = 0, initialized = false,
  native_mode = false, active_directory = false, primary = false, internal = true,
  online = true, startup_time = 0, startup = false, can_do_samlogon_ex = false,
  can_do_ncacn_ip_tcp = false, can_do_validation6 = false, methods = 0xe08840,
  backend = 0x0, private_data = 0x0, have_idmap_config = false, id_range_low = 0,
  id_range_high = 0, dc_probe_pid = -1, dcname = '\000' <repeats 255 times>, dcaddr = {
    ss_len = 0 '\000', ss_family = 0 '\000', __ss_pad1 = "\000\000\000\000\000",
    __ss_align = 0, __ss_pad2 = '\000' <repeats 111 times>}, last_seq_check = 0,
  sequence_number = 4294967295, last_status = {v = 0}, conn = {cli = 0x0,
    samr_pipe = 0x0, sam_connect_handle = {handle_type = 0, uuid = {time_low = 0,
        time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, sam_domain_handle = {handle_type = 0, uuid = {
        time_low = 0, time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, lsa_pipe = 0x0, lsa_pipe_tcp = 0x0,
    lsa_policy = {handle_type = 0, uuid = {time_low = 0, time_mid = 0,
        time_hi_and_version = 0, clock_seq = "\000", node = "\000\000\000\000\000"}},
    netlogon_pipe = 0x0}, children = 0x803358700, check_online_timeout = 0,
  check_online_event = 0x0, prev = 0x80334a200, next = 0x803343c00}
(gdb) p *(domain_list()->next)
$4 = {name = "PCC1-GB1", '\000' <repeats 247 times>,
  alt_name = '\000' <repeats 255 times>, forest_name = '\000' <repeats 255 times>,
  sid = {sid_rev_num = 1 '\001', num_auths = 4 '\004',
    id_auth = "\000\000\000\000\000\005", sub_auths = {21, 1013778237, 1260021253,
      3914778424, 0 <repeats 11 times>}}, domain_flags = 0, domain_type = 0,
  domain_trust_attribs = 0, initialized = false, native_mode = false,
  active_directory = false, primary = false, internal = true, online = true,
  startup_time = 0, startup = false, can_do_samlogon_ex = false,
  can_do_ncacn_ip_tcp = false, can_do_validation6 = false, methods = 0xe08840,
  backend = 0x0, private_data = 0x0, have_idmap_config = false, id_range_low = 0,
  id_range_high = 0, dc_probe_pid = -1, dcname = '\000' <repeats 255 times>, dcaddr = {
    ss_len = 0 '\000', ss_family = 0 '\000', __ss_pad1 = "\000\000\000\000\000",
    __ss_align = 0, __ss_pad2 = '\000' <repeats 111 times>}, last_seq_check = 0,
  sequence_number = 4294967295, last_status = {v = 0}, conn = {cli = 0x0,
    samr_pipe = 0x0, sam_connect_handle = {handle_type = 0, uuid = {time_low = 0,
        time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, sam_domain_handle = {handle_type = 0, uuid = {
        time_low = 0, time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, lsa_pipe = 0x0, lsa_pipe_tcp = 0x0,
    lsa_policy = {handle_type = 0, uuid = {time_low = 0, time_mid = 0,
        time_hi_and_version = 0, clock_seq = "\000", node = "\000\000\000\000\000"}},
    netlogon_pipe = 0x0}, children = 0x803358820, check_online_timeout = 0,
  check_online_event = 0x0, prev = 0x803343600, next = 0x803344200}
(gdb) p *(domain_list()->next->next)
$5 = {name = "NIST", '\000' <repeats 251 times>,
  alt_name = "campus.nist.gov", '\000' <repeats 240 times>,
  forest_name = "NIST.GOV", '\000' <repeats 247 times>, sid = {sid_rev_num = 1 '\001',
    num_auths = 4 '\004', id_auth = "\000\000\000\000\000\005", sub_auths = {21,
      1908027396, 2059629336, 315576832, 0 <repeats 11 times>}}, domain_flags = 0,
  domain_type = 0, domain_trust_attribs = 0, initialized = true, native_mode = true,
  active_directory = true, primary = true, internal = false, online = true,
  startup_time = 56, startup = false, can_do_samlogon_ex = false,
  can_do_ncacn_ip_tcp = true, can_do_validation6 = true, methods = 0xe08840,
  backend = 0x0, private_data = 0x0, have_idmap_config = false, id_range_low = 0,
  id_range_high = 0, dc_probe_pid = 3113,
  dcname = "WS019.campus.NIST.GOV", '\000' <repeats 234 times>, dcaddr = {
    ss_len = 16 '\020', ss_family = 2 '\002', __ss_pad1 = "\000\000\201\006\020O",
    __ss_align = 0, __ss_pad2 = '\000' <repeats 111 times>}, last_seq_check = 0,
  sequence_number = 4294967295, last_status = {v = 0}, conn = {cli = 0x0,
    samr_pipe = 0x0, sam_connect_handle = {handle_type = 0, uuid = {time_low = 0,
        time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, sam_domain_handle = {handle_type = 0, uuid = {
        time_low = 0, time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, lsa_pipe = 0x0, lsa_pipe_tcp = 0x0,
    lsa_policy = {handle_type = 0, uuid = {time_low = 0, time_mid = 0,
        time_hi_and_version = 0, clock_seq = "\000", node = "\000\000\000\000\000"}},
    netlogon_pipe = 0x0}, children = 0x8033589a0, check_online_timeout = 0,
  check_online_event = 0x0, prev = 0x803343c00, next = 0x803344800}
(gdb) p *(domain_list()->next->next->next)
$6 = {name = "NISTROOT", '\000' <repeats 247 times>,
  alt_name = "NIST.GOV", '\000' <repeats 247 times>,
  forest_name = '\000' <repeats 255 times>, sid = {sid_rev_num = 1 '\001',
    num_auths = 4 '\004', id_auth = "\000\000\000\000\000\005", sub_auths = {21,
      1844237615, 926492609, 1801674531, 0 <repeats 11 times>}}, domain_flags = 39,
  domain_type = 2, domain_trust_attribs = 4194304, initialized = true,
  native_mode = true, active_directory = true, primary = false, internal = false,
  online = false, startup_time = 0, startup = false, can_do_samlogon_ex = false,
  can_do_ncacn_ip_tcp = false, can_do_validation6 = false, methods = 0xe08840,
  backend = 0x0, private_data = 0x0, have_idmap_config = false, id_range_low = 0,
  id_range_high = 0, dc_probe_pid = -1, dcname = '\000' <repeats 255 times>, dcaddr = {
    ss_len = 0 '\000', ss_family = 0 '\000', __ss_pad1 = "\000\000\000\000\000",
    __ss_align = 0, __ss_pad2 = '\000' <repeats 111 times>}, last_seq_check = 0,
  sequence_number = 4294967295, last_status = {v = 0}, conn = {cli = 0x0,
    samr_pipe = 0x0, sam_connect_handle = {handle_type = 0, uuid = {time_low = 0,
        time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, sam_domain_handle = {handle_type = 0, uuid = {
        time_low = 0, time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, lsa_pipe = 0x0, lsa_pipe_tcp = 0x0,
    lsa_policy = {handle_type = 0, uuid = {time_low = 0, time_mid = 0,
        time_hi_and_version = 0, clock_seq = "\000", node = "\000\000\000\000\000"}},
    netlogon_pipe = 0x0}, children = 0x803393b80, check_online_timeout = 0,
  check_online_event = 0x0, prev = 0x803344200, next = 0x803344e00}
(gdb) p *(domain_list()->next->next->next->next)
$7 = {name = "VISTAPILOT", '\000' <repeats 245 times>,
  alt_name = "vista.nist.gov", '\000' <repeats 241 times>,
  forest_name = '\000' <repeats 255 times>, sid = {sid_rev_num = 1 '\001',
    num_auths = 4 '\004', id_auth = "\000\000\000\000\000\005", sub_auths = {21,
      2527356630, 955619708, 1687138365, 0 <repeats 11 times>}}, domain_flags = 0,
  domain_type = 0, domain_trust_attribs = 0, initialized = true, native_mode = false,
  active_directory = false, primary = false, internal = false, online = false,
  startup_time = 0, startup = false, can_do_samlogon_ex = false,
  can_do_ncacn_ip_tcp = false, can_do_validation6 = false, methods = 0xe08840,
  backend = 0x0, private_data = 0x0, have_idmap_config = false, id_range_low = 0,
  id_range_high = 0, dc_probe_pid = -1, dcname = '\000' <repeats 255 times>, dcaddr = {
    ss_len = 0 '\000', ss_family = 0 '\000', __ss_pad1 = "\000\000\000\000\000",
    __ss_align = 0, __ss_pad2 = '\000' <repeats 111 times>}, last_seq_check = 0,
  sequence_number = 4294967295, last_status = {v = 0}, conn = {cli = 0x0,
    samr_pipe = 0x0, sam_connect_handle = {handle_type = 0, uuid = {time_low = 0,
        time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, sam_domain_handle = {handle_type = 0, uuid = {
        time_low = 0, time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, lsa_pipe = 0x0, lsa_pipe_tcp = 0x0,
    lsa_policy = {handle_type = 0, uuid = {time_low = 0, time_mid = 0,
        time_hi_and_version = 0, clock_seq = "\000", node = "\000\000\000\000\000"}},
    netlogon_pipe = 0x0}, children = 0x80330b100, check_online_timeout = 0,
  check_online_event = 0x0, prev = 0x803344800, next = 0x803345400}
(gdb) p *(domain_list()->next->next->next->next->next)
$8 = {name = "XCHANGE", '\000' <repeats 248 times>,
  alt_name = "xchange.nist.gov", '\000' <repeats 239 times>,
  forest_name = '\000' <repeats 255 times>, sid = {sid_rev_num = 1 '\001',
    num_auths = 4 '\004', id_auth = "\000\000\000\000\000\005", sub_auths = {21,
      782252399, 1160315966, 1364796038, 0 <repeats 11 times>}}, domain_flags = 0,
  domain_type = 0, domain_trust_attribs = 0, initialized = true, native_mode = true,
  active_directory = true, primary = false, internal = false, online = false,
  startup_time = 0, startup = false, can_do_samlogon_ex = false,
  can_do_ncacn_ip_tcp = false, can_do_validation6 = false, methods = 0xe08840,
  backend = 0x0, private_data = 0x0, have_idmap_config = false, id_range_low = 0,
  id_range_high = 0, dc_probe_pid = -1, dcname = '\000' <repeats 255 times>, dcaddr = {
    ss_len = 0 '\000', ss_family = 0 '\000', __ss_pad1 = "\000\000\000\000\000",
    __ss_align = 0, __ss_pad2 = '\000' <repeats 111 times>}, last_seq_check = 0,
  sequence_number = 4294967295, last_status = {v = 0}, conn = {cli = 0x0,
    samr_pipe = 0x0, sam_connect_handle = {handle_type = 0, uuid = {time_low = 0,
        time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, sam_domain_handle = {handle_type = 0, uuid = {
        time_low = 0, time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, lsa_pipe = 0x0, lsa_pipe_tcp = 0x0,
    lsa_policy = {handle_type = 0, uuid = {time_low = 0, time_mid = 0,
        time_hi_and_version = 0, clock_seq = "\000", node = "\000\000\000\000\000"}},
    netlogon_pipe = 0x0}, children = 0x803358940, check_online_timeout = 0,
  check_online_event = 0x0, prev = 0x803344e00, next = 0x803345a00}
(gdb) p *(domain_list()->next->next->next->next->next->next)
$9 = {name = "REMOTE", '\000' <repeats 249 times>,
  alt_name = "remote.nist.gov", '\000' <repeats 240 times>,
  forest_name = '\000' <repeats 255 times>, sid = {sid_rev_num = 1 '\001',
    num_auths = 4 '\004', id_auth = "\000\000\000\000\000\005", sub_auths = {21,
      1584364374, 263007247, 1256740096, 0 <repeats 11 times>}}, domain_flags = 0,
  domain_type = 0, domain_trust_attribs = 0, initialized = false, native_mode = false,
  active_directory = false, primary = false, internal = false, online = false,
  startup_time = 0, startup = false, can_do_samlogon_ex = false,
  can_do_ncacn_ip_tcp = false, can_do_validation6 = false, methods = 0xe08840,
  backend = 0x0, private_data = 0x0, have_idmap_config = false, id_range_low = 0,
  id_range_high = 0, dc_probe_pid = -1, dcname = '\000' <repeats 255 times>, dcaddr = {
    ss_len = 0 '\000', ss_family = 0 '\000', __ss_pad1 = "\000\000\000\000\000",
    __ss_align = 0, __ss_pad2 = '\000' <repeats 111 times>}, last_seq_check = 0,
  sequence_number = 4294967295, last_status = {v = 0}, conn = {cli = 0x0,
    samr_pipe = 0x0, sam_connect_handle = {handle_type = 0, uuid = {time_low = 0,
        time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, sam_domain_handle = {handle_type = 0, uuid = {
        time_low = 0, time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, lsa_pipe = 0x0, lsa_pipe_tcp = 0x0,
    lsa_policy = {handle_type = 0, uuid = {time_low = 0, time_mid = 0,
        time_hi_and_version = 0, clock_seq = "\000", node = "\000\000\000\000\000"}},
    netlogon_pipe = 0x0}, children = 0x803358d60, check_online_timeout = 0,
  check_online_event = 0x0, prev = 0x803345400, next = 0x803346000}
(gdb) p *(domain_list()->next->next->next->next->next->next->next)
$10 = {name = "OIAA", '\000' <repeats 251 times>,
  alt_name = '\000' <repeats 255 times>, forest_name = '\000' <repeats 255 times>,
  sid = {sid_rev_num = 1 '\001', num_auths = 1 '\001',
    id_auth = "\000\000\000\000\000", sub_auths = {0 <repeats 15 times>}},
  domain_flags = 0, domain_type = 0, domain_trust_attribs = 0, initialized = false,
  native_mode = false, active_directory = false, primary = false, internal = false,
  online = false, startup_time = 0, startup = false, can_do_samlogon_ex = false,
  can_do_ncacn_ip_tcp = false, can_do_validation6 = false, methods = 0xe08840,
  backend = 0x0, private_data = 0x0, have_idmap_config = false, id_range_low = 0,
  id_range_high = 0, dc_probe_pid = -1, dcname = '\000' <repeats 255 times>, dcaddr = {
    ss_len = 0 '\000', ss_family = 0 '\000', __ss_pad1 = "\000\000\000\000\000",
    __ss_align = 0, __ss_pad2 = '\000' <repeats 111 times>}, last_seq_check = 0,
  sequence_number = 4294967295, last_status = {v = 0}, conn = {cli = 0x0,
    samr_pipe = 0x0, sam_connect_handle = {handle_type = 0, uuid = {time_low = 0,
        time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, sam_domain_handle = {handle_type = 0, uuid = {
        time_low = 0, time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, lsa_pipe = 0x0, lsa_pipe_tcp = 0x0,
    lsa_policy = {handle_type = 0, uuid = {time_low = 0, time_mid = 0,
        time_hi_and_version = 0, clock_seq = "\000", node = "\000\000\000\000\000"}},
    netlogon_pipe = 0x0}, children = 0x803358e80, check_online_timeout = 0,
  check_online_event = 0x0, prev = 0x803345a00, next = 0x803346600}
(gdb) p *(domain_list()->next->next->next->next->next->next->next->next)
$11 = {name = "SAC", '\000' <repeats 252 times>,
  alt_name = "SAC.gov", '\000' <repeats 248 times>,
  forest_name = '\000' <repeats 255 times>, sid = {sid_rev_num = 1 '\001',
    num_auths = 4 '\004', id_auth = "\000\000\000\000\000\005", sub_auths = {21,
      507921405, 1343024091, 1708537768, 0 <repeats 11 times>}}, domain_flags = 0,
  domain_type = 0, domain_trust_attribs = 0, initialized = false, native_mode = false,
  active_directory = false, primary = false, internal = false, online = false,
  startup_time = 0, startup = false, can_do_samlogon_ex = false,
  can_do_ncacn_ip_tcp = false, can_do_validation6 = false, methods = 0xe08840,
  backend = 0x0, private_data = 0x0, have_idmap_config = false, id_range_low = 0,
  id_range_high = 0, dc_probe_pid = -1, dcname = '\000' <repeats 255 times>, dcaddr = {
    ss_len = 0 '\000', ss_family = 0 '\000', __ss_pad1 = "\000\000\000\000\000",
    __ss_align = 0, __ss_pad2 = '\000' <repeats 111 times>}, last_seq_check = 0,
  sequence_number = 4294967295, last_status = {v = 0}, conn = {cli = 0x0,
    samr_pipe = 0x0, sam_connect_handle = {handle_type = 0, uuid = {time_low = 0,
        time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, sam_domain_handle = {handle_type = 0, uuid = {
        time_low = 0, time_mid = 0, time_hi_and_version = 0, clock_seq = "\000",
        node = "\000\000\000\000\000"}}, lsa_pipe = 0x0, lsa_pipe_tcp = 0x0,
    lsa_policy = {handle_type = 0, uuid = {time_low = 0, time_mid = 0,
        time_hi_and_version = 0, clock_seq = "\000", node = "\000\000\000\000\000"}},
    netlogon_pipe = 0x0}, children = 0x803373c40, check_online_timeout = 0,
  check_online_event = 0x0, prev = 0x803346000, next = 0x803346c00}
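For the idea raised earlier in the thread of refusing a duplicate entry, logging more information, and dumping the stack instead of panicking, here is a minimal C sketch. `dlist_add_checked()` and `struct dom_node` are hypothetical names, not Samba APIs. The insertion mirrors head insertion in the style of `DLIST_ADD`, assuming the convention that the list head's `prev` points at the tail. The stack dump uses `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>`, a glibc extension that FreeBSD provides through libexecinfo (link with `-lexecinfo`).

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for struct winbindd_domain (list links only). */
struct dom_node {
	const char *name;
	struct dom_node *prev, *next;
};

/* Write the current call stack to stderr without allocating memory. */
static void dump_stack(void)
{
	void *frames[32];
	int n = backtrace(frames, 32);

	backtrace_symbols_fd(frames, n, STDERR_FILENO);
}

/* Head insertion, assuming head->prev points at the tail.
 * Returns 0 on success, or -1 (list left untouched) if p is already
 * linked into the list. The scan assumes the list is not yet cyclic. */
static int dlist_add_checked(struct dom_node **list, struct dom_node *p)
{
	struct dom_node *d;

	assert(list != NULL && p != NULL);

	for (d = *list; d != NULL; d = d->next) {
		if (d == p) {
			fprintf(stderr, "refusing duplicate add of %s (%p)\n",
				p->name, (void *)p);
			dump_stack();
			return -1;
		}
	}

	if (*list == NULL) {
		p->prev = p;               /* single entry: prev is itself */
		p->next = NULL;
	} else {
		p->prev = (*list)->prev;   /* new head inherits the tail pointer */
		(*list)->prev = p;
		p->next = *list;
	}
	*list = p;
	return 0;
}
```

Scanning the list on every add costs O(n), which seems acceptable for a handful of trusted domains, but it is meant as a debugging aid rather than something to leave on a hot path.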



-- 
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
("How can I dispel my sorrow? Only with Du Kang [wine]." --Cao Cao)

