winbindd stuck at getaddrinfo.

Volker Lendecke Volker.Lendecke at SerNet.DE
Wed Oct 29 04:41:36 MDT 2014


On Wed, Oct 29, 2014 at 12:20:53AM -0700, Hemanth Thummala wrote:
> We are using a Samba 3.6.12+ stack on FreeBSD 8.0. Recently we have been
> hitting a winbindd hang at a few of our customers. Each time we could see
> winbindd hanging in getaddrinfo.
> 
> Here are a couple of instances.
> 
> Thread 1 (Thread 8030021c0 (LWP 101442)):
> #0  0x00000008027079dc in kevent () from /lib/libc.so.7
> #1  0x00000008026d9d81 in ?? () from /lib/libc.so.7
> #2  0x00000008026da6e4 in __res_nsend () from /lib/libc.so.7
> #3  0x00000008026e91ae in ?? () from /lib/libc.so.7
> #4  0x00000008026e94bf in ?? () from /lib/libc.so.7
> #5  0x00000008026ea2ba in ?? () from /lib/libc.so.7
> #6  0x00000008026fb7e3 in nsdispatch () from /lib/libc.so.7
> #7  0x00000008026eb907 in getaddrinfo () from /lib/libc.so.7
> #8  0x0000000801505d82 in krb5_krbhst_get_addrinfo () from
> /usr/lib/libkrb5.so.10
> #9  0x00000008015051c0 in krb5_sendto () from /usr/lib/libkrb5.so.10
> #10 0x000000080150549f in krb5_sendto_context () from /usr/lib/libkrb5.so.10
> #11 0x00000008014ee82c in krb5_get_init_creds () from /usr/lib/libkrb5.so.10
> #12 0x00000008014ef34a in krb5_get_init_creds_password () from
> /usr/lib/libkrb5.so.10
> #13 0x000000000082c193 in kerberos_kinit_password_ext
> (principal=0x803030b80 "HOSTNAME$@CORP.DOMAIN.COM", password=0x80300a4d0
> "4hOZxLoyOqynNr", time_offset=0, expire_time=0x0,
> renew_till_time=0x0, cache_name=0x80300cdc0 "MEMORY:cliconnect",
> request_pac=false, add_netbios_addr=false, renewable_time=0, ntstatus=0x0)
> at libads/kerberos.c:232
> #14 0x000000000082c426 in kerberos_kinit_password (principal=0x1c <Address
> 0x1c out of bounds>, password=0x7fffffff9ba0 "\035", time_offset=1,
> cache_name=0x7fffffff9bd0 "\n") at libads/kerberos.c:657
> #15 0x000000000058e47f in cli_session_setup_spnego (cli=0x803049f50,
> user=0x803030b80 "HOSTNAME$@CORP.DOMAIN.COM", pass=0x80300a4d0
> "4hOZxLoyOqynNr", user_domain=0x80300d0f0 "CORP",
> dest_realm=0x80305f300 "corp.domain.com") at libsmb/cliconnect.c:1861
> #16 0x000000000049f404 in cm_prepare_connection (retry=<optimized out>,
> cli=<optimized out>, controller=<optimized out>, sockfd=<optimized out>,
> domain=<optimized out>) at winbindd/winbindd_cm.c:893
> #17 cm_open_connection (domain=0x80305f200, new_conn=0x80305f720) at
> winbindd/winbindd_cm.c:1606
> #18 0x000000000049f89d in init_dc_connection_network (domain=0x80305f200)
> at winbindd/winbindd_cm.c:1788
> #19 0x000000000049f8ee in init_dc_connection (domain=0x1c) at
> winbindd/winbindd_cm.c:1808
> #20 0x000000000049f911 in init_dc_connection_rpc (domain=0x1c) at
> winbindd/winbindd_cm.c:1815
> #21 0x000000000049f97d in cm_connect_netlogon (domain=0x1c,
> cli=0x7fffffff9ba0) at winbindd/winbindd_cm.c:2623
> #22 0x00000000004983da in winbind_samlogon_retry_loop (domain=0x80305f200,
> mem_ctx=0x8030095b0, logon_parameters=2080,
> server=0x80305f590 "DC01.corp.domain.com", username=0x7fffffffe074 "1420djc",
> domainname=0x7fffffffe174 "CORP", workstation=0x7fffffffe47c "MH2017",
> chal=0x7fffffffe068 "\375\211X\310{\303\021\300 \b", lm_response=...,
> nt_response=..., info3=0x7fffffffd108) at winbindd/winbindd_pam.c:1178
> #23 0x0000000000499626 in winbindd_dual_pam_auth_crap (domain=0x80305f200,
> state=0x7fffffffe800) at winbindd/winbindd_pam.c:1875
> #24 0x00000000004aeeed in child_process_request (state=<optimized out>,
> child=<optimized out>) at winbindd/winbindd_dual.c:495
> #25 fork_domain_child (child=<optimized out>) at
> winbindd/winbindd_dual.c:1609
> #26 wb_child_request_trigger (req=<optimized out>, private_data=<optimized
> out>) at winbindd/winbindd_dual.c:200
> #27 0x0000000000569db0 in tevent_common_loop_immediate (ev=0x80301e110) at
> ../lib/tevent/tevent_immediate.c:139
> #28 0x0000000000568075 in run_events_poll (ev=0x80301e110, pollrtn=0,
> pfds=0x0, num_pfds=0) at lib/events.c:197
> #29 0x0000000000568799 in s3_event_loop_once (ev=0x80301e110,
> location=<optimized out>) at lib/events.c:331
> #30 0x0000000000568bb1 in _tevent_loop_once (ev=0x80301e110,
> location=0x8b45fa "winbindd/winbindd.c:1491") at ../lib/tevent/tevent.c:494
> #31 0x0000000000489a22 in main (argc=<optimized out>, argv=<optimized out>,
> envp=<optimized out>) at winbindd/winbindd.c:1491
> 
> The above stack belongs to the parent winbindd process, and the one below
> to a child winbindd process.

Really? I see child_process_request() in the backtrace, which
hints at this being a child. If it really is the parent, it
would be interesting to investigate how the winbind parent
ended up in this call stack: winbindd_dual_pam_auth_crap is
designed to be called only from within a child.

However, if the FreeBSD libc does not time out at all, the
only real way around this is to completely isolate the
getaddrinfo call in a separate process and enforce a timeout
on it there. This would be quite an undertaking, because, as
the backtrace shows, we would also have to isolate the
Kerberos calls much better.
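
To make the isolation idea concrete, here is a rough sketch of what
"getaddrinfo in a separate process with a timeout" could look like. It
is not Samba code: resolve_with_timeout() and struct lookup_reply are
hypothetical names invented for this illustration. Note also that in
the backtrace getaddrinfo is called from inside libkrb5, so a wrapper
like this would not help there unless the whole Kerberos call were
moved into the helper process as well.

```c
#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wire format between helper child and caller. It is much
 * smaller than PIPE_BUF, so a single write() to the pipe is atomic. */
struct lookup_reply {
	socklen_t len;
	struct sockaddr_storage addr;
};

/* Hypothetical helper (not a Samba function): resolve `host` in a forked
 * child and give up after timeout_ms milliseconds. Returns 0 and fills
 * in the first address on success; returns -1 on lookup failure,
 * timeout or system error. */
int resolve_with_timeout(const char *host, int timeout_ms,
			 struct sockaddr_storage *out, socklen_t *out_len)
{
	struct lookup_reply reply;
	struct pollfd pfd;
	int fds[2];
	int n, ret = -1;
	pid_t pid;

	assert(host != NULL && out != NULL && out_len != NULL);

	if (pipe(fds) == -1) {
		return -1;
	}
	pid = fork();
	if (pid == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: the only process allowed to block inside libc. */
		struct addrinfo hints, *res = NULL;

		close(fds[0]);
		memset(&hints, 0, sizeof(hints));
		hints.ai_socktype = SOCK_STREAM;
		if (getaddrinfo(host, NULL, &hints, &res) != 0 || res == NULL ||
		    res->ai_addrlen > sizeof(reply.addr)) {
			_exit(1);
		}
		memset(&reply, 0, sizeof(reply));
		reply.len = res->ai_addrlen;
		memcpy(&reply.addr, res->ai_addr, res->ai_addrlen);
		if (write(fds[1], &reply, sizeof(reply)) != (ssize_t)sizeof(reply)) {
			_exit(1);
		}
		_exit(0);
	}

	/* Parent: wait for the reply, but never longer than timeout_ms.
	 * (A retry after EINTR restarts the timeout; good enough for a sketch.) */
	close(fds[1]);
	pfd.fd = fds[0];
	pfd.events = POLLIN;
	do {
		n = poll(&pfd, 1, timeout_ms);
	} while (n == -1 && errno == EINTR);

	if (n > 0 &&
	    read(fds[0], &reply, sizeof(reply)) == (ssize_t)sizeof(reply)) {
		*out = reply.addr;
		*out_len = reply.len;
		ret = 0;
	}
	if (ret != 0) {
		/* Timeout or failure: make sure a stuck child goes away. */
		kill(pid, SIGKILL);
	}
	close(fds[0]);
	while (waitpid(pid, NULL, 0) == -1 && errno == EINTR) {
		/* retry until the child has been reaped */
	}
	return ret;
}
```

A caller would use it along the lines of
resolve_with_timeout("dc01.example.com", 5000, &ss, &len), where the
host name and the 5-second limit are placeholders. The cost is one fork
per lookup, which is part of why doing this properly throughout winbind
would be a larger job.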

In the meantime, does killing only the stuck winbind child
help?

Thanks,

Volker

-- 
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt at sernet.de

