[Samba] Winbind timeouts/hangs(?)
Heiko Wundram
modelnine at modelnine.org
Thu Aug 22 12:41:41 UTC 2019
Hello group,
I'm hitting a problem where group resolution through winbind seems to
time out sometimes, which in turn causes invalid group information to
propagate to Cronjob users or similar. The profiling information that
winbind spits out is the following:
[2019/08/22 14:34:14.809540, 0]
../source3/winbindd/winbindd.c:1058(winbind_client_processed)
winbind_client_processed: request took 60.000364 seconds
[struct process_request_state] ../source3/winbindd/winbindd.c:683
[2019/08/22 14:33:14.809169] ../source3/winbindd/winbindd.c:853
[2019/08/22 14:34:14.809533] [60.000364] -> TEVENT_REQ_DONE (2 0))
[struct winbindd_getgroups_state]
../source3/winbindd/winbindd_getgroups.c:53 [2019/08/22 14:33:14.809175]
../source3/winbindd/winbindd_getgroups.c:107 [2019/08/22
14:34:14.809485] [60.000310] -> TEVENT_REQ_USER_ERROR (3
10483072397370982515))
[struct wb_lookupname_state] ../source3/winbindd/wb_lookupname.c:47
[2019/08/22 14:33:14.809184] ../source3/winbindd/wb_lookupname.c:95
[2019/08/22 14:34:14.809483] [60.000299] -> TEVENT_REQ_USER_ERROR (3
10483072397370982515))
[struct dcerpc_wbint_LookupName_state]
default/librpc/gen_ndr/ndr_winbind_c.c:781 [2019/08/22 14:33:14.809188]
default/librpc/gen_ndr/ndr_winbind_c.c:847 [2019/08/22 14:34:14.809481]
[60.000293] -> TEVENT_REQ_DONE (2 0))
[struct dcerpc_wbint_LookupName_r_state]
default/librpc/gen_ndr/ndr_winbind_c.c:693 [2019/08/22 14:33:14.809189]
default/librpc/gen_ndr/ndr_winbind_c.c:727 [2019/08/22 14:34:14.809479]
[60.000290] -> TEVENT_REQ_DONE (2 0))
[struct dcerpc_binding_handle_call_state]
../librpc/rpc/binding_handle.c:371 [2019/08/22 14:33:14.809190]
../librpc/rpc/binding_handle.c:520 [2019/08/22 14:34:14.809477]
[60.000287] -> TEVENT_REQ_DONE (2 0))
[struct dcerpc_binding_handle_raw_call_state]
../librpc/rpc/binding_handle.c:149 [2019/08/22 14:33:14.809197]
../librpc/rpc/binding_handle.c:203 [2019/08/22 14:34:14.809471]
[60.000274] -> TEVENT_REQ_DONE (2 0))
[struct wbint_bh_raw_call_state]
../source3/winbindd/winbindd_dual_ndr.c:89 [2019/08/22 14:33:14.809199]
../source3/winbindd/winbindd_dual_ndr.c:204 [2019/08/22 14:34:14.809469]
[60.000270] -> TEVENT_REQ_DONE (2 0))
[struct wb_domain_request_state]
../source3/winbindd/winbindd_dual.c:473 [2019/08/22 14:33:14.809202]
../source3/winbindd/winbindd_dual.c:708 [2019/08/22 14:34:14.809460]
[60.000258] -> TEVENT_REQ_DONE (2 0))
[struct wb_child_request_state]
../source3/winbindd/winbindd_dual.c:198 [2019/08/22 14:33:44.809349]
../source3/winbindd/winbindd_dual.c:273 [2019/08/22 14:34:14.809457]
[30.000108] -> TEVENT_REQ_DONE (2 0))
[struct tevent_queue_wait_state] ../tevent_queue.c:336
[2019/08/22 14:33:44.809350] ../tevent_queue.c:355 [2019/08/22
14:33:44.809351] [0.000001] -> TEVENT_REQ_DONE (2 0))
[struct wb_simple_trans_state] ../nsswitch/wb_reqtrans.c:375
[2019/08/22 14:33:44.809352] ../nsswitch/wb_reqtrans.c:432 [2019/08/22
14:34:14.809450] [30.000098] -> TEVENT_REQ_DONE (2 0))
[struct req_write_state] ../nsswitch/wb_reqtrans.c:158
[2019/08/22 14:33:44.809353] ../nsswitch/wb_reqtrans.c:194 [2019/08/22
14:33:44.809377] [0.000024] -> TEVENT_REQ_DONE (2 0))
[struct writev_state] ../lib/async_req/async_sock.c:263
[2019/08/22 14:33:44.809353] ../lib/async_req/async_sock.c:412
[2019/08/22 14:33:44.809376] [0.000023] -> TEVENT_REQ_DONE (2 0))
[struct resp_read_state] ../nsswitch/wb_reqtrans.c:222
[2019/08/22 14:33:44.809378] ../nsswitch/wb_reqtrans.c:275 [2019/08/22
14:34:14.809449] [30.000071] -> TEVENT_REQ_DONE (2 0))
[struct read_packet_state]
../lib/async_req/async_sock.c:458 [2019/08/22 14:33:44.809379]
../lib/async_req/async_sock.c:546 [2019/08/22 14:34:14.809446]
[30.000067] -> TEVENT_REQ_DONE (2 0))
[struct resp_write_state] ../nsswitch/wb_reqtrans.c:307 [2019/08/22
14:34:14.809495] ../nsswitch/wb_reqtrans.c:344 [2019/08/22
14:34:14.809531] [0.000036] -> TEVENT_REQ_DONE (2 0))
[struct writev_state] ../lib/async_req/async_sock.c:263 [2019/08/22
14:34:14.809496] ../lib/async_req/async_sock.c:412 [2019/08/22
14:34:14.809528] [0.000032] -> TEVENT_REQ_DONE (2 0))
(so, basically, most of the time is spent in wb_domain_request_state,
with half of that in resp_read_state) . The system in question is doing
absolutely nothing, no network traffic, and the CPU is idling (i.e.,
winbind seems to be sleeping), when this is logged, and there's also no
definitive moment where an answer from one of the DCs arrives which
causes completion. I've tried to follow the control path in the
corresponding sources noted in the profiling, but couldn't find a
specific point that would cause hangs, that's why I thought asking here
whether anyone has seen this behaviour.
The Samba package that's being used is the Debian 10 prepackaged Samba
4.9.5, and this is on a simple domain-joined client which has correct
Kerberos and PAM configuration to mirror the AD accounts to the
operating system. The system has nested groups on, but the complexity of
those is minor, and all in all, there are (only) about 100 groups and
250 accounts in the complete domain. The system in question was updated
from Debian 9 (using the prepackaged Samba/winbind sources of stretch),
and the configuration hasn't changed. Debian 9 packaged Samba did not
show this behaviour.
Thanks for any insights on what I might do to debug this further!
--
--- Heiko Wundram.
More information about the samba
mailing list