[Samba] Winbind timeouts/hangs(?)

Heiko Wundram modelnine at modelnine.org
Thu Aug 22 12:41:41 UTC 2019


Hello group,

I'm hitting a problem where group resolution through winbind seems to 
time out sometimes, which in turn causes invalid group information to 
propagate to Cronjob users or similar. The profiling information that 
winbind spits out is the following:

[2019/08/22 14:34:14.809540,  0] 
../source3/winbindd/winbindd.c:1058(winbind_client_processed)
   winbind_client_processed: request took 60.000364 seconds
   [struct process_request_state] ../source3/winbindd/winbindd.c:683 
[2019/08/22 14:33:14.809169] ../source3/winbindd/winbindd.c:853 
[2019/08/22 14:34:14.809533] [60.000364] -> TEVENT_REQ_DONE (2 0))
    [struct winbindd_getgroups_state] 
../source3/winbindd/winbindd_getgroups.c:53 [2019/08/22 14:33:14.809175] 
../source3/winbindd/winbindd_getgroups.c:107 [2019/08/22 
14:34:14.809485] [60.000310] -> TEVENT_REQ_USER_ERROR (3 
10483072397370982515))
     [struct wb_lookupname_state] ../source3/winbindd/wb_lookupname.c:47 
[2019/08/22 14:33:14.809184] ../source3/winbindd/wb_lookupname.c:95 
[2019/08/22 14:34:14.809483] [60.000299] -> TEVENT_REQ_USER_ERROR (3 
10483072397370982515))
      [struct dcerpc_wbint_LookupName_state] 
default/librpc/gen_ndr/ndr_winbind_c.c:781 [2019/08/22 14:33:14.809188] 
default/librpc/gen_ndr/ndr_winbind_c.c:847 [2019/08/22 14:34:14.809481] 
[60.000293] -> TEVENT_REQ_DONE (2 0))
       [struct dcerpc_wbint_LookupName_r_state] 
default/librpc/gen_ndr/ndr_winbind_c.c:693 [2019/08/22 14:33:14.809189] 
default/librpc/gen_ndr/ndr_winbind_c.c:727 [2019/08/22 14:34:14.809479] 
[60.000290] -> TEVENT_REQ_DONE (2 0))
        [struct dcerpc_binding_handle_call_state] 
../librpc/rpc/binding_handle.c:371 [2019/08/22 14:33:14.809190] 
../librpc/rpc/binding_handle.c:520 [2019/08/22 14:34:14.809477] 
[60.000287] -> TEVENT_REQ_DONE (2 0))
         [struct dcerpc_binding_handle_raw_call_state] 
../librpc/rpc/binding_handle.c:149 [2019/08/22 14:33:14.809197] 
../librpc/rpc/binding_handle.c:203 [2019/08/22 14:34:14.809471] 
[60.000274] -> TEVENT_REQ_DONE (2 0))
          [struct wbint_bh_raw_call_state] 
../source3/winbindd/winbindd_dual_ndr.c:89 [2019/08/22 14:33:14.809199] 
../source3/winbindd/winbindd_dual_ndr.c:204 [2019/08/22 14:34:14.809469] 
[60.000270] -> TEVENT_REQ_DONE (2 0))
           [struct wb_domain_request_state] 
../source3/winbindd/winbindd_dual.c:473 [2019/08/22 14:33:14.809202] 
../source3/winbindd/winbindd_dual.c:708 [2019/08/22 14:34:14.809460] 
[60.000258] -> TEVENT_REQ_DONE (2 0))
            [struct wb_child_request_state] 
../source3/winbindd/winbindd_dual.c:198 [2019/08/22 14:33:44.809349] 
../source3/winbindd/winbindd_dual.c:273 [2019/08/22 14:34:14.809457] 
[30.000108] -> TEVENT_REQ_DONE (2 0))
             [struct tevent_queue_wait_state] ../tevent_queue.c:336 
[2019/08/22 14:33:44.809350] ../tevent_queue.c:355 [2019/08/22 
14:33:44.809351] [0.000001] -> TEVENT_REQ_DONE (2 0))
             [struct wb_simple_trans_state] ../nsswitch/wb_reqtrans.c:375 
[2019/08/22 14:33:44.809352] ../nsswitch/wb_reqtrans.c:432 [2019/08/22 
14:34:14.809450] [30.000098] -> TEVENT_REQ_DONE (2 0))
              [struct req_write_state] ../nsswitch/wb_reqtrans.c:158 
[2019/08/22 14:33:44.809353] ../nsswitch/wb_reqtrans.c:194 [2019/08/22 
14:33:44.809377] [0.000024] -> TEVENT_REQ_DONE (2 0))
               [struct writev_state] ../lib/async_req/async_sock.c:263 
[2019/08/22 14:33:44.809353] ../lib/async_req/async_sock.c:412 
[2019/08/22 14:33:44.809376] [0.000023] -> TEVENT_REQ_DONE (2 0))
              [struct resp_read_state] ../nsswitch/wb_reqtrans.c:222 
[2019/08/22 14:33:44.809378] ../nsswitch/wb_reqtrans.c:275 [2019/08/22 
14:34:14.809449] [30.000071] -> TEVENT_REQ_DONE (2 0))
               [struct read_packet_state] 
../lib/async_req/async_sock.c:458 [2019/08/22 14:33:44.809379] 
../lib/async_req/async_sock.c:546 [2019/08/22 14:34:14.809446] 
[30.000067] -> TEVENT_REQ_DONE (2 0))
    [struct resp_write_state] ../nsswitch/wb_reqtrans.c:307 [2019/08/22 
14:34:14.809495] ../nsswitch/wb_reqtrans.c:344 [2019/08/22 
14:34:14.809531] [0.000036] -> TEVENT_REQ_DONE (2 0))
     [struct writev_state] ../lib/async_req/async_sock.c:263 [2019/08/22 
14:34:14.809496] ../lib/async_req/async_sock.c:412 [2019/08/22 
14:34:14.809528] [0.000032] -> TEVENT_REQ_DONE (2 0))

(so, basically, most of the time is spent in wb_domain_request_state, 
with half of that in resp_read_state) . The system in question is doing 
absolutely nothing, no network traffic, and the CPU is idling (i.e., 
winbind seems to be sleeping), when this is logged, and there's also no 
definitive moment where an answer from one of the DCs arrives which 
causes completion. I've tried to follow the control path in the 
corresponding sources noted in the profiling, but couldn't find a 
specific point that would cause hangs, that's why I thought asking here 
whether anyone has seen this behaviour.

The Samba package that's being used is the Debian 10 prepackaged Samba 
4.9.5, and this is on a simple domain-joined client which has correct 
Kerberos and PAM configuration to mirror the AD accounts to the 
operating system. The system has nested groups on, but the complexity of 
those is minor, and all in all, there are (only) about 100 groups and 
250 accounts in the complete domain. The system in question was updated 
from Debian 9 (using the prepackaged Samba/winbind sources of stretch), 
and the configuration hasn't changed. Debian 9 packaged Samba did not 
show this behaviour.

Thanks for any insights on what I might do to debug this further!

-- 
--- Heiko Wundram.



More information about the samba mailing list