[Samba] Samba 4.9.3 and the "10 hour problem"

Peter Eriksson peter at ifm.liu.se
Tue Dec 4 17:55:56 UTC 2018

Tried a Samba 4.9.3 with the patch applied and it still froze up on us at 17:00… So - still no go.

However, one thing that I came to think about was that the test system where I ran the “smbclient” (the monitoring server, a Linux virtual machine) runs a prepackage Samba 4.8.3. Perhaps both server _and_ client side needs this fix?

At 17:03 this is what I saw:
> peter86 at filifjonkan tmp]$ smbclient -m SMB3 -k //filur05/peter86
> session setup failed: NT_STATUS_IO_TIMEOUT

After I ran a “pkill winbindd” on that server (filur05) I could connect again (even though I got one “hanging” idmap-child-winbind, the rest was gone though):

> [peter86 at filifjonkan tmp]$ smbclient -m SMB3 -k //filur05/peter86
> Try "help" to get a list of possible commands.
> smb: \> dir
>   .                                   D        0  Tue Dec  4 11:15:25 2018
>   ..                                  D        0  Mon Dec  3 11:23:14 2018
>   HelloFilur05.txt                    N        8  Wed Nov 29 22:40:01 2017
> 		157286400 blocks of size 1024. 157286224 blocks available

Note that I didn’t restart the winbindd daemons until after I had got that smbclient session running… Perhaps that worked because I used Kerberos authentication though.

Anyway, I got a forced core dump from the “hanging idmap-child” Winbindd and a stack trace on it shows:

> (gdb) bt
> #0  0x0000000809b1bc2a in _poll () from /lib/libc.so.7
> #1  0x0000000801562726 in ?? () from /lib/libthr.so.3
> #2  0x0000000803ce7037 in poll_event_loop_poll (tvalp=0x7fffffffca80, ev=0x812357060) at ../lib/tevent/tevent_poll.c:483
> #3  poll_event_loop_once (ev=0x812357060, location=<optimized out>) at ../lib/tevent/tevent_poll.c:626
> #4  0x0000000803ce44de in _tevent_loop_once (ev=ev at entry=0x812357060, location=location at entry=0x803cea448 "../lib/tevent/tevent_req.c:298") at ../lib/tevent/tevent.c:772
> #5  0x0000000803ce5b18 in tevent_req_poll (req=req at entry=0x8122b7700, ev=ev at entry=0x812357060) at ../lib/tevent/tevent_req.c:298
> #6  0x0000000812810b2a in tldap_search (ld=0x812322520, base=0x81223e2a0 "DC=ad,DC=liu,DC=se", scope=scope at entry=2, 
>     filter=filter at entry=0x812292860 "(&(|(sAMAccountType=805306368)(sAMAccountType=805306369)(sAMAccountType=805306370)(sAMAccountType=268435456)(sAMAccountType=536870912))(|(objectSid=\\01\\05\\00\\00\\00\\00\\00\\
> 05\\15\\00\\00 
>     pmsgs=0x7fffffffcc40) at ../source3/lib/tldap.c:1999
> #7  0x0000000812814efa in idmap_ad_sids_to_unixids (dom=dom at entry=0x812251aa0, ids=ids at entry=0x81226bde0) at ../source3/winbindd/idmap_ad.c:827
> #8  0x0000000812815481 in idmap_ad_sids_to_unixids_retry (dom=0x812251aa0, ids=0x81226bde0) at ../source3/winbindd/idmap_ad.c:946
> #9  0x000000000107a790 in _wbint_Sids2UnixIDs (p=p at entry=0x7fffffffcdd0, r=r at entry=0x8123233e0) at ../source3/winbindd/winbindd_dual_srv.c:203
> #10 0x00000000010d0e49 in api_wbint_Sids2UnixIDs (p=0x7fffffffcdd0) at default/librpc/gen_ndr/srv_winbind.c:391
> #11 0x000000000107a345 in winbindd_dual_ndrcmd (domain=<optimized out>, state=0x7fffffffd058) at ../source3/winbindd/winbindd_dual_ndr.c:369
> #12 0x0000000001076082 in child_process_request (state=0x7fffffffd058, child=<optimized out>) at ../source3/winbindd/winbindd_dual.c:745
> #13 child_handler (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, private_data=0x7fffffffd050) at ../source3/winbindd/winbindd_dual.c:1602
> #14 0x0000000803ce4d1d in tevent_common_invoke_fd_handler (fde=fde at entry=0x812251480, flags=<optimized out>, removed=removed at entry=0x0) at ../lib/tevent/tevent_fd.c:137
> #15 0x0000000803ce72fc in poll_event_loop_poll (tvalp=0x7fffffffcfa0, ev=0x812258060) at ../lib/tevent/tevent_poll.c:569
> #16 poll_event_loop_once (ev=0x812258060, location=<optimized out>) at ../lib/tevent/tevent_poll.c:626
> #17 0x0000000803ce44de in _tevent_loop_once (ev=0x812258060, location=location at entry=0x10eb4d8 "../source3/winbindd/winbindd_dual.c:1817") at ../lib/tevent/tevent.c:772
> #18 0x00000000010797de in fork_domain_child (child=0x133ba80 <static_idmap_child>) at ../source3/winbindd/winbindd_dual.c:1817
> #19 0x0000000001079956 in wb_child_request_waited (subreq=0x0) at ../source3/winbindd/winbindd_dual.c:238
> #20 0x0000000803ce58a3 in _tevent_req_notify_callback (req=req at entry=0x81231da00, location=location at entry=0x803cea170 "../lib/tevent/tevent_queue.c:355") at ../lib/tevent/tevent_req.c:139
> #21 0x0000000803ce594b in tevent_req_finish (req=0x81231da00, state=state at entry=TEVENT_REQ_DONE, location=location at entry=0x803cea170 "../lib/tevent/tevent_queue.c:355") at ../lib/tevent/tevent_req.c:191
> #22 0x0000000803ce5967 in _tevent_req_done (req=<optimized out>, location=location at entry=0x803cea170 "../lib/tevent/tevent_queue.c:355") at ../lib/tevent/tevent_req.c:197
> #23 0x0000000803ce5463 in tevent_queue_wait_trigger (req=<optimized out>, private_data=<optimized out>) at ../lib/tevent/tevent_queue.c:355
> #24 0x0000000803ce544e in tevent_queue_immediate_trigger (ev=ev at entry=0x812258060, im=im at entry=0x81224efe0, private_data=private_data at entry=0x812241040) at ../lib/tevent/tevent_queue.c:149
> #25 0x0000000803ce513a in tevent_common_invoke_immediate_handler (im=0x81224efe0, removed=removed at entry=0x0) at ../lib/tevent/tevent_immediate.c:165
> #26 0x0000000803ce515b in tevent_common_loop_immediate (ev=ev at entry=0x812258060) at ../lib/tevent/tevent_immediate.c:202
> #27 0x0000000803ce6db9 in poll_event_loop_once (ev=0x812258060, location=<optimized out>) at ../lib/tevent/tevent_poll.c:617
> #28 0x0000000803ce44de in _tevent_loop_once (ev=0x812258060, location=0x10d5af8 "../source3/winbindd/winbindd.c:1911") at ../lib/tevent/tevent.c:772
> #29 0x000000000104e9c2 in main (argc=<optimized out>, argv=<optimized out>) at ../source3/winbindd/winbindd.c:1911

I wonder if it for some reason ignores SIGTERM. Hmm… Anyway, since ’smbd’ let me in when I had killed the rest of the Winbindd demons then this hanging "idmap-child” probably isn’t the problem anyway...

- Peter

> On 3 Dec 2018, at 19:37, Jeremy Allison <jra at samba.org> wrote:
> On Mon, Dec 03, 2018 at 07:33:22PM +0100, Peter Eriksson via samba wrote:
>> Hmm…
>> I see that there is a patch in the bugzilla page for that bug. I guess I could try that one… :-)
>> I notice in the patch that there is a lot of talk about SMB2 - we use SMB3 mostly now. But perhaps there is some code sharing? Or perhaps SMB2 is used when talking to the AD servers? Or the few SMB2-talking clients causes the problem for all of the other users? When testing the “timeout” issue I typically try a simple:
>> 	smbclient -m SMB3 -U $USER -k //$SERVER/$SHARE
> In the Samba world SMBx, where x > 1 is a synonym for
> everything other than SMB1 :-).
> So for this kind of thing, SMB2 == SMB3 :-).

More information about the samba mailing list