Need tips on debugging assert_no_pending_aio() cores

Ashok Ramakrishnan aramakrishnan at nasuni.com
Thu Sep 24 14:44:53 UTC 2020


Hi:
We use Samba on top of our user space (fuse) file system. We just recently updated to samba 4.12.6 (still in pre-release testing internally) and we are running into these smbd cores after very heavy IO load. On looking at the core, I see that there seems to be a race (or a mismatch) between the num_aio_requests accounting and the actual requests linked to the fsp structure (fsp->aio_requests)... Since we are on 4.12.6, we already have the fixes for https://bugzilla.samba.org/show_bug.cgi?id=14301. My question is, how do I debug this issue further? Is it just code inspection, adding additional debug logging? Or is there a better way?

Also, I could use some help understanding this code block in aio_del_req_from_fsp()
        if (i == fsp->num_aio_requests) {
                DEBUG(1, ("req %p not found in fsp %p\n", req, fsp));
                return 0;
        }
Why is it OK to not find an aio request attached to the fsp while destructing it? Is there a valid use case where this is expected to happen? I am not sure we are running into the above code block, plan to set log level 1 to see if that is the case.. Just noticed this during code inspection and trying to understand the logic there.

Thanks in advance for any tips/pointers.

-Ashok



(gdb) fr 8

#8  0x00007f295ff32098 in assert_no_pending_aio (close_type=close_type at entry=SHUTDOWN_CLOSE, fsp=<optimized out>, fsp=<optimized out>) at ../../source3/smbd/close.c:669

669                             TALLOC_FREE(fsp->aio_requests[0]);

(gdb) p fsp

$1 = <optimized out>

(gdb) p num_requests

$2 = 1

(gdb) fr 10

#10 close_file (req=req at entry=0x0, fsp=0x55ab4c315610, close_type=close_type at entry=SHUTDOWN_CLOSE) at ../../source3/smbd/close.c:1277

1277                    status = close_normal_file(req, fsp, close_type);

(gdb) p fsp

$3 = (files_struct *) 0x55ab4c315610

(gdb) p fsp->aio_requests

$4 = (struct tevent_req **) 0x0

(gdb)


#0  0x00007f295eb17337 in raise () from /lib64/libc.so.6
#1  0x00007f295eb18a28 in abort () from /lib64/libc.so.6
#2  0x00007f29602320d9 in dump_core () at ../../source3/lib/dumpcore.c:338
#3  0x00007f2960241a08 in smb_panic_s3 (why=0x7f29602bfc5c "internal error")
    at ../../source3/lib/util.c:853
#4  0x00007f29602b37bd in smb_panic (
    why=why at entry=0x7f29602bfc5c "internal error")
    at ../../lib/util/fault.c:174
#5  0x00007f29602b3a1e in fault_report (sig=<optimized out>)
    at ../../lib/util/fault.c:88
#6  sig_fault (sig=11) at ../../lib/util/fault.c:99
#7  <signal handler called>
#8  0x00007f295ff32098 in assert_no_pending_aio (
    close_type=close_type at entry=SHUTDOWN_CLOSE, fsp=<optimized out>,
    fsp=<optimized out>) at ../../source3/smbd/close.c:669
#9  0x00007f295ff32cbf in close_normal_file (close_type=SHUTDOWN_CLOSE,
    fsp=0x55ab4c315610, req=0x0) at ../../source3/smbd/close.c:695
#10 close_file (req=req at entry=0x0, fsp=0x55ab4c315610,
    close_type=close_type at entry=SHUTDOWN_CLOSE)
    at ../../source3/smbd/close.c:1277
#11 0x00007f295fecb9f5 in file_close_user (sconn=sconn at entry=0x55ab4c2d4440,
    vuid=1512462701) at ../../source3/smbd/files.c:257
#12 0x00007f295ff7eefd in smbXsrv_session_logoff (session=0x55ab4c322610)
---Type <return> to continue, or q <return> to quit---
    at ../../source3/smbd/smbXsrv_session.c:1671
#13 0x00007f295ff7f1c6 in smbXsrv_session_clear_and_logoff (
    session=<optimized out>) at ../../source3/smbd/smbXsrv_session.c:1192
#14 0x00007f295ff7f2a2 in smbXsrv_session_logoff_all_callback (
    local_rec=local_rec at entry=0x7fff6a9379b0,
    private_data=private_data at entry=0x7fff6a937a90)
    at ../../source3/smbd/smbXsrv_session.c:1820
#15 0x00007f295e0ad241 in db_rbt_traverse_internal (
    db=db at entry=0x55ab4c320550,
    f=f at entry=0x7f295ff7f250 <smbXsrv_session_logoff_all_callback>,
    private_data=private_data at entry=0x7fff6a937a90,
    count=count at entry=0x7fff6a937a44, rw=rw at entry=true)
    at ../../lib/dbwrap/dbwrap_rbt.c:456
#16 0x00007f295e0ad3da in db_rbt_traverse (db=0x55ab4c320550,
    f=0x7f295ff7f250 <smbXsrv_session_logoff_all_callback>,
    private_data=0x7fff6a937a90) at ../../lib/dbwrap/dbwrap_rbt.c:514
#17 0x00007f295e0ab8aa in dbwrap_traverse (db=<optimized out>,
    f=f at entry=0x7f295ff7f250 <smbXsrv_session_logoff_all_callback>,
    private_data=private_data at entry=0x7fff6a937a90,
    count=count at entry=0x7fff6a937a8c) at ../../lib/dbwrap/dbwrap.c:377
#18 0x00007f295ff7f437 in smbXsrv_session_logoff_all (
   client=client at entry=0x55ab4c310c30)
    at ../../source3/smbd/smbXsrv_session.c:1774
#19 0x00007f295ff84a9a in exit_server_common (
    how=how at entry=SERVER_EXIT_NORMAL,
    reason=0x7f295fb1b4c9 "NT_STATUS_CONNECTION_RESET")
    at ../../source3/smbd/server_exit.c:159
#20 0x00007f295ff84f50 in smbd_exit_server_cleanly (
    explanation=<optimized out>) at ../../source3/smbd/server_exit.c:266
#21 0x00007f2960121ac4 in exit_server_cleanly (
    reason=reason at entry=0x7f295fb1b4c9 "NT_STATUS_CONNECTION_RESET")
    at ../../source3/lib/smbd_shim.c:121

#22 0x00007f295ff5c630 in smbd_server_connection_terminate_ex (
    xconn=0x55ab4c2fb3a0, reason=0x7f295fb1b4c9 "NT_STATUS_CONNECTION_RESET",
    location=location at entry=0x7f296004f380 "../../source3/smbd/smb2_setinfo.c:161") at ../../source3/smbd/smb2_server.c:1143
#23 0x00007f295ff781c0 in smbd_smb2_request_setinfo_done (
    subreq=0x55ab4c3179d0) at ../../source3/smbd/smb2_setinfo.c:160
#24 0x00007f295ff5d7a7 in smbd_smb2_request_pending_queue (
    req=req at entry=0x55ab4c316340, subreq=subreq at entry=0x55ab4c3179d0,
    defer_time=defer_time at entry=500) at ../../source3/smbd/smb2_server.c:1420
#25 0x00007f295ff78d91 in smbd_smb2_request_process_setinfo (
    req=req at entry=0x55ab4c316340) at ../../source3/smbd/smb2_setinfo.c:122
#26 0x00007f295ff5f3aa in smbd_smb2_request_dispatch (
    req=req at entry=0x55ab4c316340) at ../../source3/smbd/smb2_server.c:2792
#27 0x00007f295ff60aac in smbd_smb2_request_dispatch_immediate (
---Type <return> to continue, or q <return> to quit---
    ctx=ctx at entry=0x55ab4c2e1e40, im=im at entry=0x55ab4c3178f0,
    private_data=private_data at entry=0x55ab4c316340)
    at ../../source3/smbd/smb2_server.c:3137
#28 0x00007f295f0d3039 in tevent_common_invoke_immediate_handler (
    im=0x55ab4c3178f0, removed=removed at entry=0x0)
    at ../../tevent_immediate.c:166
#29 0x00007f295f0d3060 in tevent_common_loop_immediate (
    ev=ev at entry=0x55ab4c2e1e40) at ../../tevent_immediate.c:203
#30 0x00007f295f0d8e9d in epoll_event_loop_once (ev=0x55ab4c2e1e40,
    location=<optimized out>) at ../../tevent_epoll.c:918
#31 0x00007f295f0d70d7 in std_event_loop_once (ev=0x55ab4c2e1e40,
    location=0x7f296003d770 "../../source3/smbd/process.c:4170")
    at ../../tevent_standard.c:110
#32 0x00007f295f0d231d in _tevent_loop_once (ev=ev at entry=0x55ab4c2e1e40,
    location=location at entry=0x7f296003d770 "../../source3/smbd/process.c:4170")
    at ../../tevent.c:772
#33 0x00007f295f0d257b in tevent_common_loop_wait (ev=0x55ab4c2e1e40,
    location=0x7f296003d770 "../../source3/smbd/process.c:4170")
    at ../../tevent.c:895

#34 0x00007f295f0d7077 in std_event_loop_wait (ev=0x55ab4c2e1e40,
    location=0x7f296003d770 "../../source3/smbd/process.c:4170")
    at ../../tevent_standard.c:141
#35 0x00007f295ff4e844 in smbd_process (ev_ctx=ev_ctx at entry=0x55ab4c2e1e40,
---Type <return> to continue, or q <return> to quit---
    msg_ctx=msg_ctx at entry=0x55ab4c2d45f0, sock_fd=sock_fd at entry=38,
    interactive=interactive at entry=false) at ../../source3/smbd/process.c:4170
#36 0x000055ab4bb20ce8 in smbd_accept_connection (ev=0x55ab4c2e1e40,
    fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>)
    at ../../source3/smbd/server.c:1012
#37 0x00007f295f0d2bd3 in tevent_common_invoke_fd_handler (
    fde=fde at entry=0x55ab4c308090, flags=<optimized out>,
    removed=removed at entry=0x0) at ../../tevent_fd.c:138
#38 0x00007f295f0d90e7 in epoll_event_loop (tvalp=0x7fff6a938260,
    epoll_ev=0x55ab4c2d4440) at ../../tevent_epoll.c:736
#39 epoll_event_loop_once (ev=<optimized out>, location=<optimized out>)
    at ../../tevent_epoll.c:937
#40 0x00007f295f0d70d7 in std_event_loop_once (ev=0x55ab4c2e1e40,
    location=0x55ab4bb255f8 "../../source3/smbd/server.c:1359")
    at ../../tevent_standard.c:110
#41 0x00007f295f0d231d in _tevent_loop_once (ev=ev at entry=0x55ab4c2e1e40,
    location=location at entry=0x55ab4bb255f8 "../../source3/smbd/server.c:1359")
    at ../../tevent.c:772
#42 0x00007f295f0d257b in tevent_common_loop_wait (ev=0x55ab4c2e1e40,
    location=0x55ab4bb255f8 "../../source3/smbd/server.c:1359")
    at ../../tevent.c:895
#43 0x00007f295f0d7077 in std_event_loop_wait (ev=0x55ab4c2e1e40,
    location=0x55ab4bb255f8 "../../source3/smbd/server.c:1359")
---Type <return> to continue, or q <return> to quit---
    at ../../tevent_standard.c:141
#44 0x000055ab4bb1b4d7 in smbd_parent_loop (parent=<optimized out>,
    ev_ctx=0x55ab4c2e1e40) at ../../source3/smbd/server.c:1359
#45 main (argc=<optimized out>, argv=<optimized out>)
    at ../../source3/smbd/server.c:2197

This e-mail message and all attachments transmitted with it may contain privileged and/or confidential information intended solely for the use of the addressee(s). If the reader of this message is not the intended recipient, you are hereby notified that any reading, dissemination, distribution, copying, forwarding or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately and delete this message, all attachments and all copies and backups thereof.


More information about the samba-technical mailing list