Need tips on debugging assert_no_pending_aio() cores

Jeremy Allison jra at samba.org
Thu Sep 24 16:04:54 UTC 2020


On Thu, Sep 24, 2020 at 02:44:53PM +0000, Ashok Ramakrishnan via samba-technical wrote:
> Hi:
> We use Samba on top of our user-space (FUSE) file system. We recently updated to Samba 4.12.6 (still in internal pre-release testing) and are hitting these smbd cores after very heavy IO load. Looking at the core, there seems to be a race (or a mismatch) between the num_aio_requests accounting and the actual requests linked to the fsp structure (fsp->aio_requests). Since we are on 4.12.6, we already have the fixes for https://bugzilla.samba.org/show_bug.cgi?id=14301. My question is: how do I debug this issue further? Is it just code inspection and adding debug logging, or is there a better way?
> 
> Also, I could use some help understanding this code block in aio_del_req_from_fsp()
>         if (i == fsp->num_aio_requests) {
>                 DEBUG(1, ("req %p not found in fsp %p\n", req, fsp));
>                 return 0;
>         }
> Why is it OK to not find an aio request attached to the fsp while destructing it? Is there a valid use case where this is expected to happen? I am not sure we are hitting this code block; I plan to set log level 1 to see if that is the case. I just noticed this during code inspection and am trying to understand the logic there.

That's the destructor for the lnk struct, created
as a talloc child of the outstanding tevent_req.

The fsp->aio_requests[index] can be deleted
in a SHUTDOWN_CLOSE independently of the lnk
struct, so the lnk struct needs to allow
the associated fsp->aio_requests[] value
to have been freed.

Check the code and comment in:

source3/smbd/close.c:assert_no_pending_aio()

for details.
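
To illustrate the ordering problem described above, here is a minimal
standalone sketch. It is not the smbd code: it uses no talloc or tevent,
and every fake_* name is hypothetical. It assumes a shutdown path that
drops the array entries on its own, and shows why the link teardown then
has to treat "not found" as a normal case rather than an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define FAKE_MAX_AIO 8

/* Hypothetical stand-in for the fsp; only the two fields discussed. */
struct fake_fsp {
	void *aio_requests[FAKE_MAX_AIO];
	unsigned num_aio_requests;
};

/* Hypothetical stand-in for the lnk struct tying one req to one fsp. */
struct fake_lnk {
	struct fake_fsp *fsp;
	void *req;
};

/* Record a request on the fsp. Returns 0 on success, -1 if full. */
int fake_add_req(struct fake_fsp *fsp, void *req)
{
	if (fsp->num_aio_requests == FAKE_MAX_AIO) {
		return -1;
	}
	fsp->aio_requests[fsp->num_aio_requests++] = req;
	return 0;
}

/* Assumed shutdown path: forget every entry without touching the links. */
void fake_shutdown_close(struct fake_fsp *fsp)
{
	while (fsp->num_aio_requests > 0) {
		fsp->aio_requests[--fsp->num_aio_requests] = NULL;
	}
}

/* Models the link destructor. Always returns 0, meaning "go ahead". */
int fake_lnk_destructor(struct fake_lnk *lnk)
{
	struct fake_fsp *fsp = lnk->fsp;
	unsigned i;

	for (i = 0; i < fsp->num_aio_requests; i++) {
		if (fsp->aio_requests[i] == lnk->req) {
			break;
		}
	}
	if (i == fsp->num_aio_requests) {
		/* Entry already gone (e.g. after a shutdown close). */
		fprintf(stderr, "req %p not found in fsp %p\n",
			lnk->req, (void *)fsp);
		return 0;
	}
	/* Found: move the last entry into the hole and shrink the count. */
	fsp->num_aio_requests--;
	fsp->aio_requests[i] = fsp->aio_requests[fsp->num_aio_requests];
	fsp->aio_requests[fsp->num_aio_requests] = NULL;
	return 0;
}

/* Runs both teardown orders; returns the number of requests left over. */
unsigned fake_demo(void)
{
	struct fake_fsp fsp = { { NULL }, 0 };
	int a, b; /* only their addresses matter, as request identities */
	struct fake_lnk la = { &fsp, &a };
	struct fake_lnk lb = { &fsp, &b };

	fake_add_req(&fsp, &a);
	fake_add_req(&fsp, &b);

	fake_lnk_destructor(&la);   /* normal order: entry found and removed */
	assert(fsp.num_aio_requests == 1);

	fake_shutdown_close(&fsp);  /* array emptied behind lb's back */
	fake_lnk_destructor(&lb);   /* entry not found, tolerated */
	return fsp.num_aio_requests;
}
```

In this sketch the count and the array are only ever changed together,
which is the invariant worth checking when the accounting in a core
looks inconsistent.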

I wrote much of this logic, so I can
help you track this down if you can reproduce
it.


