[EXTERNAL] Re: Need tips on debugging assert_no_pending_aio() cores

Jeremy Allison jra at samba.org
Thu Sep 24 19:17:22 UTC 2020


On Thu, Sep 24, 2020 at 06:54:38PM +0000, Ashok Ramakrishnan wrote:
> Thanks Jeremy for the tip. We are able to reproduce the issue after a few hours of IO. I re-read the comments and the code and have one follow up question.
> 
> Is it possible for talloc_realloc() in aio_add_req_to_fsp() and aio_del_req_from_fsp()
> to race?

No, that shouldn't be possible. Remember, this part of the
code isn't called from threads. The setup/teardown of the
aio is done in the serving smbd, which is asynchronous but
single threaded in this code.

The code path in schedule_smb2_aio_read() does:

        req = SMB_VFS_PREAD_SEND(aio_ex, fsp->conn->sconn->ev_ctx, fsp,
                                 preadbuf->data, smb_maxcnt, startpos);
        if (req == NULL) {
                DEBUG(0, ("smb2: SMB_VFS_PREAD_SEND failed. "
                          "Error %s\n", strerror(errno)));
                TALLOC_FREE(aio_ex);
                return NT_STATUS_RETRY;
        }
        tevent_req_set_callback(req, aio_pread_smb2_done, aio_ex);

        if (!aio_add_req_to_fsp(fsp, req)) {
                DEBUG(1, ("Could not add req to fsp\n"));
                TALLOC_FREE(aio_ex);
                return NT_STATUS_RETRY;
        }

The default implementation, vfswrap_pread_send(), sets up
the underlying thread in the threadpool, then returns
the tevent_req pointer to the caller, which adds
it to the async array in aio_add_req_to_fsp().

I can't see a way for that to get out of sync,
unless you have something strange inside your
SMB_VFS_PREAD_SEND() function (and you're not
using the default).

> Since the array is being mem copied when the size is incremented 10 at a time...

Not at the same time.
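
To make that concrete, here is a toy model of the bookkeeping, only a sketch and not the smbd code. It assumes plain realloc()/free() in place of talloc, and the names toy_fsp, toy_add_req() and toy_del_req() are made up for illustration. Each call runs to completion before the next one starts, which is all "single threaded" means here, so the copy done while growing the array can never overlap a removal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in, NOT the Samba files_struct: it models only the
 * fields discussed in this thread. */
struct toy_fsp {
        size_t num_aio_requests;  /* slots in use */
        size_t array_len;         /* slots allocated */
        void **aio_requests;      /* outstanding request pointers */
};

/* Append req, growing the array 10 slots at a time. */
static bool toy_add_req(struct toy_fsp *fsp, void *req)
{
        if (fsp->array_len <= fsp->num_aio_requests) {
                size_t new_len = fsp->num_aio_requests + 10;
                void **tmp = realloc(fsp->aio_requests,
                                     new_len * sizeof(*tmp));
                if (tmp == NULL) {
                        return false;
                }
                fsp->aio_requests = tmp;
                fsp->array_len = new_len;
        }
        fsp->aio_requests[fsp->num_aio_requests] = req;
        fsp->num_aio_requests += 1;
        return true;
}

/* Remove req; free the array once the last request is gone. */
static bool toy_del_req(struct toy_fsp *fsp, void *req)
{
        size_t i;

        /* A non-zero count with no array is the state seen in the cores. */
        assert(fsp->aio_requests != NULL || fsp->num_aio_requests == 0);

        for (i = 0; i < fsp->num_aio_requests; i++) {
                if (fsp->aio_requests[i] == req) {
                        break;
                }
        }
        if (i == fsp->num_aio_requests) {
                return false; /* not found: bookkeeping is out of sync */
        }
        fsp->num_aio_requests -= 1;
        /* Move the last entry into the hole left by req. */
        fsp->aio_requests[i] = fsp->aio_requests[fsp->num_aio_requests];
        if (fsp->num_aio_requests == 0) {
                free(fsp->aio_requests);
                fsp->aio_requests = NULL;
                fsp->array_len = 0;
        }
        return true;
}
```

In this model the count and the array can only disagree if something outside these two functions touches either field, or if a request is removed without ever having been added.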

> I am adding some instrumentation to the code to see if we are running into
> this situation here. But, we seem to end up with a case where fsp->num_aio_requests = 1,
> while the fsp->aio_requests has been freed (because all the outstanding aio requests have been destroyed).

Massive, overkill instrumentation is what you
need to catch and debug this, I think.
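
One possible shape for such a check, as a sketch only: the struct, the function and the macro below are hypothetical names, not anything in the Samba tree. It assumes a freed array shows up as a NULL pointer, and it uses fprintf() where smbd code would use DEBUG(0, ...).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical view of the two fields under suspicion; in smbd the
 * values would be copied from the real fsp. */
struct aio_state_view {
        size_t num_aio_requests;
        const void *aio_requests; /* assumed NULL once the array is freed */
};

/* Returns true if the count and the array agree. Otherwise logs the
 * call site so the first bad transition can be located. */
static bool aio_state_consistent(const struct aio_state_view *v,
                                 const char *where)
{
        bool ok = (v->num_aio_requests == 0) || (v->aio_requests != NULL);

        if (!ok) {
                fprintf(stderr,
                        "%s: num_aio_requests=%zu but aio_requests is NULL\n",
                        where, v->num_aio_requests);
        }
        return ok;
}

/* Abort at the first mismatch, so the core is taken where the state
 * first goes wrong rather than later at close time. */
#define CHECK_AIO_STATE(v) assert(aio_state_consistent((v), __func__))
```

Calling the check on entry to and exit from every function that touches either field should narrow down which code path leaves the count at 1 after the array is gone.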

It's possible there's a logic bug, I just
don't see it (yet).

Jeremy.


