Hitting an AIO problem on FreeBSD when we hit the limit of 100 (actually) queued requests

Richard Sharpe realrichardsharpe at gmail.com
Thu Jan 3 19:41:17 MST 2013


Hi folks,

I am hitting a problem with AIO (write) on 3.6.6 when we hit the limit
of outstanding AIO requests.

My debugging shows the following:

[2013/01/03 17:57:14.260653,  5] smbd/smb2_server.c:879(smbd_smb2_request_pendin
g_queue)
  setting pending: MID = 0x2e4, req->subreq = 0x804a77550
[2013/01/03 17:57:14.260974, 10] smbd/smb2_server.c:2574(smbd_smb2_request_incom
ing)
  smbd_smb2_request_incoming: idx[1] of 4 vectors
[2013/01/03 17:57:14.261175, 10] smbd/smb2_server.c:354(smb2_validate_message_id
)
  smb2_validate_message_id: clearing id 741 (position 741) from bitmap
[2013/01/03 17:57:14.261240, 10] smbd/smb2_server.c:1224(smbd_smb2_request_dispa
tch)
  smbd_smb2_request_dispatch: opcode[SMB2_WRITE] mid = 741
[2013/01/03 17:57:14.261314,  4] smbd/uid.c:351(change_to_user)
  Skipping user change - already user
[2013/01/03 17:57:14.261395, 10] smbd/smb2_write.c:243(smbd_smb2_write_send)
  smbd_smb2_write: another-big-one.txt - fnum[19485]
[2013/01/03 17:57:14.261474,  3] smbd/aio.c:534(schedule_aio_smb2_write)
  smb2: Already have 100 aio activities outstanding.

Thus, after MID 0x2e4 we hit the limit.

Then I see the following completions:

  setting pending: MID = 0x2e4, req->subreq = 0x804a77550
**** After this we hit the limit and we see the following requests complete
  aio_write:       MID = 0x2e0, subreq = 0x804a27550, errcode: Unknown error: 0
  aio_write:       MID = 0x2e1, subreq = 0x804a3b550, errcode: Unknown error: 0
  aio_write:       MID = 0x2e2, subreq = 0x804a4f550, errcode: Unknown error: 0
  aio_write:       MID = 0x2e3, subreq = 0x804a63550, errcode: Unknown error: 0
  aio_write:       MID = 0x2e4, subreq = 0x804a77550, errcode: Unknown error: 0
  aio_write:       MID = 0x282, subreq = 0x8042b1550, errcode: Unknown error: 0
  aio_write:       MID = 0x29f, subreq = 0x8044f2550, errcode: Unknown error: 0
  aio_write:       MID = 0x284, subreq = 0x8042b7550, errcode: Unknown error: 0
  aio_write:       MID = 0x286, subreq = 0x8042f8550, errcode: Unknown error: 0
...
  aio_write:       MID = 0x2da, subreq = 0x8049ad550, errcode: Unknown error: 0
  aio_write:       MID = 0x2db, subreq = 0x8049c1550, errcode: Unknown error: 0
  aio_write:       MID = 0x2dc, subreq = 0x8049d5550, errcode: Unknown error: 0
  aio_write:       MID = 0x2dd, subreq = 0x8049e9550, errcode: Unknown error: 0
  aio_write:       MID = 0x2de, subreq = 0x8049fd550, errcode: Unknown error: 0
  aio_write:       MID = 0x2df, subreq = 0x804a13550, errcode: Unknown error: 0
  aio_write:       MID = 0x2e0, subreq = 0, errcode: Invalid argument

At this point we crash, because 0x2e0 has been replayed. It is the
100th element and it has already been responded to first in the list
above, so the subreq has been zeroed out. I think the invalid argument
response is because the kernel knows that the AIOCB is not in the
system.

I will continue to look at this, but it looks like there is a problem
in the tevent signal handling stuff and an off-by-one problem. Has
anyone seen this?

-- 
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)


More information about the samba-technical mailing list