Hitting an AIO problem on FreeBSD when we hit the limit of 100 (actually) queued requests

Richard Sharpe realrichardsharpe at gmail.com
Sat Jan 5 13:57:42 MST 2013


On Fri, Jan 4, 2013 at 3:48 PM, Jeremy Allison <jra at samba.org> wrote:
> On Thu, Jan 03, 2013 at 06:41:17PM -0800, Richard Sharpe wrote:
>> Hi folks,
>>
>> I am hitting a problem with AIO (write) on 3.6.6 when we hit the limit
>> of outstanding AIO requests.
>>
>> My debugging shows the following:
>>
>> [2013/01/03 17:57:14.260653,  5] smbd/smb2_server.c:879(smbd_smb2_request_pendin
>> g_queue)
>>   setting pending: MID = 0x2e4, req->subreq = 0x804a77550
>> [2013/01/03 17:57:14.260974, 10] smbd/smb2_server.c:2574(smbd_smb2_request_incom
>> ing)
>>   smbd_smb2_request_incoming: idx[1] of 4 vectors
>> [2013/01/03 17:57:14.261175, 10] smbd/smb2_server.c:354(smb2_validate_message_id
>> )
>>   smb2_validate_message_id: clearing id 741 (position 741) from bitmap
>> [2013/01/03 17:57:14.261240, 10] smbd/smb2_server.c:1224(smbd_smb2_request_dispa
>> tch)
>>   smbd_smb2_request_dispatch: opcode[SMB2_WRITE] mid = 741
>> [2013/01/03 17:57:14.261314,  4] smbd/uid.c:351(change_to_user)
>>   Skipping user change - already user
>> [2013/01/03 17:57:14.261395, 10] smbd/smb2_write.c:243(smbd_smb2_write_send)
>>   smbd_smb2_write: another-big-one.txt - fnum[19485]
>> [2013/01/03 17:57:14.261474,  3] smbd/aio.c:534(schedule_aio_smb2_write)
>>   smb2: Already have 100 aio activities outstanding.
>>
>> Thus, after MID 0x2e4 we hit the limit.
>>
>> Then I see the following completions:
>>
>>   setting pending: MID = 0x2e4, req->subreq = 0x804a77550
>> **** After this we hit the limit and we see the following requests complete
>>   aio_write:       MID = 0x2e0, subreq = 0x804a27550, errcode: Unknown error: 0
>>   aio_write:       MID = 0x2e1, subreq = 0x804a3b550, errcode: Unknown error: 0
>>   aio_write:       MID = 0x2e2, subreq = 0x804a4f550, errcode: Unknown error: 0
>>   aio_write:       MID = 0x2e3, subreq = 0x804a63550, errcode: Unknown error: 0
>>   aio_write:       MID = 0x2e4, subreq = 0x804a77550, errcode: Unknown error: 0
>>   aio_write:       MID = 0x282, subreq = 0x8042b1550, errcode: Unknown error: 0
>>   aio_write:       MID = 0x29f, subreq = 0x8044f2550, errcode: Unknown error: 0
>>   aio_write:       MID = 0x284, subreq = 0x8042b7550, errcode: Unknown error: 0
>>   aio_write:       MID = 0x286, subreq = 0x8042f8550, errcode: Unknown error: 0
>> ...
>>   aio_write:       MID = 0x2da, subreq = 0x8049ad550, errcode: Unknown error: 0
>>   aio_write:       MID = 0x2db, subreq = 0x8049c1550, errcode: Unknown error: 0
>>   aio_write:       MID = 0x2dc, subreq = 0x8049d5550, errcode: Unknown error: 0
>>   aio_write:       MID = 0x2dd, subreq = 0x8049e9550, errcode: Unknown error: 0
>>   aio_write:       MID = 0x2de, subreq = 0x8049fd550, errcode: Unknown error: 0
>>   aio_write:       MID = 0x2df, subreq = 0x804a13550, errcode: Unknown error: 0
>>   aio_write:       MID = 0x2e0, subreq = 0, errcode: Invalid argument
>>
>> At this point we crash, because 0x2e0 has been replayed. It is the
>> 100th element and it has already been responded to first in the list
>> above, so the subreq has been zeroed out. I think the invalid argument
>> response is because the kernel knows that the AIOCB is not in the
>> system.
>>
>> I will continue to look at this, but it looks like there is a problem
>> in the tevent signal handling stuff and an off-by-one problem. Has
>> anyone seen this?
>
> No, I'm not aware of anything here. Let me know if you want some
> help tracking this one down !

OK, as always, it is not good to have multiple threads handling
signals. Creates races that you don't need.

If that does not fix the problem, I will have to think of other things.

-- 
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)


More information about the samba-technical mailing list