[PATCH] Fix bug 9412 - SMB2 server doesn't support recvfile.

Fri Apr 5 01:20:08 MDT 2013

Hi Jeremy,

>> I think we should add some protection to req->sconn->smb2.recv_queue,
>> when we do a short read we should place a dummy tevent_req into the
>> recv_queue,
>> which blocks the socket, and remove it when the socket is drained or
>> recvfile is done.
>>
>> next to smb1req->unread_bytes we could setup smb2req->recvfile_blocker;
>>
>> struct smbd_smb2_recvfile_blocker {
>> 	uint8_t dummy;
>> };
>>
>> static void smbd_smb2_recvfile_blocker_trigger(struct tevent_req *req,
>> 					       void *private_data)
>> {
>> 	return;
>> }
>>
>> ...
>> 	struct smbd_smb2_recvfile_blocker *recvfile_blocker;
>> 	smb2req->recvfile_blocker = tevent_req_create(smb2req, &recvfile_blocker,
>> 						      smbd_smb2_recvfile_blocker);
>> 	if (smb2req->recvfile_blocker == NULL) {
>> 		...
>> 	}
>> 	ok = tevent_queue_add(smb2req->sconn->smb2.recv_queue,
>> 			      ...event_context,
>> 			      smb2req->recvfile_blocker,
>> 			      smbd_smb2_recvfile_blocker_trigger,
>> 			      NULL);
>>
>> smbd_smb2_request_error_ex() would then also do
>> TALLOC_FREE(smb2req->recvfile_blocker);
>> after drain_socket() and even if unread_bytes was 0.
> 
> Ok, I see the paranoia behind doing it :-), but is it really
> neccessary ? Under what circumstances would we need this ?
> 
> The SMB1 recvfile code doesn't do this, and if the stream
> gets out of sync it's really obvious immediately.

The whole SMB2 socket handling is different from the SMB1 one,
the SMB2 code is a lot more async and the socket is non-blocking.

Maybe we could let smbd_smb2_request_read_done() keep 'subreq'
if state->doing_receivefile is true, we need to remember this
in smb2req.

As the socket is non-blocking, we need to make sure recvfile()
doesn't get EAGAIN. We could extent the tstream_readv_pdu code
to handle an iovec with iov_len != 0 and iov_base == NULL.
This could be used to wait until the whole request arrived
in the kernel, by checking tstream_pending_bytes().
Or we have to mark the socket blocking during for recvfile() and
drain_socket().

>>> +		} else {
>>> +			/*
>>> +			 * Either this is a receivefile write so we've
>>> +			 * done a short read, or if not we have all the data.
>>> +			 * Either way, we're done and the caller will handle
>>> +			 * and short read case by looking at the
>>> +			 * state->doing_receivefile value.
>>> +			 */
>>> +			*_vector = NULL;
>>> +			*_count = 0;
>>> +		}
>>
>> Shouldn't the caller take a look at smb1req->unread_bytes?
>> state->doing_receivefile is not visible to the caller.
> 
> smb1req->unread_bytes isn't set up until all the
> smbd_smb2_request_next_vector() calls are finished,
> and smbd_smb2_request_read_done() is called.
> 
> smbd_smb2_request_read_done() has full access to
> struct smbd_smb2_request_read_state *state
> and so uses state->doing_receivefile to then set
> up smb1req->unread_bytes.
> 
> I guess I don't fully understand the comment here.
> state->doing_receivefile is a temporary state whilst
> we're deciding if we can do a recvfile, and the permanent
> state that tells the callers that we are doing a
> recveivefile is smb1req->unread_bytes.

Maybe you can change "caller" into "caller (smbd_smb2_request_read_done)".
From just reading the comment it was not clear to me what 'caller' was
referring to.

BTW: I just found the attached patch in one of my branches again,
which should improve the splice recvfile implementation.
But with this junkcode test:
https://gitweb.samba.org/?p=metze/misc/junkcode.git;a=commitdiff;h=43121900e96a0f779e110ed162c1db766eace4e6
https://gitweb.samba.org/?p=metze/misc/junkcode.git;a=blob;f=splice/splice2.c

I'm getting this on my thinkpad x230

metze at SERNOX11:~/devel/junkcode/metze/splice$ time ./splice2 'rw'
104857600000 65536
rw: i[1600000] total_size[104857600000] buffer_size[65536 => 65536]

real    0m5.545s
user    0m0.100s
sys     0m5.420s
metze at SERNOX11:~/devel/junkcode/metze/splice$ time ./splice2 'splice'
104857600000 65536
splice: i[1600000] total_size[104857600000] buffer_size[65536 => 65536]

real    0m10.455s
user    0m0.072s
sys     0m10.301s
metze at SERNOX11:~/devel/junkcode/metze/splice$ time ./splice2 'splice'
104857600000 1048576
splice: i[100000] total_size[104857600000] buffer_size[1048576 => 1048576]

real    0m10.673s
user    0m0.004s
sys     0m10.613s
metze at SERNOX11:~/devel/junkcode/metze/splice$ time ./splice2 'rw'
104857600000 1048576
rw: i[100000] total_size[104857600000] buffer_size[1048576 => 1048576]

real    0m5.779s
user    0m0.000s
sys     0m5.752s
metze at SERNOX11:~/devel/junkcode/metze/splice$ time ./splice2 'rw'
104857600000 10485760
rw: i[10000] total_size[104857600000] buffer_size[10485760 => 10485760]

real    0m11.966s
user    0m0.000s
sys     0m11.909s

The results are really look strange to me...

metze
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tmp.diff
Type: text/x-diff
Size: 1621 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20130405/999cec4a/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 261 bytes
Desc: OpenPGP digital signature
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20130405/999cec4a/attachment.pgp>