Socket draining for sys_recvfile() (patch for 'reverse sendfile')

Sat Dec 24 22:02:19 MST 2011

On Sat, Dec 24, 2011 at 10:26:39AM +1100, Andrew Bartlett wrote:
> On Fri, 2011-12-23 at 08:14 +1100, Andrew Bartlett wrote:
> > On Thu, 2011-12-22 at 11:12 -0800, Jeremy Allison wrote:
> 
> > > > In particular, what is the purpose of the 'socket draining' code?  I'm a
> > > > little confused, because the comments at the callers
> > > > (source3/smbd/vfs.c) indicate it is expected, but it is only explicitly
> > > > in the splice() code path in source3/lib/recvfile.c and not in the
> > > > read/write 'fallback' (which is actually what we always use). 
> > > 
> > > Ok, when recvfile() returns we *must* have drained the incoming
> > > socket of the remainder of the SMB packet, else we'll lose packet
> > > boundary coherence.
> > > 
> > > The socket draining code was added as it is (theoretically?) possible
> > > for the splice() code path to return a short read, leaving data
> > > in the socket buffers (the aforementioned packet boundary coherence
> > > problem) so we need to drain the socket in this case.
> 
> I've been looking at it given your comments, and I'm not convinced by
> the current code.  In particular, I'm thinking 	if (total_written <
> count) {
> 
> is not the correct test for draining the socket.  Why does draining the
> socket have anything to do with total_written?  Shouldn't it be:
> 
> if (count) {
> 
> That is, during a multi-part recvfile of 48k of data (due to the 16k
> chunks) I think that we could get a pattern like:
> 
> count = 48000
> total_written = 0
> 
> nread = 16000
> 
> end loop 1:
> total_written = 16000
> count = 32000
> 
> error:
> total_written is < count
> drain 16000 from fromfd, leaving 16000 in fromfd
> 
> or 
> 
> start loop 2:
> nread = 16000
> total_written = 32000
> count = 16000
> 
> error:
> 
> total_written is > count
> no drain, but 16000 still in fromfd
> 
> (sorry if my notes are a little cryptic)

Completely correct. You've found a bug in the splice() codepath !

Well done & thanks.

The bug is that count is decremented by nread in this loop,
so that as you pointed out total_written needn't be considered at
all. The code was probably originally written as an error path
when count wasn't being decremented - but total_written as
being incremented, and then never fixed when the code was
changed to decrement count as well (in fact you can see
this as the code probably originally looked like the fallback
default_sys_recvfile() function where count is never
decremented, so the look condition looks like "while (total < count)").

The reason this hasn't been an issue up until now is that
the splice() codepath is (a) slower than the read/write path,
so was never used by default, (b) the Isilon code didn't exercise
this codepath at all - they used a custom written sendfile()
implementation and (c) it's in an error path rarely taken
(probably only on disk full).

I've fixed this in master, and have logged a bug to get it fixed
in 3.6.2 and 3.5.next (I'll tag you for the patch review for
3.6.x and 3.5.x :-).

https://bugzilla.samba.org/show_bug.cgi?id=8679

Thanks !

Jeremy.