Socket draining for sys_recvfile() (patch for 'reverse sendfile')

Jeremy Allison jra at samba.org
Thu Dec 22 12:12:14 MST 2011


On Thu, Dec 22, 2011 at 11:09:20PM +1100, Andrew Bartlett wrote:
> Volker and Jermey,
> 
> I've been working on the attached patch to implement sys_recvfile() for
> a kernel that has been patched to allow sendfile from a socket to a
> file.  (Patch for 3.5.11). 
> 
> I notice in the git history that this seems to be your area, and that
> you have had trouble with recvfile in practice, with concerns about
> blocking and the current code being effectively disabled in master.  

The only problems we've had here are with the splice() implementation
in Linux being unsuitable for this purpose initially, or later being
slower than standard read/pwrite (hence the patched kernel you're
working with :-).

> As such, I'm hoping you might be able to help me out in understanding
> this area.  

I'll try !

> In particular, what is the purpose of the 'socket draining' code?  I'm a
> little confused, because the comments at the callers
> (source3/smbd/vfs.c) indicate it is expected, but it is only explicitly
> in the splice() code path in source3/lib/recvfile.c and not in the
> read/write 'fallback' (which is actually what we always use). 

Ok, when recvfile() returns we *must* have drained the incoming
socket of the remainder of the SMB packet, else we'll lose packet
boundary coherence.

The socket draining code was added as it is (theoretically?) possible
for the splice() code path to return a short read, leaving data
in the socket buffers (the aforementioned packet boundary coherence
problem) so we need to drain the socket in this case.

It can't happen in the "normal" case as the loop takes care of
this inline - notice the section:

                                if (write_ret <= 0) {
                                        /* write error - stop writing. */
                                        tofd = -1;
                                        saved_errno = errno;
                                        continue;
                                }

which sets "tofd = -1" - and enables us to still do the read()
call (draining the socket) but not the write call (as we know
it will fail - disk full or EIO error maybe?).

That's the "normal" case equivalent to the drain socket code.

> Finally, is there anything else should I know or be looking out for?

Buggy kernel patches mainly :-). That recvfile() code has been
used and shipped in production product (Isilon with modified *BSD
kernel) so I think it's pretty safe (all IMHO of course :-).

> (As a hint to others, I've found that Samba4 client libs underpinning
> smbtorture do not trigger the recvfile path, but smbclient3 does.  This
> is apparently a well known behaviour difference of one padding/alignment
> byte in the WriteX packet)

Don't know about well known :-), but the recvfile() code was
written explicitly to work with Windows clients that do that
one byte alignment (although not mandated by the protocol) so
the s3 client code was written to do the same thing (to ensure
we can benefit from recvfile).

Good luck with it !

FYI: I'm on Holiday from now until Jan 3rd so email response
times can be a little sketchy !

Happy Holidays !

Jeremy.


More information about the samba-technical mailing list