async zero-copy read&x

Volker Lendecke Volker.Lendecke at SerNet.DE
Tue Jun 22 02:52:16 MDT 2010


On Tue, Jun 22, 2010 at 11:15:32AM +1000, Andrew Bartlett wrote:
> > find some code that uses splice(2) to do read&x in an async
> > and zero-copy fashion. It fetches data from disk using one
> > splice call in an async fashion, and then assumes that the
> > splice call from the pipe to the TCP socket is just moving
> > around some pointers in the kernel, thus won't block. It
> > needs heavy tuning and clean-up before I will propose it, but it
> > might be a start for some discussion about the design and
> > optimizations.
> 
> The idea of inline asm to make the raw syscalls scares me a little, but
> from what discussions I've seen you in on IRC, it seems like we don't
> have good options here...

Believe me, I've bugged everybody I thought would be able to
solve this mystery for me. glibc just is not equipped to make
raw clone usable, it actively forces you to use pthreads.

We do have alternatives if the inline asm is not available.
That's the whole point of this libasys thing: Make it work
everywhere and use every trick in the book to make it fast
when possible while maintaining a nice and usable API.

I might change the API in the future to also allow for
waiting for the signalfd to be writable. A crazy idea of
mine is to write an NFS client or FUSE server backend with
that API. But that is just crazy pipe dreams :-)

> > One thing we could do is to use the tmembuf abstraction and
> > do async pread into that when for example splice(2) is not
> > available or if we want to do signing. I'm planning to add
> > some tmembuf_md5 or tmembuf_sha256 routines to do this.
> 
> I'm really glad that SMB signing hasn't been left out of this.  I know
> it slows things down and isn't used in so many places, but I'm glad it's
> not been just put on the slow path.
> 
> That said, I've not seen tmembuf before - what is it?

That's a new abstraction around the splice(2) syscall that
will eventually fall back to (p)read/(p)write when splice
does not work. From my testing, splice has some very
peculiar failure modes that I would like to be encapsulated
in a small piece of code.
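
To make the idea concrete, here is a rough sketch of the
splice-with-fallback pattern. This is not the tmembuf code. The
names send_file_range and copy_fallback are made up for
illustration, and treating only EINVAL and ENOSYS as "splice does
not work here" is an assumption. It is Linux-only and blocking,
so it shows the data path but none of the async handling.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for splice(2) and loff_t */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback: copy up to len bytes through a userspace buffer. */
static ssize_t copy_fallback(int in_fd, off_t off, int out_fd, size_t len)
{
	char buf[65536];
	size_t done = 0;

	while (done < len) {
		size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t nread = pread(in_fd, buf, want, off + (off_t)done);
		ssize_t written = 0;

		if (nread < 0) {
			if (errno == EINTR) continue;
			return -1;
		}
		if (nread == 0) break;          /* end of file */
		while (written < nread) {
			ssize_t n = write(out_fd, buf + written,
					  (size_t)(nread - written));
			if (n < 0) {
				if (errno == EINTR) continue;
				return -1;
			}
			written += n;
		}
		done += (size_t)nread;
	}
	return (ssize_t)done;
}

/*
 * Hypothetical entry point: move len bytes starting at off from in_fd
 * to out_fd via file -> pipe -> out_fd splices. If splice is refused,
 * finish the remaining range with copy_fallback(). Because the file is
 * read at explicit offsets, data left in the pipe can be discarded.
 * Returns the number of bytes delivered (short at EOF), or -1 on error.
 */
ssize_t send_file_range(int in_fd, off_t off, int out_fd, size_t len)
{
	int p[2];
	size_t done = 0;
	bool use_fallback = false, failed = false;

	if (pipe(p) < 0) return -1;

	while (done < len && !use_fallback && !failed) {
		loff_t pos = (loff_t)off + (loff_t)done;
		ssize_t filled = splice(in_fd, &pos, p[1], NULL,
					len - done, SPLICE_F_MOVE);
		if (filled < 0) {
			if (errno == EINTR) continue;
			if (errno == EINVAL || errno == ENOSYS) use_fallback = true;
			else failed = true;
			break;
		}
		if (filled == 0) break;         /* end of file */

		while (filled > 0) {            /* drain the pipe into out_fd */
			ssize_t n = splice(p[0], NULL, out_fd, NULL,
					   (size_t)filled, SPLICE_F_MOVE);
			if (n < 0 && errno == EINTR) continue;
			if (n <= 0) {
				if (n < 0 && (errno == EINVAL || errno == ENOSYS))
					use_fallback = true;
				else
					failed = true;
				break;
			}
			filled -= n;
			done += (size_t)n;
		}
	}
	close(p[0]);
	close(p[1]);
	if (failed) return -1;
	if (use_fallback) {
		ssize_t rest = copy_fallback(in_fd, off + (off_t)done,
					     out_fd, len - done);
		if (rest < 0) return -1;
		done += (size_t)rest;
	}
	return (ssize_t)done;
}
```

A caller would use send_file_range(file_fd, offset, sock_fd,
count) and treat a short return as EOF. Keeping the errno
decisions inside one function is the point: callers never need
to know which path moved the bytes.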

Volker