Where's Jeremy?

Volker Lendecke Volker.Lendecke at SerNet.DE
Mon Oct 15 13:21:47 MDT 2012

On Mon, Oct 15, 2012 at 01:54:11PM -0500, Christopher R. Hertel wrote:
> Jeremy,
> Have a great conference.
> ...and if you get a moment, can you:
> a) Give me a brief overview of your opinions concerning Linux AIO?
> b) Point me at any docs, write-ups, blog-posts, or e'mail messages
>    that will allow me to catch up on the subject a bit?
> I know that you've been dealing with these issues.  The Gluster team
> is also looking at Linux AIO and I'd like to know where the
> dragons--if any--be.

Quick summary: Doesn't work.

More explanation: At SDC I talked to Christoph Hellwig, who
explained what goes on. For pwrite, AIO is pointless: the
buffer cache makes it async anyway. If the buffer cache is
full, the kernel is very smart about feeding as much data as
possible to disk, and it blocks the pwrite calls while it
does so. For pread, Linux AIO only works with O_DIRECT, and
probably always will. O_DIRECT brings alignment restrictions.
If you use the libaio API without O_DIRECT, it does its
pread/pwrite business correctly, but it won't go async. The
reason why pread can't be made async with the buffer cache
and other layers involved is that there are a million places
where the kernel could block on the way from disk to user
space. Handling them all correctly and keeping that correct
across releases is too difficult. So the best we can do is
the aio_pthread code.

Christoph has sent me a very simple patch that would make
pread on a non-blocking fd return EWOULDBLOCK if the data is
not in the buffer cache. This way we might get the best of
both worlds: we only pay the context switch price if there
is a chance that we would block. I still need to test that,
but other stuff is more pressing these days.
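
A sketch of how a caller might use such a try-first read. The patch
itself is not shown here. As a stand-in, this assumes preadv2() with
the RWF_NOWAIT flag, which newer Linux kernels and glibc versions
offer for buffered reads and which fails with EAGAIN when the data is
not cached. pread_try_fast is a hypothetical name. Where the flag is
unavailable, the sketch simply takes the slow path every time.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Try to read without blocking and fall back to a plain pread()
 * otherwise. *blocked is set to 1 when the fallback was taken. In a
 * real server the fallback would be handed to a helper thread
 * instead of being run inline as it is here.
 */
static ssize_t pread_try_fast(int fd, void *buf, size_t len,
                              off_t offset, int *blocked)
{
    assert(buf != NULL && blocked != NULL);
    *blocked = 0;

#ifdef RWF_NOWAIT
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    ssize_t n = preadv2(fd, &iov, 1, offset, RWF_NOWAIT);
    if (n >= 0) {
        return n;  /* served from the page cache, possibly short */
    }
    /*
     * EAGAIN means the data is not cached. Any other error (for
     * example an unsupported flag) also falls through to the slow
     * path, which will report a genuine error itself.
     */
#endif

    *blocked = 1;
    return pread(fd, buf, len, offset);
}
```

With this shape the thread handoff is only needed when blocked comes
back as 1. Reads that hit the cache are answered directly in the main
loop.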


SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt at sernet.de

More information about the samba-technical mailing list