Changing back to per-thread credentials on Linux (fixing native AIO).

Mon Jul 2 07:10:39 MDT 2012

On Mon, 2012-07-02 at 14:36 +0200, Volker Lendecke wrote: 
> On Mon, Jul 02, 2012 at 08:13:38AM -0400, Jeff Layton wrote:
> > It took me a minute to parse what Volker was saying but I think I get
> > it now. Volker, correct me if I'm wrong...
> > 
> > It used to be that the direct syscall scheme would just return an int
> > (or whatever you liked). Now, the syscall() wrapper generally returns
> > '-1' on error and sets errno. Since errno isn't thread-local, multiple
> > threads could clobber each other's error codes.
> > 
> > In light of that, I'm inclined to believe that Volker is correct here
> > and you'll be best off with some sort of blessing from the glibc folks
> > on this. A simple, thread-local syscall() function would probably be
> > quite handy, but that's a rather long rope for hanging oneself. ;)

Jeff, we are not using clone() here, so we should have no problem with
errno as it should already be using thread-local storage.
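
As a quick sanity check of that claim, here is a minimal sketch (not
Samba code; the names errno_probe, errno_probe_worker and
errno_is_per_thread are made up for illustration). It assumes Linux
with an NPTL glibc and a thread created by pthread_create(). The worker
makes a failing syscall() and records both the resulting errno and the
address of its errno; the caller then compares that address with its
own.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper type: what the worker thread observed. */
struct errno_probe {
    int *errno_addr;   /* address of errno as seen inside the worker */
    int seen_errno;    /* errno left behind by the failing syscall() */
};

static void *errno_probe_worker(void *arg)
{
    struct errno_probe *p = arg;

    /* close(-1) always fails with EBADF; syscall() returns -1
     * and reports the error through errno. */
    long ret = syscall(SYS_close, -1);

    p->seen_errno = (ret == -1) ? errno : 0;
    p->errno_addr = &errno;
    return NULL;
}

/* Returns 1 if the worker had its own errno, 0 if it shared the
 * caller's, and -1 if the probe itself could not be run. */
int errno_is_per_thread(void)
{
    struct errno_probe p = { NULL, 0 };
    pthread_t tid;

    if (pthread_create(&tid, NULL, errno_probe_worker, &p) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    if (p.seen_errno != EBADF)
        return -1;

    assert(p.errno_addr != NULL);
    /* The caller was alive the whole time, so a shared errno would
     * show up here as an identical address. */
    return p.errno_addr != &errno;
}
```

On such a system I would expect errno_is_per_thread() to return 1. An
entity created with raw clone() and no TLS setup is a different matter,
and that is the case Volker describes.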

What Volker is afraid of, if I understand correctly, is that glibc
maintainers may decide to change the semantics of syscall() again to
block per-thread credential handling.
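
For readers who have not followed the whole thread, the technique under
discussion looks roughly like the sketch below. It is an illustration
under assumptions, not the actual Samba code: thread_set_euid and
THREAD_SETRESUID are made-up names, and it relies on the Linux kernel
keeping credentials per thread, with the glibc setresuid() wrapper
being what propagates a change to all threads of the process.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* On 32-bit ABIs such as i386 the plain setresuid syscall takes
 * 16-bit uids; prefer the 32-bit variant where it is defined. */
#ifdef SYS_setresuid32
#define THREAD_SETRESUID SYS_setresuid32
#else
#define THREAD_SETRESUID SYS_setresuid
#endif

/*
 * Hypothetical helper: change the effective uid of the calling thread
 * only. Going through syscall() bypasses the glibc setresuid() wrapper,
 * so the other threads keep their credentials.
 * Returns 0 on success, -1 with errno set on failure.
 */
int thread_set_euid(uid_t euid)
{
    /* (uid_t)-1 leaves the real and saved uid untouched. */
    long ret = syscall(THREAD_SETRESUID, (uid_t)-1, euid, (uid_t)-1);

    assert(ret == 0 || ret == -1);
    return (int)ret;
}
```

A worker thread would call thread_set_euid(uid) before doing file
access on behalf of a user and switch back afterwards. An unprivileged
process can only pick among its current real, effective and saved uids,
so thread_set_euid(geteuid()) succeeds while most other values fail
with EPERM. The example depends on exactly the syscall() behaviour
whose long-term stability is in question here.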

> My reasoning was not that direct. Let's go back a bit:
> 
> I tried to tackle the general async posix syscall problem a
> few years ago using raw clone(). First, clone() is
> potentially cheaper than pthread_create() and second, it
> does not suffer from the process wide credential problem.
> The problem is that from a pure clone()'ed entity (not a
> thread, not a process...) you want to do something sensible,
> in particular you want to issue syscalls. You can't do this
> using the glibc pwrite() wrapper around the INT80 (or
> whatever mechanism is in place) that finally ends up in the
> kernel. The reason is the global errno. The kernel interface
> is a bit different: It returns >=0 in the success case and
> -errno in the error case. If I remember correctly in some
> distant past there was a general syscall()-like function
> echoing exactly this calling convention. It would have been
> an exact match for what I needed: A purely local function
> call handling everything on the stack. Not so anymore with
> glibc. The syscall() function returns -1 on error and fills
> in errno. So it is only usable when you have correctly set
> up the errno symbol with the thread local storage reference
> syscall() expects. I ended up using assembler, but it became
> more and more difficult over time (try passing the 6
> splice() arguments to the kernel on i386...). So by
> coincidence around that time I listened to a talk by Ulrich.
> I asked him about what I was supposed to do. His response
> was that clone() was not supported, as it would screw up
> internal assumptions of the pthread piece of glibc. Thus
> they have made it harder to even try using syscall(). That's
> at least what I remember from his response.
> 
> Long story short: Anything beyond standard, documented
> behaviour is just not supported or actively blocked by
> glibc. Without official blessing by glibc I see the
> syscall() workaround just in the same place.
> 
> Try googling for anything around glibc linux per-thread
> credentials, and you end up at lists.samba.org in this
> thread. That should tell you something.

I think the only thing it says is that only user space file servers care
about this feature so badly, and only those that still insist on letting
the kernel enforce permissions (which is a *good* thing).
Practically only samba has dared trying to really solve this problem; we
know other people took the easy(?) route and simply re-implemented
access control in user space (which I think is a terrible idea for
anything general purpose, but probably ok for an embedded product).

Simo.

-- 
Simo Sorce
Samba Team GPL Compliance Officer <simo at samba.org>
Principal Software Engineer at Red Hat, Inc. <simo at redhat.com>