Unix domain datagram based messaging

Thu Apr 10 11:05:31 MDT 2014

On Thu, Apr 10, 2014 at 08:52:05AM -0700, Jeremy Allison wrote:
> On Thu, Apr 10, 2014 at 01:24:48PM +0200, Stefan (metze) Metzmacher wrote:
> > Am 10.04.2014 12:26, schrieb Volker Lendecke:
> > > On Wed, Apr 09, 2014 at 02:28:19PM -0700, Jeremy Allison wrote:
> > >> It's removing most of our signal issues,
> > >> which is great ! However, at least within
> > >> smbd we still have to handle slow system
> > >> calls returning -1,EINTR if we get hit
> > >> by POSIX realtime signals from the leases
> > >> (of course due to this patchset this should
> > >> now be *extremely* rare :-)
> > >>
> > >> So shouldn't:
> > >>
> > >> recv() in unix_dgram_recv_handler()
> > >> connect() in unix_dgram_send_queue_init()
> > >> send() in unix_dgram_send_job()
> > >> sendmsg() in unix_dgram_send()
> > >>
> > >> in this patchset be changed to handle -1,EINTR with
> > >> an immediate retry ?
> > > 
> > > I'm always a bit reluctant to add these unconditional loops,
> > > because it postpones signal handling unnecessarily and makes
> > > Ctrl-C not work as expected. For the syscalls that are
> > > called from the central event loop, shouldn't we try to make
> > > sure that we handle EINTR in the sense that the call just
> > > did not happen? The event loop should come back to us
> > > immediately again, the socket should still be ready as seen
> > > by poll. You're right, I have not thought about this yet at
> > > all, so some modifications are required. I just don't know
> > > if the hard retry is the right thing in this very lowlevel
> > > library.
> 
> Yes, that happens correctly within tevent. A signal
> causes us to just retry after going through the
> main loop.
> 
> However, on first glance I wasn't sure if
> your changes treated -1 as returning an
> error (I think they did but I'll check again
> to be sure).
> 
> > Yep, I think we can just handle it like EAGAIN.
> 
> Yes, just returning to the main loop to retry
> would be the cleanest solution.

Ok, I took a really close look.

I think the 'connect()' call in unix_dgram_send_queue_init()
needs protecting with a:

do {
	ret = connect()
} while (ret == -1 && errno == EINTR);

loop as getting hit by a signal here has
no means of retry (plus it's going to be
a *really* rare occurance).

The recv() in unix_dgram_recv_handler()
is completely safe already, as it'll
just retry on a -1 return given the
tevent_add_fd() call in tevent_watch_new().

In fact the only thing I can see that
might be an issue here is how does
unix_dgram_recv_handler() handle
a it if we get a geniune error that
causes recv() to return -1. Won't it
keep getting called ? I'll do some
more digging here..

The send() in unix_dgram_send_job()
should be protected by a:

do {
	msg->sent = send(msg->sock, msg->buf, msg->buflen, 0);
} while (msg->sent == -1 && errno == EINTR);

loop as it's only ever called as a
pthread_add_job job, so we never return
to the main loop here (we *are* the main
loop :-).

The code in unix_dgram_send() already
has the case we need, I think you
just need to change:

+       if (errno != EWOULDBLOCK) {
+               return errno;
+       }

to:

+       if (errno != EWOULDBLOCK && errno != EAGAIN && errno != EINTR) {
+               return errno;
+       }

in that function and it's ok !
(NB: I added EAGAIN as the Linux send() man
page says:

       EAGAIN or EWOULDBLOCK
              The socket is marked nonblocking and the requested operation would  block.   POSIX.1-2001  allows
              either  error to be returned for this case, and does not require these constants to have the same
              value, so a portable application should check for both possibilities.
).

Amazing work !!!!!!! I *love* this patchset !

Jeremy.