working on a 2.5.6pre1 release

Fri Jan 10 02:57:00 EST 2003

On Thu, Jan 09, 2003 at 05:09:07PM -0600, Dave Dykstra wrote:
> It's very hard to debug because it is a timing problem and because it
> happens after rprintf handling is already shut down in the child process.

Fortunately fprintf(stderr, ...) always works, even in the child process.
This is what I've been using to get some status on the problem.

> Everything seems to go through normally but then it exits with an exit
> code of 12, I think because the child receiver process is terminated with
> a SIGUSR2 which is signal 12 and because the bug that was preventing exit
> codes from being properly reported from children has now been fixed.

The value of SIGUSR2 is a red herring.  The error is really RERR_STREAMIO
(which also happens to be 12), and it comes from the whine_about_eof()
routine.  I haven't yet had time to figure out why this code is being
generated during the final phase of the receiver's life, though.  The
receiver successfully kills the generator, gets its 0 status code, begins
to return a 0 status code, and then suddenly starts exit_cleanup() over
again with the error 12 from the io.c code.
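
For anyone who wants to tell the two kinds of "12" apart, here is a
minimal sketch (not rsync code) of how a parent can distinguish a child
that exited with code 12 from one that was killed by a signal.  The
run_child() helper and the child_end enum are hypothetical names made up
for this illustration; RERR_STREAMIO is 12 as in rsync's errcode.h, and
SIGUSR2 is 12 on Linux/x86 but not on every platform.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RERR_STREAMIO 12  /* same value as in rsync's errcode.h */

/* How the child ended (hypothetical enum for this sketch). */
enum child_end { ENDED_BY_EXIT, ENDED_BY_SIGNAL, ENDED_OTHER };

/*
 * Hypothetical helper: fork a child that either kills itself with `sig`
 * (when sig != 0) or exits with `code`, then report how it ended.
 * On return, *value holds the exit code or the signal number.
 */
static enum child_end run_child(int code, int sig, int *value)
{
    int status;
    pid_t pid;

    assert(value != NULL);

    pid = fork();
    if (pid < 0)
        return ENDED_OTHER;

    if (pid == 0) {
        if (sig) {
            signal(sig, SIG_DFL);  /* make sure the default action applies */
            raise(sig);
        }
        _exit(code);               /* reached only when no signal killed us */
    }

    if (waitpid(pid, &status, 0) < 0)
        return ENDED_OTHER;

    if (WIFEXITED(status)) {       /* normal exit: an exit code such as 12 */
        *value = WEXITSTATUS(status);
        return ENDED_BY_EXIT;
    }
    if (WIFSIGNALED(status)) {     /* killed by a signal such as SIGUSR2 */
        *value = WTERMSIG(status);
        return ENDED_BY_SIGNAL;
    }
    return ENDED_OTHER;
}
```

Calling run_child(RERR_STREAMIO, 0, &v) reports a normal exit, while
run_child(0, SIGUSR2, &v) reports death by signal, even though the number
stored in v can be the same in both cases.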

One thing I have discovered is that if I remove the two rprintf() calls
from exit_cleanup() (changing them into fprintf(stderr) calls), I can't
get the test to fail.

My current theory is that the sender is closing down the socket, and if the
receiver just happens to get past the two rprintf()s before this happens,
then all is well.  If not, it gets an error (since something must be trying
to flush during the exit_cleanup(0) processing) and switches to an exit
code of 12 (RERR_STREAMIO).
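
The theory can be modeled with a toy program, under some loud
assumptions: this is not rsync's code, toy_cleanup() and toy_race() are
hypothetical names, and a plain write() on a socketpair stands in for
whatever I/O the real cleanup path performs.  The only point is that the
same cleanup call yields 0 or 12 depending on whether the peer closed its
end first.

```c
#include <assert.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

#define RERR_STREAMIO 12  /* same value as in rsync's errcode.h */

/*
 * Toy stand-in for a cleanup routine that sends a message over the
 * socket.  If the write fails, the requested exit code is replaced
 * with RERR_STREAMIO.
 */
static int toy_cleanup(int code, int sock)
{
    static const char msg[] = "cleanup\n";

    assert(sock >= 0);
    if (write(sock, msg, sizeof msg - 1) < 0)
        return RERR_STREAMIO;
    return code;
}

/*
 * Run one interleaving of the race and return the final exit code.
 * peer_closes_first != 0 means the other side closes its end before
 * the cleanup message is written.
 */
static int toy_race(int peer_closes_first)
{
    int sv[2];
    int code;

    /* Get EPIPE from write() instead of being killed by SIGPIPE. */
    signal(SIGPIPE, SIG_IGN);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (peer_closes_first)
        close(sv[1]);

    code = toy_cleanup(0, sv[0]);

    if (!peer_closes_first)
        close(sv[1]);
    close(sv[0]);
    return code;
}
```

Here toy_race(0) ends with the intended 0, while toy_race(1) ends with
12, which is the same kind of timing dependence described above.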

I'll finish debugging this later if no one else gets to it first.

..wayne..