Problem with child process exit status.

Mike Schwarz schwarz at kscable.com
Fri Apr 12 06:06:02 EST 2002


Initial problem:  
When running 'make test' the hands.test fails as indicated in problem #3711 
and includes the line
  rsync error: unexplained error (code 63) at main.c(537)
The error code changes each time the test is run.

Using HP C-ANSI-C B.11.11.02.
configure line:
CFLAGS="-O" ./configure --prefix=/opt/local

In tracking this down, this is what I found:
In main.c a sigchld_handler is set up that reaps any child processes,
ignoring their exit status.  There is also a wait_process function
in main.c that is used to wait for a specific process to terminate and
return its status.  What happens is that the child terminates,
the signal handler reaps it, and wait_process (eventually) gets nothing
and returns without any status.  The problem becomes evident on HP-UX
because sometimes when waitpid is called with WNOHANG (in wait_process) it
returns 0 (no process to reap yet) but *status is changed anyway.  wait_process
then calls msleep and tries again, and eventually waitpid returns -1 (no running
or defunct process), at which point wait_process returns with the status as it
was set when waitpid returned 0.
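
To make the sequence above concrete, here is a minimal sketch (not rsync code) that
reproduces the ordering deterministically.  The function name poll_after_early_reap
is made up for illustration, and a blocking waitpid stands in for sigchld_handler.
The HP-UX behaviour of writing to the status on a 0 return is not reproduced; the
caller simply pre-loads the status with a stale value to play that role.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo, not part of rsync.  Forks a child that exits with
 * code 7, reaps it the way a SIGCHLD handler would (discarding the exit
 * status), then polls once the way wait_process does.
 * Returns the errno from that poll, 0 if the poll unexpectedly found
 * the child, or -1 if fork failed.
 */
static int poll_after_early_reap(int *status)
{
    pid_t pid;

    assert(status != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(7);

    /* Stand-in for sigchld_handler: reap the child, throw the status away. */
    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
        ;

    /* The later poll: the child is already gone, so waitpid fails. */
    if (waitpid(pid, status, WNOHANG) == -1)
        return errno;
    return 0;
}
```

Called with a status pre-loaded with a stale value such as 63, the poll fails with
ECHILD and the caller is left holding whatever was in the status beforehand, never
the child's real exit code of 7.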

Why HP-UX does this I don't know.  I also didn't find any documentation as
to what it means.  I do know that within one run it sets the status to the
same value on every pass through the loop (the value differs from run to run).
Also, rsync forks twice in this test.  The problem only occurs when the first
process waits on the 2nd, not when the 2nd waits on the 3rd.

Solution:
1. Remove the sigchld signal handler.  I'm not sure if it is needed - maybe
   in daemon mode.
2. Reset *status to 0 in the wait_process loop.
3. Optionally the sigchld handler could store the pid/status so that
   wait_process could pick it up.  This doesn't appear necessary (not sure).
4. Maybe wait_process should check for and report waitpid returning -1.
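
As a rough illustration of items 2 and 4 together, a polling loop could look like the
sketch below.  This is not rsync's wait_process: the names wait_for_child and
spawn_exiting_child are hypothetical, nanosleep stands in for msleep(20), and the
io_flush() call is left out so the example stands alone.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo helper: fork a child that exits at once with `code`. */
static pid_t spawn_exiting_child(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);
    return pid;
}

/*
 * Hypothetical variant of the polling loop.
 * Returns 0 and fills *status once the child has been reaped here.
 * Returns -1 if waitpid fails (for example ECHILD because something
 * else already reaped the child) and leaves *status untouched.
 */
static int wait_for_child(pid_t pid, int *status)
{
    struct timespec pause = { 0, 20 * 1000 * 1000 }; /* 20 ms */

    assert(status != NULL);
    for (;;) {
        int local = 0;  /* item 2: start from a clean value on every pass */
        pid_t r = waitpid(pid, &local, WNOHANG);

        if (r == pid) {
            *status = local;  /* only trust a status that came with the pid */
            return 0;
        }
        if (r == -1 && errno != EINTR)
            return -1;        /* item 4: let the caller see the failure */
        nanosleep(&pause, NULL);
    }
}
```

With this shape, a caller that gets -1 back can report the lost status instead of
passing a stale value on as the child's exit code.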

The attached patch does 1 and 2 and works for our transfers.

-------------- next part --------------
*** main.c.orig	Tue Mar 26 23:10:44 2002
--- main.c	Mon Apr  8 13:37:27 2002
***************
*** 37,42 ****
--- 37,43 ----
  	while (waitpid(pid, status, WNOHANG) == 0) {
  		msleep(20);
  		io_flush();
+ 		*status = 0;
  	}
          
          /* TODO: If the child exited on a signal, then log an
***************
*** 846,852 ****
  
  	signal(SIGUSR1, sigusr1_handler);
  	signal(SIGUSR2, sigusr2_handler);
! 	signal(SIGCHLD, sigchld_handler);
  #ifdef MAINTAINER_MODE
  	signal(SIGSEGV, rsync_panic_handler);
  	signal(SIGFPE, rsync_panic_handler);
--- 847,853 ----
  
  	signal(SIGUSR1, sigusr1_handler);
  	signal(SIGUSR2, sigusr2_handler);
! 	/* signal(SIGCHLD, sigchld_handler); 	This doesn't process exit status correctly */
  #ifdef MAINTAINER_MODE
  	signal(SIGSEGV, rsync_panic_handler);
  	signal(SIGFPE, rsync_panic_handler);

