rsync error: unexplained error

David R. Staples drstapl at empirenet.com
Sun Sep 1 15:17:01 EST 2002


I believe I have found the cause of the "unexplained error (code ??) at main.c(line #)" message.  In the version I am running, 2.5.2 (obtained from the Free Software Foundation), the line number is 576.  The root cause appears to be a race condition in how terminated child processes are reaped.
If the signal handler for SIGCHLD runs, as the result of a child terminating, before the wait_process procedure is executed, the handler reaps the child first.  The exit status of pid (as passed to wait_process) is then no longer available, and the waitpid call in wait_process fails with an ECHILD error.  If, however, wait_process executes first, it successfully obtains the exit status of pid, and sigchld_handler can then run, to eliminate zombies, with no adverse effects.  Since signal handlers execute asynchronously, there is no way to predict when, if at all, a process will encounter this problem.
Below I am providing proposed new code that should resolve the problem.  I can certainly make these changes in my own environment, but since I would like to remain consistent with the rsync project, I would like to hear from someone about incorporating these changes into rsync, or about an alternative (official) way of fixing this problem.
I can be reached at drstaples at beckman.com; drstaples at drstaples.com; or drstapl at empirenet.com
Sincerely,
David R. Staples

--------------------------------------------------------------------------------
Proposed new code in main.c
typedef struct {
        pid_t pid;
        int status;
} pid_status;

/* Exit statuses of children reaped by sigchld_handler. */
/* A pid of 0 marks a free slot.                        */
pid_status pid_stat_table[10];

static RETSIGTYPE sigchld_handler(int val)
{
#ifdef WNOHANG
        int indx;
        int status;
        pid_t pid;
        int saved_errno = errno;  /* keep waitpid from disturbing errno */

        /* Reap every terminated child and record only real pids. */
        while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
                for (indx = 0; indx < 10; indx++) {
                        if (pid_stat_table[indx].pid == 0) {
                                pid_stat_table[indx].pid = pid;
                                pid_stat_table[indx].status = status;
                                break;
                        }
                }
        }
        errno = saved_errno;
#endif
}

void wait_process(pid_t pid, int *status)
{
        pid_t waited_pid;
        int indx;

        do {
                waited_pid = waitpid(pid, status, WNOHANG);
                if (waited_pid == 0) {
                        msleep(20);
                        io_flush();
                }
        } while (waited_pid == 0);

        if ((waited_pid == -1) && (errno == ECHILD)) {
                /* status of requested child no longer available.  Check */
                /* to see if it was processed by the sigchld_handler.    */
                for (indx = 0; indx < 10; indx++) {
                        if (pid == pid_stat_table[indx].pid) {
                                *status = pid_stat_table[indx].status;
                                pid_stat_table[indx].pid = 0;  /* free the slot */
                                break;
                        }
                }
        }

        *status = WEXITSTATUS(*status);
}

 
