rsync error: unexplained error
David R. Staples
drstapl at empirenet.com
Sun Sep 1 15:17:01 EST 2002
I believe I have found the cause of the unexplained error (code ??) at main.c(line #). In the version I'm running, 2.5.2 (obtained from the Free Software Foundation), the line number is 576. The root cause appears to be a race condition in how terminated child processes are reaped.
If the signal handler for SIGCHLD runs, as the result of a child terminating, before wait_process is executed, the handler reaps the child first. The status of pid (as passed to wait_process) is then no longer available, and the waitpid call in wait_process fails with an ECHILD error. If, however, wait_process executes first, it successfully obtains the exit status of pid, and sigchld_handler can then execute, to eliminate zombies, with no adverse effects. Since signal handlers execute asynchronously, there is no way to predict when, if at all, a process will encounter this problem.
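To make the failing order concrete, here is a minimal standalone sketch, separate from rsync, that forces the handler to win the race. The names reap_all_children and errno_when_handler_wins are hypothetical and exist only for this illustration; it assumes a POSIX system with sigaction and nanosleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Set by the handler once it has reaped at least one child. */
static volatile sig_atomic_t reaped = 0;

/* Reap every terminated child, as a zombie-cleaning handler would. */
static void reap_all_children(int signo)
{
    int saved_errno = errno;

    (void)signo;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped = 1;
    errno = saved_errno;
}

/* Hypothetical helper: returns the errno left by waitpid() when the
 * handler has already reaped the child, 0 if waitpid() did not fail,
 * or -1 if the setup itself failed. */
int errno_when_handler_wins(void)
{
    struct sigaction sa, old;
    struct timespec nap = { 0, 1000000 };   /* 1 ms */
    pid_t pid;
    int status;
    int result = 0;

    reaped = 0;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_all_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, &old) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        sigaction(SIGCHLD, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(3);                           /* child exits at once */

    /* Force the losing order: wait until the handler has run. */
    while (!reaped)
        nanosleep(&nap, NULL);

    /* The child's status is gone, so this is expected to fail. */
    if (waitpid(pid, &status, WNOHANG) == -1)
        result = errno;

    sigaction(SIGCHLD, &old, NULL);         /* restore old handler */
    return result;
}
```

Calling errno_when_handler_wins() from a small main should return ECHILD on a system that behaves as described above. In rsync itself the order is not forced, which is why the error only shows up occasionally.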
I am providing below proposed new code that should resolve the problem. I can certainly make these changes within my own environment, but since I would like to remain consistent with the rsync project, I would like to hear from someone regarding incorporation of these changes into rsync, or an alternative (official) method of fixing this problem.
I can be reached at drstaples at beckman.com; drstaples at drstaples.com; or drstapl at empirenet.com
Sincerely,
David R. Staples
--------------------------------------------------------------------------------
Proposed new code in main.c
typedef struct {
    int pid;
    int status;
} pid_status;

pid_status pid_stat_table[10];

static RETSIGTYPE sigchld_handler(int val)
{
#ifdef WNOHANG
    int indx;
    int pid;
    int status;

    /* Reap every terminated child and save its status in the first */
    /* free table slot so that wait_process can find it later. */
    do {
        pid = waitpid(-1, &status, WNOHANG);
        if ( pid > 0 ) {
            for ( indx = 0; indx < 10; indx++ ) {
                if ( pid_stat_table[indx].pid == 0 ) {
                    pid_stat_table[indx].pid = pid;
                    pid_stat_table[indx].status = status;
                    break;
                }
            }
        }
    } while ( pid > 0 );
#endif
}

void wait_process(pid_t pid, int *status)
{
    int waited_pid;
    int indx;

    do {
        waited_pid = waitpid(pid, status, WNOHANG);
        if ( waited_pid == 0 ) {
            msleep(20);
            io_flush();
        }
    } while ( waited_pid == 0 );

    if (( waited_pid == -1 ) && ( errno == ECHILD )) {
        /* status of requested child no longer available. Check */
        /* to see if it was processed by the sigchld_handler. */
        for ( indx = 0; indx < 10; indx++ ) {
            if ( pid == pid_stat_table[indx].pid ) {
                *status = pid_stat_table[indx].status;
                break;
            }
        }
    }

    *status = WEXITSTATUS(*status);
}
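
For anyone who wants to try the idea outside of rsync, the following is a minimal standalone sketch of the same technique: the handler records each reaped child's status in a small table, and the waiting code falls back to that table on ECHILD. The names saved, record_children, exit_code_of and run_child_and_collect are hypothetical, and msleep and io_flush are replaced by a plain nanosleep, so this illustrates the approach rather than the actual patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_SAVED 10    /* same table size as the proposal */

/* Statuses of children reaped by the handler; pid == 0 marks a free slot. */
static volatile struct { pid_t pid; int status; } saved[MAX_SAVED];

/* SIGCHLD handler: reap all terminated children and record each status. */
static void record_children(int signo)
{
    int saved_errno = errno;
    int status, i;
    pid_t pid;

    (void)signo;
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        for (i = 0; i < MAX_SAVED; i++) {
            if (saved[i].pid == 0) {
                saved[i].status = status;   /* status first, then pid */
                saved[i].pid = pid;
                break;
            }
        }
    }
    errno = saved_errno;
}

/* Returns the exit code of pid, or -1 if it cannot be determined. */
int exit_code_of(pid_t pid)
{
    struct timespec nap = { 0, 20 * 1000 * 1000 };  /* 20 ms */
    int status, i;
    pid_t got;

    while ((got = waitpid(pid, &status, WNOHANG)) == 0)
        nanosleep(&nap, NULL);

    if (got == -1) {
        if (errno != ECHILD)
            return -1;
        /* The handler got there first; look the status up in the table. */
        for (i = 0; i < MAX_SAVED; i++)
            if (saved[i].pid == pid)
                break;
        if (i == MAX_SAVED)
            return -1;
        status = saved[i].status;
        saved[i].pid = 0;                   /* free the slot */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Fork a child that exits with `code` and return what the parent sees. */
int run_child_and_collect(int code)
{
    struct sigaction sa;
    pid_t pid;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);
    return exit_code_of(pid);
}
```

Whichever side wins the race, run_child_and_collect(7) should return 7. Unlike the proposal, this sketch frees a table slot once its status has been consumed, since a table that only ever fills up would stop recording after ten children.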