possible bug?

Fri Feb 13 22:04:21 GMT 2004

A Google search on this problem did not turn up any matches, so I'll 
take the chance that someone on this list might consider it an rsync 
problem.

In a nutshell, if rsync forks a child process to handle the transport 
(rsh in this case), it can hang in wait_process() forever waiting for 
that child process to die. Normally this would not be a problem. 
However, if the wrong packet is dropped, along with all of its 
retransmissions, the child never exits and the wait never ends.

In the particular case that I have been debugging, the TCP FIN packet 
from the rsh server never reaches the rsh client that rsync started in 
order to send files to a peer. As a result, rsh blocks indefinitely, 
which in turn keeps rsync in the wait_process() loop -- even though all 
files have been transferred.

Yes, I realize the process that kicks off rsync should ensure that it 
terminates in a timely manner. However, I would like to propose a change 
to rsync that would let invocations of rsync that specify a timeout die 
based on that timeout: wait_process() could call check_timeout() before 
calling msleep(). Once the timeout has been exceeded, rsync would call 
_exit_cleanup() and kill_all() would take care of the child. The check 
could be made optional so that only calls from client_run() perform it.
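
To make the idea concrete, here is a rough standalone sketch of a wait 
loop that gives up after a deadline. It is not rsync's code and not a 
patch against it: wait_process_with_timeout() and spawn_sleeper() are 
names I made up for illustration, the 20 ms polling interval is 
arbitrary, and the SIGKILL only stands in for whatever cleanup path 
rsync would actually take on a timeout.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not part of rsync): poll for the child's exit,
 * but give up once timeout_secs have elapsed. A timeout of 0 means
 * "wait forever", i.e. the unbounded behaviour described above.
 * Returns 0 if the child exited on its own, -1 if the deadline passed
 * and the child was killed, -2 on a waitpid() error. */
int wait_process_with_timeout(pid_t pid, int *status, int timeout_secs)
{
    time_t start = time(NULL);

    for (;;) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 0;                 /* child is gone, normal case */
        if (r < 0 && errno != EINTR)
            return -2;                /* unexpected waitpid() failure */

        /* The proposed check: look at the clock before sleeping. */
        if (timeout_secs > 0 && time(NULL) - start >= timeout_secs) {
            kill(pid, SIGKILL);       /* stand-in for the cleanup path */
            waitpid(pid, status, 0);  /* reap so no zombie is left */
            return -1;
        }

        struct timespec ts = { 0, 20 * 1000 * 1000 };  /* 20 ms */
        nanosleep(&ts, NULL);
    }
}

/* Hypothetical test helper: fork a child that sleeps for `secs`
 * seconds and then exits cleanly, imitating a transport process
 * that may or may not finish in time. Returns -1 if fork() fails. */
pid_t spawn_sleeper(unsigned int secs)
{
    pid_t pid = fork();
    if (pid == 0) {
        sleep(secs);
        _exit(0);
    }
    return pid;
}
```

A caller would pass the user's timeout value as timeout_secs and treat 
a return of -1 as "child was stuck, exit with an error". Because the 
clock is read with one-second resolution, the deadline can fire up to a 
second early, which seems acceptable for a last-resort check like this.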

Just a thought.

-- 
david ahern