2.5.5-1 rsync hangs

jw schultz jw at pegasys.ws
Tue Sep 24 21:52:01 EST 2002

On Tue, Sep 24, 2002 at 07:29:40PM +0900, Anthony Heading wrote:
> On Mon, Sep 23, 2002 at 01:06:22PM -0700, Sudheer Tumuluru wrote:
> > 
> >     I am having the same problem with rsync 2.5.5-1. I am
> > trying to rsync a couple of short text files between a linux server and
> > Win2k Professional boxes with cygwin. About 20% of the time, rsync freezes
> > at the end of the transfer, and I can't kill the rsync process in
> > cygwin even if I give it a 9 (SIGKILL) signal. This happens mostly on dual-processor
> > machines but it did happen once on the single proc machine as well.
> Me too.  I spent this afternoon debugging.  There doesn't appear to be
> anything wrong with rsync - looks rather more like something in cygwin
> signal delivery is ill.
> In my case I'm trying to pull files onto Windows(XP) from Unix(Solaris).
> rsync forks in this case; the parent process generates the file list
> while a child process does the receiving.  (Something like that, at
> least; I guess it's for deadlock avoidance)
> At the end, the parent process waves farewell to the remote server,
> and then does a kill(..., SIGUSR2) on the child pid to tell it to exit.
> This signal seems to get lost, as suggested above, some moderate
> percentage of the time.
> The child process is supposedly waiting for this signal inside
> msleep(), which calls select() to wait in 20ms bursts.  In the
> cases that the child manages to reach the select() in time to
> start waiting, I didn't observe any hangs.  But consistently
> if the kill was received before that point, the child process
> simply locks up. 
> This suggests the hack workaround of adding a call to
> say msleep(30) just before the line kill(pid, SIGUSR2) in
> main.c:do_recv().  
> With that kludge in, I haven't seen any hangs in a few hundred 
> trials.  YMMV, but it might be a helpful bandaid until some
> cygwin expert has the chance to fix things properly. 

I looked at this briefly last night and reached the
conclusion that indeed windows or cygwin is somehow dropping
the signal. 

Then today, as a result of a question asked by someone else,
I happened to read the select_tut(2) manpage and noticed this
little gem.

       10.    I have heard that the Windows socket layer does not
              cope  with OOB data properly. It also does not cope
              with select calls when no file descriptors are set
              at  all. Having no file descriptors set is a useful
              way to sleep the process with sub-second precision
              by using the timeout.  (See further on.)

       On  systems  that  do  not have a usleep function, you can
       call select with a finite timeout and no file descriptors
       as follows:

           struct timeval tv;
           tv.tv_sec = 0;
           tv.tv_usec = 200000;  /* 0.2 seconds */
           select (0, NULL, NULL, NULL, &tv);

       This  is only guaranteed to work on Unix systems, however.

Hmm, that is exactly what rsync is doing here.  I don't use
cygwin so I'm strictly a spectator on this issue but it
seems clear that something is amiss with cygwin and the
cygwin developers should be brought in on it.
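
For anyone following along, the idiom in question looks roughly
like the sketch below.  This is not rsync's msleep(); the name
sleep_ms() and the error handling are my own assumptions, and it
only shows the "select with no file descriptors" sleep that the
manpage warns about.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not rsync's msleep): sleep for about `ms`
 * milliseconds by calling select() with no file descriptors and
 * only a timeout.  Returns 0 when the timeout expires, -1 on bad
 * input or if select() fails (for example when interrupted by a
 * signal).
 */
int sleep_ms(long ms)
{
    struct timeval tv;

    if (ms < 0)
        return -1;

    tv.tv_sec = ms / 1000;            /* whole seconds */
    tv.tv_usec = (ms % 1000) * 1000;  /* remainder in microseconds */

    /* nfds = 0 and all three fd sets NULL: nothing to watch,
     * so select() can only return via timeout or error. */
    return select(0, NULL, NULL, NULL, &tv);
}
```

On a Unix system sleep_ms(20) should simply return 0 after about
20ms.  Whether it behaves on cygwin is the open question here.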

Until cygwin is fully up-to-date you might want to create a
patch that once given further testing could be included in
the patches directory and referenced in the rsync FAQ.

	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt
