rsync 2.5.6 timeout bug

Alan Burlison Alan.Burlison at sun.com
Thu Jul 31 08:23:17 EST 2003


I've been getting frequent io errors trying to synchronise a local CPAN 
mirror with the master on ftp.funet.fi, the symptoms being the dreaded

rsync: connection unexpectedly closed (0 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(165)

message at the client end.  I've replicated this when mirroring from a local 
CPAN mirror, and the issue seems to be that the server is timing out after 
it has sent the file list to the client but before the client has started 
transferring files.

Despite what the documentation says about the default IO timeout being 
infinite (0), inspection of the code would seem to indicate otherwise:

[io.c]
/** If no timeout is specified then use a 60 second select timeout */
#define SELECT_TIMEOUT 60
:
tv.tv_sec = io_timeout?io_timeout:SELECT_TIMEOUT;
tv.tv_usec = 0;

I haven't crawled through the initialisation code to find out exactly how 
io_timeout gets set, but examination of rsync in daemon mode with a debugger 
reveals that it is using a timeout of 60 seconds when no --timeout is 
specified by the client and there is no timeout value in rsyncd.conf.
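
To make the effect of the quoted expression concrete, here is a minimal sketch in C. The wrapper name effective_select_timeout is an assumption for illustration and does not appear in rsync. It shows only what the expression evaluates to. Whether a select() wake-up is then treated as a fatal timeout depends on code that is not quoted here.

```c
/* Value copied from the io.c excerpt quoted above. */
#define SELECT_TIMEOUT 60

/* Hypothetical wrapper (not an rsync function): returns the number of
 * seconds the quoted expression would store in tv.tv_sec. */
int effective_select_timeout(int io_timeout)
{
    /* An io_timeout of 0 ("no timeout") falls through to SELECT_TIMEOUT,
     * so select() is never asked to wait longer than 60 seconds. */
    return io_timeout ? io_timeout : SELECT_TIMEOUT;
}
```

With io_timeout left at 0 the function returns 60, which matches the value seen in the debugger.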

The consequence of this is that if the client doesn't respond within 60 
seconds (and as CPAN contains >34,000 files it often doesn't), the server 
process exits, and the client then gets an unexpected EOF.  I've checked 
with the admin of ftp.funet.fi, and he doesn't have a timeout set in 
rsyncd.conf, so it seems that the actual value being used is 60 seconds, 
hence the failures.

Closer examination of the select code reveals other breakage even if the 60 
second default problem is fixed.  The manpage for select says (Solaris):

      If the timeout argument is not a null pointer, it points  to
      an  object  of  type struct timeval that specifies a maximum
      interval to wait for  the  selection  to  complete.  If  the
      timeout  argument points to an object of type struct timeval
      whose members are 0, select() does not block. If the timeout
      argument  is  a null pointer, select() blocks until an event
      causes one of the masks to be returned with  a  valid  (non-
      zero)  value.   If  the  time limit expires before any event
      occurs that would cause one of the masks  to  be  set  to  a
      non-zero  value, select() completes successfully and returns
      0.

so if an infinite timeout *is* required, the struct timeval* argument to 
select should be NULL when io_timeout==0, and I see no code in place to do that.
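
A minimal sketch of that idea in C, assuming io_timeout == 0 is meant to stand for "wait forever". The helper names select_timeout_arg and wait_readable are hypothetical and are not taken from the rsync source. This is one possible shape for a fix, not the actual patch.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper (not part of rsync): choose the timeout argument
 * to hand to select().  Assumes io_timeout == 0 means "no timeout". */
struct timeval *select_timeout_arg(int io_timeout, struct timeval *tv)
{
    if (io_timeout == 0)
        return NULL;            /* NULL makes select() block indefinitely */
    tv->tv_sec = io_timeout;
    tv->tv_usec = 0;
    return tv;
}

/* Hypothetical caller: wait until fd is readable.  Returns select()'s
 * result: >0 if ready, 0 if the timeout expired, -1 on error. */
int wait_readable(int fd, int io_timeout)
{
    fd_set fds;
    struct timeval tv;

    FD_ZERO(&fds);
    FD_SET(fd, &fds);
    return select(fd + 1, &fds, NULL, NULL,
                  select_timeout_arg(io_timeout, &tv));
}
```

Keeping the NULL-or-pointer decision in one small function means every select() call site can share the same rule for the zero case.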

I'm also not clear exactly how the client and server timeout values 
interact.  The rsyncd.conf entry says:

The "timeout" option allows you to override the clients choice for IO 
timeout for this module,

which implies that the client timeout value (if specified) is passed across 
the wire and is used by the server - is this really what is supposed to 
happen?  If so, experimentation suggests that it might be broken as well.
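
One possible reading of the rsyncd.conf wording, sketched in C. Everything here is an assumption made for illustration: the function name, the parameter names, and the rule that 0 means "not set". It is not taken from the rsync source and may not match what rsync actually does.

```c
/* Hypothetical helper (not an rsync function): pick the IO timeout a
 * server would use under the "module setting overrides the client"
 * reading.  Both arguments are in seconds; 0 is assumed to mean unset. */
int resolve_io_timeout(int client_timeout, int module_timeout)
{
    if (module_timeout != 0)
        return module_timeout;  /* rsyncd.conf value takes precedence */
    return client_timeout;      /* otherwise use what the client sent */
}
```

Under this reading, a result of 0 (neither side sets a timeout) would have to mean "no timeout", which is the case the select() code needs to handle.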

I'm happy to fix these problems if someone can confirm that I'm on the right 
track and my understanding is correct.  I'm currently completely unable to 
use rsync to reliably mirror CPAN to the inside of our corporate firewall, 
so I have a strong vested interest in fixing these issues.

Once again, please reply direct as I'm not on the list.


-- 
Alan Burlison