Problem: rsync hangs
santiago at ascend.com
Fri Dec 7 08:42:14 EST 2001
We've been experiencing intermittent timeout errors with rsync 2.4.x,
but I've never been able to set up a small enough test case, nor
track down the problem to its root cause.
Rsync 2.5.0 still has problems, but in a perverse way it's better:
it's reproducible more quickly (1 hour, instead of many), and exhibits
different symptoms: a hang, instead of a timeout.
Here's the tail end of output from an instrumented version. This
is from the server log:
2001/12/06 11:52:24  send_files: i=17187 (00004323)
2001/12/06 11:52:24  send_files: (17187, /home/santiago/tmp/rsync-test/src-test/CVSROOT/eoi-log/62/20062)
...at this point, it hangs for a while, then every 10 minutes it blurts:
2001/12/06 12:02:24  rsync error: timeout in data send/receive (code 30) at io.c(77)
2001/12/06 12:12:24  rsync error: timeout in data send/receive (code 30) at io.c(77)
2001/12/06 12:22:24  rsync error: timeout in data send/receive (code 30) at io.c(77)
2001/12/06 12:32:24  rsync error: timeout in data send/receive (code 30) at io.c(77)
2001/12/06 12:42:24  rsync error: timeout in data send/receive (code 30) at io.c(77)
The client side, meanwhile, is stuck at:
2001/12/06 11:57:27  CVSROOT/eoi-log/61/51361
2001/12/06 12:05:34  CVSROOT/eoi-log/61/51461
If I let it run for several hours, it seems to transfer one file
per hour. At least, it appends filenames to the log at that rate.
This goes on for 23 hours (the limit of my patience).
FWIW, the 2.4.7 timeout happens under rsh, ssh, and rsyncd. I've
only tested 2.5.0 under rsh, but have just started an ssh job to
see if/when it hangs.
2.4.7 does consistently get farther along with ssh than with rsh.
Typical # of files transferred before the hang is 3 to 4 thousand,
but as you can see the above run did 17,000+.
Invocation script is attached below.
Has anyone seen this? Is there anything obvious I'm doing wrong?
Any suggestions for how to track down the problem?
Thanks in advance for any advice,
Ed Santiago Toolsmith santiago at ascend.com
Re root cause: the furthest I've been able to get in diagnosing it
is that both client and server are stuck in select(), each waiting
to hear from the other.
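
To illustrate the kind of mutual wait described here, the following is a minimal Python sketch, not rsync's actual I/O code. It assumes a local socket pair as a stand-in for the client/server link, and the function names (mutual_wait, one_side_writes) and the short timeout are hypothetical choices for the demonstration. It checks the two ends one after the other rather than concurrently, which is enough to show that select() on either end never reports data as long as neither side writes.

```python
import select
import socket


def mutual_wait(timeout=0.2):
    """Return (a_ready, b_ready): did each peer see incoming data in time?"""
    # A connected socket pair stands in for the client/server link.
    a, b = socket.socketpair()
    try:
        # Neither side has written anything, so neither end becomes readable;
        # each select() call just runs out its timeout.
        a_readable, _, _ = select.select([a], [], [], timeout)
        b_readable, _, _ = select.select([b], [], [], timeout)
        return bool(a_readable), bool(b_readable)
    finally:
        a.close()
        b.close()


def one_side_writes(timeout=0.2):
    """Same setup, but one peer sends a byte first, so the other wakes up."""
    a, b = socket.socketpair()
    try:
        a.sendall(b"x")
        b_readable, _, _ = select.select([b], [], [], timeout)
        return bool(b_readable)
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(mutual_wait())      # both ends time out
    print(one_side_writes())  # the reader sees data
```

In the real case the interesting question is why neither process believes it is its turn to write; the sketch only shows the symptom, not that cause.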
-------------- next part --------------
$CMD -z -avv --stats --delete --exclude=CVSROOT/history-old.gz \
--exclude=#cvs.rfl.* --exclude=#cvs.lock \
--exclude=CVSROOT/.#* --exclude=tools/a2ps/ \
--exclude=tools/binutils/ --exclude=tools/cvs/ \
--exclude=tools/egcs/ --exclude=tools/gcc/ \
--exclude=tools/gdb/ --exclude=tools/gnupro/ \
--exclude=tools/perl/ --exclude=.repo-double-buffer.pid \