Problem: rsync hangs

Ed Santiago santiago at ascend.com
Fri Dec 7 08:42:14 EST 2001


Greetings,

We've been experiencing intermittent timeout errors with rsync 2.4.x,
but I've never been able to set up a small enough test case, nor
track down the problem to its root cause[1].

Rsync 2.5.0 still has problems, but in a perverse way it's better:
it's reproducible more quickly (1 hour, instead of many), and exhibits
different symptoms: a hang, instead of a timeout.

Here's the tail end of output from an instrumented version.  This
is from the server log:

  2001/12/06 11:52:24 [24872] send_files: i=17187 (00004323)
  2001/12/06 11:52:24 [24872] send_files: (17187, /home/santiago/tmp/rsync-test/src-test/CVSROOT/eoi-log/62/20062)

...at this point it hangs; then, every 10 minutes (the --timeout=600
set in the invocation script), it blurts:

  2001/12/06 12:02:24 [24872] rsync error: timeout in data send/receive (code 30) at io.c(77)
  2001/12/06 12:12:24 [24872] rsync error: timeout in data send/receive (code 30) at io.c(77)
  2001/12/06 12:22:24 [24872] rsync error: timeout in data send/receive (code 30) at io.c(77)
  2001/12/06 12:32:24 [24872] rsync error: timeout in data send/receive (code 30) at io.c(77)
  2001/12/06 12:42:24 [24872] rsync error: timeout in data send/receive (code 30) at io.c(77)

The client side, meanwhile, is stuck at:

  2001/12/06 11:57:27 [16232] CVSROOT/eoi-log/61/51361
  2001/12/06 12:05:34 [16232] CVSROOT/eoi-log/61/51461

If I let it run for several hours, it seems to transfer about one
file per hour.  At least, it appends filenames to the log at that rate.
This went on for 23 hours (the limit of my patience).
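
For anyone who wants to measure that rate rather than eyeball it, here
is a minimal sketch of a helper that prints the gap, in seconds, before
each log line.  The name log_gaps is hypothetical, and it assumes the
"YYYY/MM/DD HH:MM:SS [pid] message" layout shown above, with
consecutive lines less than 24 hours apart:

```shell
#!/bin/sh
# log_gaps (hypothetical helper): read rsync log lines of the form
#   YYYY/MM/DD HH:MM:SS [pid] message
# on stdin and, for every line after the first, print the number of
# seconds since the previous line, a tab, and the line itself.
# Assumption: consecutive lines are less than 24 hours apart, so a
# negative difference is treated as a single midnight rollover.
log_gaps() {
    awk '
    {
        split($2, t, ":")                       # HH, MM, SS
        now = t[1] * 3600 + t[2] * 60 + t[3]    # seconds since midnight
        if (NR > 1) {
            gap = now - prev
            if (gap < 0)
                gap += 86400                    # crossed midnight
            printf "%d\t%s\n", gap, $0
        }
        prev = now
    }'
}
```

Something like "log_gaps < client.log | sort -n | tail" (client.log
being a placeholder for wherever the client output was saved) would
then list the longest stalls last.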

FWIW, the 2.4.7 timeout happens under rsh, ssh, and rsyncd.  I've
only tested 2.5.0 under rsh, but have just started an ssh job to
see if/when it hangs.

2.4.7 does consistently get farther along with ssh than with rsh.
Typical number of files transferred before the hang is 3 to 4 thousand,
but as you can see the above run did 17,000+.

Invocation script is attached below.

Has anyone seen this?  Is there anything obvious I'm doing wrong?
Any suggestions for how to track down the problem?

Thanks in advance for any advice,
^E
-- 
Ed Santiago                 Toolsmith                 santiago at ascend.com


 [1] root cause: the farthest I've been able to diagnose is that
     both client and server are stuck in select(), waiting to
     hear from each other.

-------------- next part --------------
#!/bin/sh

CMD=/home/santiago/src/rsync/rsync/rsync.solaris

$CMD	-z -avv --stats --delete --exclude=CVSROOT/history-old.gz	\
	--rsync-path=$CMD					\
	--timeout=600						\
	'--exclude=#cvs.rfl.*' '--exclude=#cvs.lock'		\
	'--exclude=CVSROOT/.#*' --exclude=tools/a2ps/		\
	--exclude=tools/binutils/ --exclude=tools/cvs/		\
	--exclude=tools/egcs/ --exclude=tools/gcc/		\
	--exclude=tools/gdb/ --exclude=tools/gnupro/		\
	--exclude=tools/perl/ --exclude=.repo-double-buffer.pid \
	"<server-name>:/home/santiago/tmp/rsync-test/src-test/CVSROOT" ./results

