troubleshooting unexpected disconnects

Fri Oct 17 11:47:20 EST 2003

On Thu, Oct 16, 2003 at 04:49:23PM +0200, beers at xs4all.nl wrote:
> Hi list,
> 
> I am still having trouble with rsync: on large areas in our setup, it
> keeps dying unexpectedly. See my previous post from two days ago for the
> initial attempts. I realize it is a somewhat broad question, but how do I
> troubleshoot these disconnects?

It helps if you can keep the messages on the same thread if
you are going to refer to prior posts.

> I'm no great programmer, although we have a few good C programmers around,
> and if needed I can ask for their help.
> 
> But for now, my thinking is that this may not be an rsync problem per se;
> rather, something else is unexpectedly closing the connection rsync is
> using. But there are no firewalls, there is plenty of hard disk space, and
> the SCO boxes it is running on do not usually break connections. We have
> had CPIO copies running for well over 4 hours straight without trouble.
> 
> I am, however, quite lost on how to proceed with this issue, short of
> falling back to CPIO backups of the complete areas, which take bloody
> long; hence our interest in rsync.
> 
> To sum up what I have tried:
> - Using --compress or not makes no difference.
> - Using read only = yes or no makes no difference.
> - Setting timeout=600 in rsyncd.conf makes no difference.
> - Using --verbose --verbose last night gave me a slightly different error:
> 2003/10/15 17:03:40 [2575] rsync on user2 from eurux03 (192.168.100.3)
> 2003/10/15 17:24:17 [2575] rsync error: timeout in data send/receive (code 30) at io.c(103)
> 
> This is from rsyncd.log; the client console displays the same. The syslog
> does not show any rsync-related messages (syslog logging does work, though:
> if my .conf is wrong, the complaints end up there).
> 
> If there is any information I could have provided but did not, please tell
> me. I did read the archives, as far as I understand them (I'm no coder,
> just a sysadmin), and I did RTFM and STFW.
> 
> Any help or hints would be greatly appreciated.

This is quite unusual.  The dropped connections we have seen
have been the result of mismatched (non)blocking I/O with a
remote shell utility, pipe and/or signal problems on cygwin,
or firewalls.  Of course, occasionally what looked like a
dropped connection turned out to be a timeout whose message
was not being noticed.
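Whether a disconnect was really a timeout is visible in
rsyncd.log, as long as someone reads it.  Below is a minimal
Python sketch, not part of rsync, that pairs each daemon
"rsync on" connection line with a later "rsync error" line
for the same PID and reports the exit code and session
length.  It assumes the log format shown in the quoted
excerpt ("YYYY/MM/DD HH:MM:SS [pid] message"); the function
name summarize_errors is hypothetical, and the code table
covers only rsync exit codes 10, 12 and 30.

```python
import re
from datetime import datetime

# Assumed rsyncd.log line format: "YYYY/MM/DD HH:MM:SS [pid] message"
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\] (.*)$")
CODE_RE = re.compile(r"\(code (\d+)\)")

# A small subset of rsync exit codes; 30 is a timeout, not a lost socket.
EXIT_CODES = {
    10: "error in socket I/O",
    12: "error in rsync protocol data stream",
    30: "timeout in data send/receive",
}

def summarize_errors(lines):
    """Return (pid, exit code, meaning, seconds since connect) per error line."""
    started = {}  # pid -> timestamp of its "rsync on ..." connection line
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not match the assumed format
        stamp = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        pid, message = int(m.group(2)), m.group(3)
        if message.startswith("rsync on "):
            started[pid] = stamp
        elif message.startswith("rsync error:"):
            code_match = CODE_RE.search(message)
            code = int(code_match.group(1)) if code_match else None
            meaning = EXIT_CODES.get(code, "unknown")
            begin = started.get(pid)
            elapsed = (stamp - begin).total_seconds() if begin else None
            results.append((pid, code, meaning, elapsed))
    return results

if __name__ == "__main__":
    log = [
        "2003/10/15 17:03:40 [2575] rsync on user2 from eurux03 (192.168.100.3)",
        "2003/10/15 17:24:17 [2575] rsync error: timeout in data send/receive"
        " (code 30) at io.c(103)",
    ]
    for pid, code, meaning, elapsed in summarize_errors(log):
        print(f"pid {pid}: code {code} ({meaning}) after {elapsed:.0f} s")
```

Run against the excerpt quoted above, it reports code 30
after 1237 seconds.  Since the timeout option makes rsync
exit when no data has been transferred for that many
seconds, a code-30 error under timeout=600 suggests that one
of the processes stalled for about ten minutes, rather than
that the socket was closed underneath it.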

Perhaps it is a problem with SCO OpenServer.  My
recollection (vague at best) is that OpenServer is a
descendant of Xenix rather than SVR4, although both codebases
are bug-ridden antiques.

[Please guys, let's try not to hold the legacy customers
responsible for current vendor behaviour]

Additional verbosity may help, although more than -vv tends
to destabilise rsync.  The best chance of determining what
is going wrong, however, is to do syscall traces on all the
processes.  In your case (doing a pull from a daemon) that
means the process the daemon spawns (the sender) and the pair
of processes (generator and receiver) on the client.  The
other way is to snoop the TCP session, but that is less
informative, requires an in-depth understanding of the
protocol, and is a lot of work.
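The rule for which processes to trace generalizes to pushes
as well: the sending side runs one sender process, and the
receiving side runs a generator and a receiver.  A small
Python sketch of that mapping follows, assuming rsync's
classic process model as described above.  The function
processes_to_trace and its pull parameter are hypothetical
names for illustration, and the actual tracing is left to
whatever syscall tracer the platform provides, since that
differs per OS.

```python
from typing import List, Tuple

def processes_to_trace(pull: bool) -> List[Tuple[str, str]]:
    """Return (host, role) pairs for an rsync transfer with a daemon.

    pull=True : the client fetches from the daemon, so the daemon's
                spawned process is the sender.
    pull=False: the client pushes to the daemon, so the daemon side
                runs the generator and receiver.
    """
    sending_host = "daemon" if pull else "client"
    receiving_host = "client" if pull else "daemon"
    return [
        (sending_host, "sender"),       # one process on the sending side
        (receiving_host, "generator"),  # receiving side: generator ...
        (receiving_host, "receiver"),   # ... plus receiver
    ]

if __name__ == "__main__":
    # The case in this thread: a pull from an rsync daemon.
    for host, role in processes_to_trace(pull=True):
        print(f"trace the {role} process on the {host}")
```

For the pull in this thread it lists the sender on the
daemon and the generator and receiver on the client, i.e.
three processes to trace across two machines.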

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt