remote rsync process dies, local hangs

Melvin, Lee lee_melvin at mentorg.com
Tue Jun 18 15:29:05 EST 2002


I did try running 2.5.5 on the local end shortly after posting that
email, and it still fails.  I'm very familiar with the verbose mode
issues, but I've never seen that problem in 2.5.x.  I did try without
-v for testing, but that didn't help.

I am looking into the possibility of a network-level problem now.
I've forwarded snoop output from both sides of the transaction to
our network admins.  I haven't done that much snoop output
interpretation, but it looks like the remote process is sending
TCP packets from port 951 to port 929 on the local host which are
failing to appear in the local host snoop.  There is at least one
instance of that 951->929 transaction working earlier in the snoop
output, so I'm not sure why it would subsequently fail.
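
For comparing the two snoop captures, here is a minimal sketch in
Python.  It assumes each capture has already been reduced to
(source port, destination port, TCP sequence number) tuples; the
function name and the sample sequence numbers are hypothetical and
are not taken from the actual snoop output.

```python
from collections import Counter


def missing_packets(sent, received):
    """Return packets in the sender capture that the receiver capture lacks.

    Each packet is a (src_port, dst_port, tcp_seq) tuple.  Duplicates
    (e.g. retransmissions) are counted, so a packet sent twice but
    seen once is reported once.
    """
    remaining = Counter(received)
    lost = []
    for pkt in sent:
        if remaining[pkt] > 0:
            remaining[pkt] -= 1   # matched one received copy
        else:
            lost.append(pkt)      # never showed up on the receiving side
    return lost


# Hypothetical data: the sequence numbers are made up for illustration.
remote_sent = [(951, 929, 1000), (951, 929, 2460), (951, 929, 3920)]
local_seen = [(951, 929, 1000)]

print(missing_packets(remote_sent, local_seen))
```

Anything this reports was seen leaving the remote host but never
arrived locally, which would point at the path in between rather
than at rsync itself.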

I'd be interested in details about VPN-related failures.

	- Lee

On 2002.06.18 16:10 Dave Dykstra wrote:
> Have you tried running rsync 2.5.5 on the local end?  We have seen
> cases where VPNs have been at fault.  Rsync is very hard on TCP
> implementations.  Also, verbose mode has been known to cause problems,
> especially in older versions, you might try it without that.
> 
> - Dave Dykstra
> 
> On Tue, Jun 18, 2002 at 10:42:48AM -0600, Melvin, Lee wrote:
> > I've got an rsync job which is consistently failing, but I've been
> > unable to diagnose the problem.  FAQ/Google/docs/etc. checked and
> > no luck.
> >
> > Basically, it looks like the rsync process invoked on the far end
> > is exiting, and then the local process waits until the timeout and
> > exits.
> >
> > Both systems are Sun boxes, Ultra 10 or better with 256+ MB of memory.
> > Rsync version is 2.5.0 on the local end and 2.5.5 on the remote end.
> > Network pipe between the two is 768KB VPN WAN.  On the local side,
> > here's what I see:
> >
> > Begin job 02-tomove-hpx at Tue Jun 18 10:13:36 2002
> > Executing /somepath/rsync -z -v --exclude=.snapshot
> > --exclude=lost+found --archive --delete --force
> > --rsync-path=/usr/local/bin/rsync  /some/path/
> > user@somehost.faraway:/another/path/
> >         building file list ... done
> >
> > On the remote end, looking with truss -vpoll -p:
> >
> > lstat64("toolbox/shaperouter.mgc_shaperouter.attr", 0xFFBEFAE0) = 0
> > lstat64("toolbox/shaperouter/shaperouter.qual", 0xFFBEFAE0) = 0
> > lstat64("toolbox/spicenet2G6", 0xFFBEFAE0)	= 0
> > lstat64("toolbox/spicenet2G6", 0xFFBEF1D8)	= 0
> > lstat64("toolbox/spicenet2G6.SpiceNet2G6.attr", 0xFFBEFAE0) = 0
> > lstat64("toolbox/spicenet2G6/spicenet2G6.qual", 0xFFBEFAE0) = 0
> > lstat64("toolbox/srp", 0xFFBEFAE0)		= 0
> > lstat64("toolbox/srp", 0xFFBEF1D8)		= 0
> > lstat64("toolbox/srp.mgc_srp_tool.attr", 0xFFBEFAE0) = 0
> > lstat64("toolbox/srp/srp.qual", 0xFFBEFAE0)	= 0
> > lstat64("toolbox/test_fablink", 0xFFBEFAE0)	= 0
> > lstat64("toolbox/test_fablink", 0xFFBEF1D8)	= 0
> > lstat64("toolbox/test_fablink.mgc_test_fablink.attr", 0xFFBEFAE0) = 0
> > lstat64("toolbox/test_fablink/test_fablink.qual", 0xFFBEFAE0) = 0
> > lstat64("toolbox/test_layout", 0xFFBEFAE0)	= 0
> > lstat64("toolbox/test_layout", 0xFFBEF1D8)	= 0
> > lstat64("toolbox/test_layout.mgc_test_layout.attr", 0xFFBEFAE0) = 0
> > lstat64("toolbox/test_layout/test_layout.qual", 0xFFBEFAE0) = 0
> > lstat64("toolbox/to_layout", 0xFFBEFAE0)	= 0
> > lstat64("toolbox/to_layout", 0xFFBEF1D8)	= 0
> > lstat64("toolbox/to_layout.to_layout_tvpt.attr", 0xFFBEFAE0) = 0
> > lstat64("toolbox/to_layout/to_layout.qual", 0xFFBEFAE0) = 0
> > lstat64("toolbox/vnet", 0xFFBEFAE0)		= 0
> > lstat64("toolbox/vnet", 0xFFBEF1D8)		= 0
> > lstat64("toolbox/vnet.VNet.attr", 0xFFBEFAE0)	= 0
> > lstat64("toolbox/vnet/vnet.qual", 0xFFBEFAE0)	= 0
> > poll(0xFFBEE7E0, 2, 60000)			= 1
> > 	fd=1  ev=POLLOUT rev=POLLOUT
> > 	fd=8  ev=POLLRDNORM rev=0
> > write(1, "04\0\007FFFFFFFF", 8)			= 8
> > poll(0xFFBEF4D0, 2, 60000)			= 1
> > 	fd=6  ev=POLLRDNORM rev=POLLRDNORM
> > 	fd=8  ev=POLLRDNORM rev=0
> > read(6, "FFFFFFFF", 4)				= 4
> > poll(0xFFBEE850, 2, 60000)			= 1
> > 	fd=1  ev=POLLOUT rev=POLLOUT
> > 	fd=8  ev=POLLRDNORM rev=0
> > write(1, "04\0\007FFFFFFFF", 8)			= 8
> > poll(0xFFBEF540, 2, 60000)			= 1
> > 	fd=6  ev=POLLRDNORM rev=POLLRDNORM
> > 	fd=8  ev=POLLRDNORM rev=0
> > read(6, "01\0\0\0", 4)				= 4
> > close(6)					= 0
> > poll(0xFFBEE938, 2, 60000)			= 1
> > 	fd=1  ev=POLLOUT rev=POLLOUT
> > 	fd=8  ev=POLLRDNORM rev=0
> > write(1, "04\0\007FFFFFFFF", 8)			= 8
> > kill(18231, SIGUSR2)				= 0
> > waitid(P_PID, 18231, 0xFFBEFB08, WEXITED|WTRAPPED|WNOHANG) = 0
> >     Received signal #18, SIGCLD, in poll() [caught]
> >       siginfo: SIGCLD CLD_EXITED pid=18231 status=0x0000
> > poll(0xFFBEFAE8, 0, 20)				Err#4 EINTR
> > waitid(P_ALL, 0, 0xFFBEF620, WEXITED|WTRAPPED|WNOHANG) = 0
> > waitid(P_ALL, 0, 0xFFBEF620, WEXITED|WTRAPPED|WNOHANG) Err#10 ECHILD
> > setcontext(0xFFBEF7D0)
> > poll(0xFFBEFAE8, 0, 16)				= 0
> > waitid(P_PID, 18231, 0xFFBEFB08, WEXITED|WTRAPPED|WNOHANG) Err#10 ECHILD
> > sigaction(SIGUSR1, 0xFFBEFB48, 0xFFBEFBC8)	= 0
> > sigaction(SIGUSR2, 0xFFBEFB48, 0xFFBEFBC8)	= 0
> > llseek(0, 0, SEEK_CUR)				Err#9 EBADF
> > _exit(0)
> > bash-2.03$
> > The destination directory has free space.  I have a job between the same
> > hosts (different paths) that executes successfully just before this job.
> > This job fails consistently, but not always after the same file lstat.
> > I have tried disabling -z, using --bwlimit, disabling -v, using -vvvvv,
> > all to no avail.  Also tried changing the local end of the rsync to a
> > different system.  I still need to try moving the far end, but I do get
> > a similar problem on a completely different rsync to a different host
> > (same source).
> >
> > I can provide additional details if needed.  Any help greatly
> > appreciated.
> >
> > 	- Lee
> > 	lee_melvin at mentor.com
> >
> > --
> > To unsubscribe or change options:
> > http://lists.samba.org/mailman/listinfo/rsync
> > Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
> 



