remote rsync process dies, local hangs

Melvin, Lee lee_melvin at mentorg.com
Tue Jun 18 09:44:02 EST 2002


I've got an rsync job which is consistently failing, but I've been
unable to diagnose the problem.  I've checked the FAQ, Google, and the
docs with no luck.

It looks like the rsync process invoked on the far end exits, and the
local process then waits until the timeout expires and exits as well.

Both systems are Sun boxes, Ultra 10 or better with 256+ MB of memory.
Rsync version is 2.5.0 on the local end and 2.5.5 on the remote end.
Network pipe between the two is 768KB VPN WAN.  On the local side,
here's what I see:

Begin job 02-tomove-hpx at Tue Jun 18 10:13:36 2002
Executing /somepath/rsync -z -v --exclude=.snapshot
    --exclude=lost+found --archive --delete --force
    --rsync-path=/usr/local/bin/rsync  /some/path/
    user@somehost.faraway:/another/path/
         building file list ... done

On the remote end, watching the rsync process with truss -vpoll -p:

lstat64("toolbox/shaperouter.mgc_shaperouter.attr", 0xFFBEFAE0) = 0
lstat64("toolbox/shaperouter/shaperouter.qual", 0xFFBEFAE0) = 0
lstat64("toolbox/spicenet2G6", 0xFFBEFAE0)	= 0
lstat64("toolbox/spicenet2G6", 0xFFBEF1D8)	= 0
lstat64("toolbox/spicenet2G6.SpiceNet2G6.attr", 0xFFBEFAE0) = 0
lstat64("toolbox/spicenet2G6/spicenet2G6.qual", 0xFFBEFAE0) = 0
lstat64("toolbox/srp", 0xFFBEFAE0)		= 0
lstat64("toolbox/srp", 0xFFBEF1D8)		= 0
lstat64("toolbox/srp.mgc_srp_tool.attr", 0xFFBEFAE0) = 0
lstat64("toolbox/srp/srp.qual", 0xFFBEFAE0)	= 0
lstat64("toolbox/test_fablink", 0xFFBEFAE0)	= 0
lstat64("toolbox/test_fablink", 0xFFBEF1D8)	= 0
lstat64("toolbox/test_fablink.mgc_test_fablink.attr", 0xFFBEFAE0) = 0
lstat64("toolbox/test_fablink/test_fablink.qual", 0xFFBEFAE0) = 0
lstat64("toolbox/test_layout", 0xFFBEFAE0)	= 0
lstat64("toolbox/test_layout", 0xFFBEF1D8)	= 0
lstat64("toolbox/test_layout.mgc_test_layout.attr", 0xFFBEFAE0) = 0
lstat64("toolbox/test_layout/test_layout.qual", 0xFFBEFAE0) = 0
lstat64("toolbox/to_layout", 0xFFBEFAE0)	= 0
lstat64("toolbox/to_layout", 0xFFBEF1D8)	= 0
lstat64("toolbox/to_layout.to_layout_tvpt.attr", 0xFFBEFAE0) = 0
lstat64("toolbox/to_layout/to_layout.qual", 0xFFBEFAE0) = 0
lstat64("toolbox/vnet", 0xFFBEFAE0)		= 0
lstat64("toolbox/vnet", 0xFFBEF1D8)		= 0
lstat64("toolbox/vnet.VNet.attr", 0xFFBEFAE0)	= 0
lstat64("toolbox/vnet/vnet.qual", 0xFFBEFAE0)	= 0
poll(0xFFBEE7E0, 2, 60000)			= 1
	fd=1  ev=POLLOUT rev=POLLOUT
	fd=8  ev=POLLRDNORM rev=0
write(1, "04\0\007FFFFFFFF", 8)			= 8
poll(0xFFBEF4D0, 2, 60000)			= 1
	fd=6  ev=POLLRDNORM rev=POLLRDNORM
	fd=8  ev=POLLRDNORM rev=0
read(6, "FFFFFFFF", 4)				= 4
poll(0xFFBEE850, 2, 60000)			= 1
	fd=1  ev=POLLOUT rev=POLLOUT
	fd=8  ev=POLLRDNORM rev=0
write(1, "04\0\007FFFFFFFF", 8)			= 8
poll(0xFFBEF540, 2, 60000)			= 1
	fd=6  ev=POLLRDNORM rev=POLLRDNORM
	fd=8  ev=POLLRDNORM rev=0
read(6, "01\0\0\0", 4)				= 4
close(6)					= 0
poll(0xFFBEE938, 2, 60000)			= 1
	fd=1  ev=POLLOUT rev=POLLOUT
	fd=8  ev=POLLRDNORM rev=0
write(1, "04\0\007FFFFFFFF", 8)			= 8
kill(18231, SIGUSR2)				= 0
waitid(P_PID, 18231, 0xFFBEFB08, WEXITED|WTRAPPED|WNOHANG) = 0
     Received signal #18, SIGCLD, in poll() [caught]
       siginfo: SIGCLD CLD_EXITED pid=18231 status=0x0000
poll(0xFFBEFAE8, 0, 20)				Err#4 EINTR
waitid(P_ALL, 0, 0xFFBEF620, WEXITED|WTRAPPED|WNOHANG) = 0
waitid(P_ALL, 0, 0xFFBEF620, WEXITED|WTRAPPED|WNOHANG) Err#10 ECHILD
setcontext(0xFFBEF7D0)
poll(0xFFBEFAE8, 0, 16)				= 0
waitid(P_PID, 18231, 0xFFBEFB08, WEXITED|WTRAPPED|WNOHANG) Err#10 ECHILD
sigaction(SIGUSR1, 0xFFBEFB48, 0xFFBEFBC8)	= 0
sigaction(SIGUSR2, 0xFFBEFB48, 0xFFBEFBC8)	= 0
llseek(0, 0, SEEK_CUR)				Err#9 EBADF
_exit(0)
bash-2.03$

The destination directory has free space.  I have a job between the same
hosts (different paths) that executes successfully just before this job.
This job fails consistently, though not always after the lstat of the
same file.  I have tried disabling -z, using --bwlimit, disabling -v, and
using -vvvvv, all to no avail.  I also tried running the local end of the
rsync from a different system.  I still need to try moving the far end,
but I do get a similar problem on a completely different rsync to a
different host (same source).
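
To see what the local side reports once the remote end has gone away, the
job could be wrapped so that its exit status is logged next to the start
and end times.  What follows is only a minimal sketch: the helper name
run_logged and the log file argument are hypothetical and not part of the
job script shown above.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper (an assumption, not part of the original job):
# appends the start time, the command's stdout and stderr, and its
# exit status to LOGFILE, then returns that same exit status.
run_logged() {
    log=$1
    shift
    echo "Begin $* at `date`" >> "$log"
    "$@" >> "$log" 2>&1
    status=$?
    echo "End at `date`, exit status $status" >> "$log"
    return "$status"
}
```

It would be called with the log file first and then the same rsync command
line as in the job above.  The logged exit status can then be compared
with the exit values documented for the installed rsync version, if its
man page lists them.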

I can provide additional details if needed.  Any help is greatly
appreciated.

	- Lee
	lee_melvin at mentor.com



