SIGUSR1 or SIGINT error

Dave Dykstra dwd at bell-labs.com
Sat Feb 23 02:53:14 EST 2002


[adding the mailing list to the Cc after David sent me trusses and other
info about his hang.  I have trimmed down the tracing info to the relevant
pieces.]

On Tue, Feb 19, 2002 at 11:51:27PM -0500, David Birnbaum wrote:
> I wasn't able to get a netstat this time.  However, I should be able to
> grab it again if these truss's don't help.  I'm thinking I will run a
> snoop the next time and capture the packet traffic to see what's happening
> at the TCP level - at least we should be able to see who/what shut down
> first, because from the truss, it looks like the kernel shut down the
> whole thing at once.  It's tough to time the netstat because it takes a
> while for it to die.

I think the trusses with timestamps showed me enough.


> Trusses, lsof, netstats attached.  Thanks for the help; I appreciate the
> efforts.

The client is doing nothing complex, it's just merrilly blasting away down
file descriptor 4 and never reading it.  It then hangs and the only reason
it wakes up is because the connection has been torn down.

The server process looks very odd.  It is doing a 196K read from a file, and
it takes 9 minutes to return from that, and when it wakes up from that it
finds that an EPIPE from the connection to the rsync client.  Why would a
196K read take 9 minutes?  It looks like it is going over NFS, so maybe the
problem is with NFS.  What's the network between the NFS server and the
rsync server?  Is it possible to run rsync on the NFS server itself, or is
that a network appliance or something?  It could be that the TCP 
implementation on the client side wasn't willing to wait for a response for
9 minutes and that's why it shut things down with an EPIPE.

At the same time, the server child process is doing a 262K read from another
file over the same NFS connection and it takes the same 9 minutes and 
eventually suceeds.  It only dies because the other server process had died
and it gets a signal.


> David.


- Dave Dykstra


> Content-Description: client lsof
> COMMAND   PID USER   FD   TYPE        DEVICE   SIZE/OFF    NODE NAME
...
> rsync   20039 root    4u  IPv4 0x3000dbd3810 0t64514407     TCP corona.chelsea.net:51816->hite.chelsea.net:rsync (ESTABLISHED)
> rsync   20039 root    5r  VREG        32,359   52273152  243313 /export/httpd/DOMAINS/stats.chelsea.net/data/NetTracker/cyberounds.com/view.dat

> Content-Description: client truss
...
> 3504.6000	write(4, " w s   9 8 ;   V Z _ I E".., 1021)	= 1021
> 3504.6003	poll(0xFFBE5CA0, 1, 60000)			= 1
> 3504.6005	write(4, "03FCFFFF", 4)				= 4
> 3504.6010	poll(0xFFBE5CA0, 1, 60000)			= 1
> 3504.6012	write(4, "02FCFFFF", 4)				= 4
> 3504.6016	poll(0xFFBE5CA0, 1, 60000)			= 1
> 3504.6019	write(4, "01FCFFFF", 4)				= 4
> 3504.6034	poll(0xFFBE5CA0, 1, 60000)			= 1
> 3504.6036	write(4, "E306\0\0", 4)				= 4
> 3504.6041	poll(0xFFBE5D18, 1, 60000)			= 1
> 3504.6046	write(4, " s t 1 0 3 . t n t 4 . r".., 1763)	= 1763
> poll(0xFFBE5CA0, 1, 60000)	(sleeping...)
> 3564.6045	poll(0xFFBE5CA0, 1, 60000)			= 0
> poll(0xFFBE5CA0, 1, 60000)	(sleeping...)
> 3624.6045	poll(0xFFBE5CA0, 1, 60000)			= 0
> poll(0xFFBE5CA0, 1, 60000)	(sleeping...)
> 3684.6045	poll(0xFFBE5CA0, 1, 60000)			= 0
> poll(0xFFBE5CA0, 1, 60000)	(sleeping...)
> 3744.6046	poll(0xFFBE5CA0, 1, 60000)			= 0
> poll(0xFFBE5CA0, 1, 60000)	(sleeping...)
> 3804.6046	poll(0xFFBE5CA0, 1, 60000)			= 0
> poll(0xFFBE5CA0, 1, 60000)	(sleeping...)
> 3864.6044	poll(0xFFBE5CA0, 1, 60000)			= 0
> poll(0xFFBE5CA0, 1, 60000)	(sleeping...)
> 3924.6041	poll(0xFFBE5CA0, 1, 60000)			= 0
> poll(0xFFBE5CA0, 1, 60000)	(sleeping...)
> 3984.6041	poll(0xFFBE5CA0, 1, 60000)			= 0
> poll(0xFFBE5CA0, 1, 60000)	(sleeping...)
> 4037.3040	poll(0xFFBE5CA0, 1, 60000)			= 1
> 4037.3046	write(4, "FEFBFFFF", 4)				Err#32 EPIPE
> 4037.3055	    Received signal #13, SIGPIPE [caught]
> 4037.3064	sigaction(SIGUSR1, 0xFFBE5838, 0xFFBE58B8)	= 0
> 4037.3071	sigaction(SIGUSR2, 0xFFBE5838, 0xFFBE58B8)	= 0
> 4037.3074	write(2, " r s y n c   e r r o r :".., 66)	= 66
> 4037.3084	mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFF310000
> 4037.3088	munmap(0xFF310000, 8192)			= 0
> 4037.3091	llseek(0, 0, SEEK_CUR)				= 583129
> 4037.3094	_exit(20)


> Content-Description: server rsync lsof
> COMMAND   PID USER   FD   TYPE        DEVICE   SIZE/OFF    NODE NAME
...
> rsync   27948 root    5u  IPv4 0x30000a59710 0t64477429     TCP hite.chelsea.net:rsync->corona.chelsea.net:51816 (ESTABLISHED)
> rsync   27948 root    6uw VREG         85,30          0  541128 /export (/dev/md/dsk/d30)
> rsync   27948 root    7r  VREG        32,144   19062784 5347011 /export/backup -- export/httpd/DOMAINS/stats.chelsea.net/data/NetTracker/cyberounds.com/view.idx

> Content-Description: server truss
...
> 3497.9255	lstat64("export/httpd/DOMAINS/stats.chelsea.net/data/NetTracker/resistance-forum-www.thebody.com/page.dat", 0xFFBEE370) = 0
> 3497.9262	open64("export/httpd/DOMAINS/stats.chelsea.net/data/NetTracker/resistance-forum-www.thebody.com/page.dat", O_RDONLY) = 7
> 3497.9869	read(7, "\002FFB7\0\0\0\0\002E0EF".., 196536)	= 196536
> poll(0xFFBEC6E0, 2, 60000)	(sleeping...)
> 4037.2043	poll(0xFFBEC6E0, 2, 60000)			= 0
> 4037.2049	poll(0xFFBEC6E0, 2, 60000)			= 1
> 4037.2052	write(5, "FC0F\0078E hB40698 .84EF".., 4096)	Err#32 EPIPE
> 4037.2816	    Received signal #13, SIGPIPE [caught]
> 4037.2843	sigaction(SIGUSR1, 0xFFBEC2B0, 0xFFBEC330)	= 0
> 4037.2848	sigaction(SIGUSR2, 0xFFBEC2B0, 0xFFBEC330)	= 0
> 4037.2851	getpid()					= 27948 [27130]
> 4037.3504	kill(28192, SIGUSR1)				= 0
> 4037.3508	getpid()					= 27948 [27130]
> 4037.3514	poll(0xFFBEABB0, 2, 60000)			= 1
> 4037.3518	write(5, " B\0\0\b r s y n c   e r".., 70)	Err#32 EPIPE
> 4037.3523	    Received signal #13, SIGPIPE [default]
> 	*** process killed ***


> Content-Description: server child lsof
> COMMAND   PID USER   FD   TYPE        DEVICE   SIZE/OFF    NODE NAME
...
> rsync   28192 root    5u  IPv4 0x30000a59710 0t64477429     TCP hite.chelsea.net:rsync->corona.chelsea.net:51816 (ESTABLISHED)
> rsync   28192 root    6u  VREG         85,30          0  541128 /export (/dev/md/dsk/d30)
> rsync   28192 root    7r  VREG        32,144   51363840 5347010 /export/backup -- export/httpd/DOMAINS/stats.chelsea.net/data/NetTracker/cyberounds.com/view.dat
> rsync   28192 root    8u  unix        105,36        0t0  306297 /devices/pseudo/tl at 0:ticots->(socketpair: 0x18d400000000) (0x300016fd448)
> rsync   28192 root    9u  VREG        32,144   37569724 5347009 /export/backup -- export/httpd/DOMAINS/stats.chelsea.net/data/NetTracker/cyberounds.com/.view.dat.Q9aae3
> rsync   28192 root   10u  unix        105,39        0t0  306297 /devices/pseudo/tl at 0:ticots->(socketpair: 0x18d600000000) (0x300016fc368)

> Content-Description: server child truss
...
> 3504.2667	poll(0xFFBECF20, 1, 60000)			= 1
> 3504.2682	read(5, "D2\t\0\0", 4)				= 4
> 3504.2699	poll(0xFFBECF98, 1, 60000)			= 1
> 3504.2714	read(5, " I E   6 . 0 ;   W i n d".., 2514)	= 2514
> 3504.2738	write(9, " I E   6 . 0 ;   W i n d".., 1024)	= 1024
> 3504.2755	write(9, " . 0 ;   W i n d o w s  ".., 1024)	= 1024
> 3504.2771	write(9, " w e r P C )\0 M9BCE9A 1".., 466)	= 466
> 3504.2787	poll(0xFFBECF20, 1, 60000)			= 1
> 3504.2800	read(5, "E4E6FFFF", 4)				= 4
> 3504.2822	llseek(7, 5898240, SEEK_SET)			= 5898240
> 3517.5204	read(7, "\0 Y  \0\0\0\0\0\0 E16C5".., 262144)	= 262144
> 4037.2227	write(9, " d o w s   9 5 )\0 T\v C".., 928)	= 928
> 4037.2233	poll(0xFFBECF20, 1, 60000)			= 1
> 4037.2238	read(5, "E3E6FFFF", 4)				= 4
> 4037.2244	write(9, " 4 . 0   ( c o m p a t i".., 928)	= 928
> 4037.2258	poll(0xFFBECF20, 1, 60000)			= 1
> 4037.2262	read(5, "E2E6FFFF", 4)				= 4
> 4037.2269	write(9, " c o m : M o z i l l a /".., 928)	= 928
> 4037.2274	poll(0xFFBECF20, 1, 60000)			= 1
> 4037.2278	read(5, "0E1A\0\0", 4)				= 4
> 4037.2283	poll(0xFFBECF98, 1, 60000)			= 1
> 4037.2288	read(5, " M o z i l l a / 4 . 0  ".., 6670)	= 6670
> 4037.2304	write(9, " M o z i l l a / 4 . 0  ".., 1024)	= 1024
> 4037.3011	write(9, " . c o m : M o z i l l a".., 1024)	= 1024
> 4037.3017	write(9, " 2 4 7 - 2 9 - 0 . c l i".., 1024)	= 1024
> 4037.3022	write(9, " 1 ;   Q 3 1 2 4 6 1 )\0".., 1024)	= 1024
> 4037.3028	write(9, " M S I E C r a w l e r )".., 1024)	= 1024
> 4037.3079	llseek(9, 1, SEEK_CUR)				= 585779
> 4037.3085	write(9, " WB7 lA0 1 2 - 2 5 5 - 1".., 1023)	= 1023
> 4037.3125	write(9, " 0 7 2 1 2 0 0 0 h . 0 8".., 526)	= 526
> 4037.3636	poll(0xFFBECF20, 1, 60000)			= 1
> 4037.3642	read(5, " %DFFFFF", 4)				= 4
> 4037.3650	llseek(7, 7733248, SEEK_SET)			= 7733248
> 4037.6213	read(7, "\01A @\0\01A  \0\0 6109E".., 262144)	= 262144
> 4037.6229	    Received signal #16, SIGUSR1 [caught]
>       siginfo: SIGUSR1 pid=27948 uid=0
> 4037.6234	sigaction(SIGUSR1, 0xFFBED188, 0xFFBED208)	= 0
> 4037.6238	sigaction(SIGUSR2, 0xFFBED188, 0xFFBED208)	= 0
> 4037.6913	unlink("export/httpd/DOMAINS/stats.chelsea.net/data/NetTracker/oral-forum-www.thebody.com/.visitorid.idx.Lkbae3") = 0
> 4037.6921	getpid()					= 28192 [1]
> 4037.6957	write(10, " A\0\0\b r s y n c   e r".., 69)	Err#32 EPIPE
> 4037.6977	    Received signal #13, SIGPIPE [caught]
> 4037.6982	sigaction(SIGUSR1, 0xFFBEC8B0, 0xFFBEC930)	= 0
> 4037.7016	sigaction(SIGUSR2, 0xFFBEC8B0, 0xFFBEC930)	= 0
> 4037.7021	write(10, " A\0\0\b r s y n c   e r".., 69)	Err#32 EPIPE
> 4037.7026	    Received signal #13, SIGPIPE [default]
> 	*** process killed ***





More information about the rsync mailing list