SIGUSR1 or SIGINT error
Dave Dykstra
dwd at bell-labs.com
Sat Feb 23 02:53:14 EST 2002
[adding the mailing list to the Cc after David sent me trusses and other
info about his hang. I have trimmed down the tracing info to the relevant
pieces.]
On Tue, Feb 19, 2002 at 11:51:27PM -0500, David Birnbaum wrote:
> I wasn't able to get a netstat this time. However, I should be able to
> grab it again if these truss's don't help. I'm thinking I will run a
> snoop the next time and capture the packet traffic to see what's happening
> at the TCP level - at least we should be able to see who/what shut down
> first, because from the truss, it looks like the kernel shut down the
> whole thing at once. It's tough to time the netstat because it takes a
> while for it to die.
I think the trusses with timestamps showed me enough.
> Trusses, lsof, netstats attached. Thanks for the help; I appreciate the
> efforts.
The client is doing nothing complex, it's just merrilly blasting away down
file descriptor 4 and never reading it. It then hangs and the only reason
it wakes up is because the connection has been torn down.
The server process looks very odd. It is doing a 196K read from a file, and
it takes 9 minutes to return from that, and when it wakes up from that it
finds that an EPIPE from the connection to the rsync client. Why would a
196K read take 9 minutes? It looks like it is going over NFS, so maybe the
problem is with NFS. What's the network between the NFS server and the
rsync server? Is it possible to run rsync on the NFS server itself, or is
that a network appliance or something? It could be that the TCP
implementation on the client side wasn't willing to wait for a response for
9 minutes and that's why it shut things down with an EPIPE.
At the same time, the server child process is doing a 262K read from another
file over the same NFS connection and it takes the same 9 minutes and
eventually suceeds. It only dies because the other server process had died
and it gets a signal.
> David.
- Dave Dykstra
> Content-Description: client lsof
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
...
> rsync 20039 root 4u IPv4 0x3000dbd3810 0t64514407 TCP corona.chelsea.net:51816->hite.chelsea.net:rsync (ESTABLISHED)
> rsync 20039 root 5r VREG 32,359 52273152 243313 /export/httpd/DOMAINS/stats.chelsea.net/data/NetTracker/cyberounds.com/view.dat
> Content-Description: client truss
...
> 3504.6000 write(4, " w s 9 8 ; V Z _ I E".., 1021) = 1021
> 3504.6003 poll(0xFFBE5CA0, 1, 60000) = 1
> 3504.6005 write(4, "03FCFFFF", 4) = 4
> 3504.6010 poll(0xFFBE5CA0, 1, 60000) = 1
> 3504.6012 write(4, "02FCFFFF", 4) = 4
> 3504.6016 poll(0xFFBE5CA0, 1, 60000) = 1
> 3504.6019 write(4, "01FCFFFF", 4) = 4
> 3504.6034 poll(0xFFBE5CA0, 1, 60000) = 1
> 3504.6036 write(4, "E306\0\0", 4) = 4
> 3504.6041 poll(0xFFBE5D18, 1, 60000) = 1
> 3504.6046 write(4, " s t 1 0 3 . t n t 4 . r".., 1763) = 1763
> poll(0xFFBE5CA0, 1, 60000) (sleeping...)
> 3564.6045 poll(0xFFBE5CA0, 1, 60000) = 0
> poll(0xFFBE5CA0, 1, 60000) (sleeping...)
> 3624.6045 poll(0xFFBE5CA0, 1, 60000) = 0
> poll(0xFFBE5CA0, 1, 60000) (sleeping...)
> 3684.6045 poll(0xFFBE5CA0, 1, 60000) = 0
> poll(0xFFBE5CA0, 1, 60000) (sleeping...)
> 3744.6046 poll(0xFFBE5CA0, 1, 60000) = 0
> poll(0xFFBE5CA0, 1, 60000) (sleeping...)
> 3804.6046 poll(0xFFBE5CA0, 1, 60000) = 0
> poll(0xFFBE5CA0, 1, 60000) (sleeping...)
> 3864.6044 poll(0xFFBE5CA0, 1, 60000) = 0
> poll(0xFFBE5CA0, 1, 60000) (sleeping...)
> 3924.6041 poll(0xFFBE5CA0, 1, 60000) = 0
> poll(0xFFBE5CA0, 1, 60000) (sleeping...)
> 3984.6041 poll(0xFFBE5CA0, 1, 60000) = 0
> poll(0xFFBE5CA0, 1, 60000) (sleeping...)
> 4037.3040 poll(0xFFBE5CA0, 1, 60000) = 1
> 4037.3046 write(4, "FEFBFFFF", 4) Err#32 EPIPE
> 4037.3055 Received signal #13, SIGPIPE [caught]
> 4037.3064 sigaction(SIGUSR1, 0xFFBE5838, 0xFFBE58B8) = 0
> 4037.3071 sigaction(SIGUSR2, 0xFFBE5838, 0xFFBE58B8) = 0
> 4037.3074 write(2, " r s y n c e r r o r :".., 66) = 66
> 4037.3084 mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFF310000
> 4037.3088 munmap(0xFF310000, 8192) = 0
> 4037.3091 llseek(0, 0, SEEK_CUR) = 583129
> 4037.3094 _exit(20)
> Content-Description: server rsync lsof
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
...
> rsync 27948 root 5u IPv4 0x30000a59710 0t64477429 TCP hite.chelsea.net:rsync->corona.chelsea.net:51816 (ESTABLISHED)
> rsync 27948 root 6uw VREG 85,30 0 541128 /export (/dev/md/dsk/d30)
> rsync 27948 root 7r VREG 32,144 19062784 5347011 /export/backup -- export/httpd/DOMAINS/stats.chelsea.net/data/NetTracker/cyberounds.com/view.idx
> Content-Description: server truss
...
> 3497.9255 lstat64("export/httpd/DOMAINS/stats.chelsea.net/data/NetTracker/resistance-forum-www.thebody.com/page.dat", 0xFFBEE370) = 0
> 3497.9262 open64("export/httpd/DOMAINS/stats.chelsea.net/data/NetTracker/resistance-forum-www.thebody.com/page.dat", O_RDONLY) = 7
> 3497.9869 read(7, "\002FFB7\0\0\0\0\002E0EF".., 196536) = 196536
> poll(0xFFBEC6E0, 2, 60000) (sleeping...)
> 4037.2043 poll(0xFFBEC6E0, 2, 60000) = 0
> 4037.2049 poll(0xFFBEC6E0, 2, 60000) = 1
> 4037.2052 write(5, "FC0F\0078E hB40698 .84EF".., 4096) Err#32 EPIPE
> 4037.2816 Received signal #13, SIGPIPE [caught]
> 4037.2843 sigaction(SIGUSR1, 0xFFBEC2B0, 0xFFBEC330) = 0
> 4037.2848 sigaction(SIGUSR2, 0xFFBEC2B0, 0xFFBEC330) = 0
> 4037.2851 getpid() = 27948 [27130]
> 4037.3504 kill(28192, SIGUSR1) = 0
> 4037.3508 getpid() = 27948 [27130]
> 4037.3514 poll(0xFFBEABB0, 2, 60000) = 1
> 4037.3518 write(5, " B\0\0\b r s y n c e r".., 70) Err#32 EPIPE
> 4037.3523 Received signal #13, SIGPIPE [default]
> *** process killed ***
> Content-Description: server child lsof
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
...
> rsync 28192 root 5u IPv4 0x30000a59710 0t64477429 TCP hite.chelsea.net:rsync->corona.chelsea.net:51816 (ESTABLISHED)
> rsync 28192 root 6u VREG 85,30 0 541128 /export (/dev/md/dsk/d30)
> rsync 28192 root 7r VREG 32,144 51363840 5347010 /export/backup -- export/httpd/DOMAINS/stats.chelsea.net/data/NetTracker/cyberounds.com/view.dat
> rsync 28192 root 8u unix 105,36 0t0 306297 /devices/pseudo/tl at 0:ticots->(socketpair: 0x18d400000000) (0x300016fd448)
> rsync 28192 root 9u VREG 32,144 37569724 5347009 /export/backup -- export/httpd/DOMAINS/stats.chelsea.net/data/NetTracker/cyberounds.com/.view.dat.Q9aae3
> rsync 28192 root 10u unix 105,39 0t0 306297 /devices/pseudo/tl at 0:ticots->(socketpair: 0x18d600000000) (0x300016fc368)
> Content-Description: server child truss
...
> 3504.2667 poll(0xFFBECF20, 1, 60000) = 1
> 3504.2682 read(5, "D2\t\0\0", 4) = 4
> 3504.2699 poll(0xFFBECF98, 1, 60000) = 1
> 3504.2714 read(5, " I E 6 . 0 ; W i n d".., 2514) = 2514
> 3504.2738 write(9, " I E 6 . 0 ; W i n d".., 1024) = 1024
> 3504.2755 write(9, " . 0 ; W i n d o w s ".., 1024) = 1024
> 3504.2771 write(9, " w e r P C )\0 M9BCE9A 1".., 466) = 466
> 3504.2787 poll(0xFFBECF20, 1, 60000) = 1
> 3504.2800 read(5, "E4E6FFFF", 4) = 4
> 3504.2822 llseek(7, 5898240, SEEK_SET) = 5898240
> 3517.5204 read(7, "\0 Y \0\0\0\0\0\0 E16C5".., 262144) = 262144
> 4037.2227 write(9, " d o w s 9 5 )\0 T\v C".., 928) = 928
> 4037.2233 poll(0xFFBECF20, 1, 60000) = 1
> 4037.2238 read(5, "E3E6FFFF", 4) = 4
> 4037.2244 write(9, " 4 . 0 ( c o m p a t i".., 928) = 928
> 4037.2258 poll(0xFFBECF20, 1, 60000) = 1
> 4037.2262 read(5, "E2E6FFFF", 4) = 4
> 4037.2269 write(9, " c o m : M o z i l l a /".., 928) = 928
> 4037.2274 poll(0xFFBECF20, 1, 60000) = 1
> 4037.2278 read(5, "0E1A\0\0", 4) = 4
> 4037.2283 poll(0xFFBECF98, 1, 60000) = 1
> 4037.2288 read(5, " M o z i l l a / 4 . 0 ".., 6670) = 6670
> 4037.2304 write(9, " M o z i l l a / 4 . 0 ".., 1024) = 1024
> 4037.3011 write(9, " . c o m : M o z i l l a".., 1024) = 1024
> 4037.3017 write(9, " 2 4 7 - 2 9 - 0 . c l i".., 1024) = 1024
> 4037.3022 write(9, " 1 ; Q 3 1 2 4 6 1 )\0".., 1024) = 1024
> 4037.3028 write(9, " M S I E C r a w l e r )".., 1024) = 1024
> 4037.3079 llseek(9, 1, SEEK_CUR) = 585779
> 4037.3085 write(9, " WB7 lA0 1 2 - 2 5 5 - 1".., 1023) = 1023
> 4037.3125 write(9, " 0 7 2 1 2 0 0 0 h . 0 8".., 526) = 526
> 4037.3636 poll(0xFFBECF20, 1, 60000) = 1
> 4037.3642 read(5, " %DFFFFF", 4) = 4
> 4037.3650 llseek(7, 7733248, SEEK_SET) = 7733248
> 4037.6213 read(7, "\01A @\0\01A \0\0 6109E".., 262144) = 262144
> 4037.6229 Received signal #16, SIGUSR1 [caught]
> siginfo: SIGUSR1 pid=27948 uid=0
> 4037.6234 sigaction(SIGUSR1, 0xFFBED188, 0xFFBED208) = 0
> 4037.6238 sigaction(SIGUSR2, 0xFFBED188, 0xFFBED208) = 0
> 4037.6913 unlink("export/httpd/DOMAINS/stats.chelsea.net/data/NetTracker/oral-forum-www.thebody.com/.visitorid.idx.Lkbae3") = 0
> 4037.6921 getpid() = 28192 [1]
> 4037.6957 write(10, " A\0\0\b r s y n c e r".., 69) Err#32 EPIPE
> 4037.6977 Received signal #13, SIGPIPE [caught]
> 4037.6982 sigaction(SIGUSR1, 0xFFBEC8B0, 0xFFBEC930) = 0
> 4037.7016 sigaction(SIGUSR2, 0xFFBEC8B0, 0xFFBEC930) = 0
> 4037.7021 write(10, " A\0\0\b r s y n c e r".., 69) Err#32 EPIPE
> 4037.7026 Received signal #13, SIGPIPE [default]
> *** process killed ***
More information about the rsync
mailing list