rsync stuck after correctly sending the file

Loïc Le Loarer loic_leloarer at sdesigns.eu
Thu Nov 30 14:49:12 GMT 2006


Hi all,

I'm using rsync to replicate a cvs repository, it is working quite good 
except that I occassionnaly see rsync stucked after having transfered a 
file, it is always on a big ,v file containing multiple version of big 
binary file.

I haven't gathered the netstat information on the frozen process (it is 
not easy to reproduce the problem), but I have the strace information :
- the server process is blocked on select system call "select(8, [7], 
[], NULL, {60, 0}) = 0 (Timeout)"
- the two clients process are also stuck on the same system call.

I'm using rsync 2.6.8, and I give bellow extract of the strace (the 
complete strace are huge).

Is it enough information to have an idea of the problem ?

What else could I do to help debugging this issue ?

Thank you for your help.
-- 
Loïc Le Loarer


Here is an extract of the strace of the client process with the highest 
pid :
11:20:42 open("toto.gz,v", O_RDONLY)                      = 1
11:20:42 fstat(1, {st_mode=S_IFREG|0444, st_size=1785405919, ...}) = 0
11:20:42 open(".toto.gz,v.3axXXh", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
11:20:42 fchmod(3, 0400)                = 0
....... file transfert, many read and write
11:48:48 read(0, "\24.........", 8184) = 387
11:48:48 write(3, "%I\17H/M\...\226"..., 254657) = 254657
11:48:48 brk(0x588000)                  = 0x588000
11:48:48 close(1)                       = 0
11:48:48 close(3)                       = 0
11:48:48 lstat(".toto.gz,v.3axXXh", {st_mode=S_IFREG|0400, 
st_size=2039997121
, ...}) = 0
11:48:48 chmod(".toto.gz,v.3axXXh", 0444) = 0
11:48:48 rename(".toto.gz,v.3axXXh", "toto.gz,v") = 0
11:48:50 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
11:49:50 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
11:50:50 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
... an so on until I kill it
13:28:51 select(1, [0], [], NULL, {60, 0}) = ? ERESTARTNOHAND (To be 
restarted)
13:29:19 --- SIGTERM (Terminated) @ 0 (0) ---
13:29:19 select(0, NULL, NULL, NULL, {0, 400000}) = 0 (Timeout)
13:29:20 select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
13:29:20 rt_sigaction(SIGUSR1, {SIG_IGN}, NULL, 8) = 0
13:29:20 rt_sigaction(SIGUSR2, {SIG_IGN}, NULL, 8) = 0
13:29:20 write(4, "\\\0\0\10rsync error: received SIGINT, SIGTERM, or 
SIGHUP (code
20) at rsync.c(242) [receiver=2.6.8]\n", 96) = 96
13:29:20 exit_group(20)                 = ?

Here is an extract of the strace of the server process :
02:48:48 read(7, "\374\17\0\7", 4)      = 4
02:48:48 select(8, [7], [], NULL, {60, 0}) = 1 (in [7], left {60, 0})
02:48:48 read(7, "\366....\315"..., 4092) = 1232
02:48:48 select(8, [7], [], NULL, {60, 0}) = 0 (Timeout)
02:49:48 select(8, [7], [], NULL, {60, 0}) = 0 (Timeout)
02:50:48 select(8, [7], [], NULL, {60, 0}) = 0 (Timeout)
... an so on until I kill it
04:28:47 select(8, [7], [], NULL, {60, 0}) = 1 (in [7], left {27, 450000})
04:29:20 read(7, "\\\0\0\10rsync error: received SIGINT, SIGTERM, or 
SIGHUP (code 2
0) at rsync.c(242) [receiver=2.6.8]\n", 2860) = 96
04:29:20 select(8, [7], [], NULL, {60, 0}) = 1 (in [7], left {59, 990000})
04:29:20 read(7, "N\0\0\10rsync: connection unexpectedly closed (112 
bytes received
  so far) [generator]\nZ\0\0\10rsync error: error in rsync protocol data 
stream (cod
e 12) at io.c(463) [generator=2.6.8]\n", 2764) = 176
04:29:20 select(8, [7], [], NULL, {60, 0}) = 1 (in [7], left {60, 0})
04:29:20 --- SIGCHLD (Child exited) @ 0 (0) ---
04:29:20 waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 
29245
04:29:20 waitpid(-1, 0xbfff7ba4, WNOHANG) = -1 ECHILD (No child processes)
04:29:20 sigreturn()                    = ? (mask now [])
04:29:20 read(7, "", 2588)              = 0
04:29:20 write(2, "rsync: connection unexpectedly closed (356009 bytes 
received so
far) [sender]", 77) = 77
04:29:20 write(2, "\n", 1)              = 1
04:29:20 rt_sigaction(SIGUSR1, {SIG_IGN}, NULL, 8) = 0
04:29:20 rt_sigaction(SIGUSR2, {SIG_IGN}, NULL, 8) = 0
04:29:20 waitpid(29245, 0xbfff7e78, WNOHANG) = -1 ECHILD (No child 
processes)
04:29:20 getpid()                       = 29244
04:29:20 kill(29245, SIGUSR1)           = -1 ESRCH (No such process)
04:29:20 write(2, "rsync error: error in rsync protocol data stream 
(code 12) at io
.c(463) [sender=2.6.8]", 86) = 86
04:29:20 write(2, "\n", 1)              = 1
04:29:20 munmap(0xb75f3000, 4096)       = 0
04:29:20 exit_group(12)                 = ?



More information about the rsync mailing list