probable generator hang bug in rsync

Steve Sether steve at vellmont.com
Mon Oct 3 19:06:03 GMT 2005


Here's the strace output.  I'm not terribly familiar with system
programming, so I'm unsure how to interpret it.

write(4, "2005/10/03 13:58:12 [16992] recv"..., 137) = 137
select(8, [7], [3], NULL, {60, 0})      = 1 (in [7], left {59, 660000})
select(8, [7], [], NULL, {60, 0})       = 1 (in [7], left {60, 0})
read(7, "m\0\0\n", 4)                   = 4
select(8, [7], [], NULL, {60, 0})       = 1 (in [7], left {60, 0})
read(7, "recv localhost.localdomain [127."..., 109) = 109
getpid()                                = 16992
time(NULL)                              = 1128365892
write(4, "2005/10/03 13:58:12 [16992] recv"..., 137) = 137
select(8, [7], [3], NULL, {60, 0})      = 1 (in [7], left {58, 460000})
select(8, [7], [], NULL, {60, 0})       = 1 (in [7], left {60, 0})
read(7, "2\0\0\10", 4)                  = 4
select(8, [7], [], NULL, {60, 0})       = 1 (in [7], left {60, 0})
read(7, "rsync: read error: Connection re"..., 50) = 50
getpid()                                = 16992
time(NULL)                              = 1128365894
write(4, "2005/10/03 13:58:14 [16992] rsyn"..., 78) = 78
select(4, NULL, [3], NULL, {60, 0})     = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, NULL, WNOHANG, NULL)          = 17007
wait4(-1, NULL, WNOHANG, NULL)          = -1 ECHILD (No child processes)
rt_sigaction(SIGCHLD, {0x401ddb30, [CHLD], SA_RESTORER|SA_RESTART, 0x400c1978}, {0x401ddb30, [CHLD], SA_RESTORER|SA_RESTART, 0x400c1978}, 8) = 0
sigreturn()                             = ? (mask now [RTMIN])
select(4, NULL, [3], NULL, {60, 0})     = 0 (Timeout)
select(4, NULL, [3], NULL, {60, 0})     = 0 (Timeout)


On Sat, Sep 24, 2005 at 12:18:35AM -0700, Wayne Davison wrote:
> On Wed, Sep 21, 2005 at 10:25:41PM -0500, Steve Sether wrote:
> > I have about 10 modules configured and I still get this problem.
> > Any advice on finding out what's going on Wayne?
> 
> I'd suggest that you get a system-call trace of the generator that
> covers the period of activity during the socket closing.  We need to see
> if the generator gets any kind of a notification that the socket is now
> closed.  What should happen is that the select() call should return that
> the socket's fd is now ready for a write() call, and that write() call
> should return an EOF error.  If the select() doesn't ever wake up, that
> would seem to indicate that the socket going to rsync is really still
> open (perhaps because stunnel didn't close the local socket yet).  Or,
> if the select() call did wake up, perhaps the write() call returned a
> try-again error (such as EAGAIN) instead of an EOF error.  If so, that
> might indicate an OS bug.
> 
> ..wayne..
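
For reference, the behaviour described in the quoted message (select()
reports the fd as writable once the peer has closed, and the following
write fails) can be reproduced in isolation. This is only a minimal
sketch under stated assumptions: it uses a local socketpair rather than
rsync or stunnel, the function name probe_closed_peer is made up for
illustration, and the exact errno may differ between operating systems.

```python
import errno
import select
import socket


def probe_closed_peer(timeout=1.0):
    """Return (writable, error_name) for a socket whose peer has closed.

    Hypothetical helper for illustration only; it is not part of rsync.
    """
    ours, peer = socket.socketpair()
    try:
        peer.close()  # simulate the other end of the connection going away

        # Same shape as the select(n, NULL, [fd], NULL, {timeout, 0})
        # calls in the trace: wait until our fd is ready for writing.
        _, writable, _ = select.select([], [ours], [], timeout)
        if not writable:
            # select() timed out, as it would if the peer were still open
            # and the send buffer were full.
            return False, None

        try:
            ours.send(b"x")
        except OSError as exc:
            # Expected path: the write fails with a hard error such as
            # EPIPE or ECONNRESET rather than a try-again error (EAGAIN).
            return True, errno.errorcode.get(exc.errno, str(exc.errno))
        return True, None  # the write unexpectedly succeeded
    finally:
        ours.close()


if __name__ == "__main__":
    print(probe_closed_peer())
```

If the first element comes back False, select() never woke up, which
corresponds to the repeated "= 0 (Timeout)" lines at the end of the
trace above.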

