rsync hangs when tunneling... help!

Jay 'Whip' Grizzard elfchief+rsync at lupine.org
Wed Dec 15 21:52:04 GMT 2004


> Thanks--these have enough information for a cursory analysis, where we
> can see that the sender is using select to wait to write some data, and
> the receiver is using select to wait to read some data (and the
> generator is waiting to pump more checksum data to the sender).  Thus,
> everything looks fine.  To be able to know better, we'd need to see what
> numbered file descriptors each process was selecting on (to ensure that
> each one included the right descriptor for what we believe it to be
> doing).

Okay, let me see... 

Client side rsync process:

Stack, again: 

#0  0x280bf6c8 in select () from /usr/lib/libc.so.3
#1  0x8058124 in writefd_unbuffered (fd=6, buf=0x92f0000 "", len=32768)
    at io.c:865
#2  0x8058447 in writefd (fd=6, buf=0x92f0000 "", len=32768) at io.c:981
#3  0x8058551 in write_buf (f=6, buf=0x92f0000 "", len=32768) at io.c:1045
#4  0x8058ff8 in simple_send_token (f=6, token=-2, buf=0x807d0c0,
    offset=11567104, n=32768) at token.c:104
#5  0x80598c0 in send_token (f=6, token=-2, buf=0x807d0c0, offset=11567104,
    n=32768, toklen=0) at token.c:472


(gdb) frame 1


(gdb) print msg_fd_in
$7 = -1


(gdb) print w_fds
$8 = {fds_bits = {64, 0 <repeats 31 times>}}

(which, if my math is right, is fd 6: 64 = 2^6, and the radix is decimal).
fd 6 is the fd connecting rsync to stunnel, TCP localhost:1113->localhost:873.
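
To double-check the arithmetic, here is a minimal sketch that decodes an fd_set as gdb prints it into descriptor numbers. It assumes each fds_bits entry is a 32-bit word, which matches the 32 entries shown above but is still an assumption about this platform. The name fds_from_bits is hypothetical, not something from rsync or libc.

```python
def fds_from_bits(fds_bits, bits_per_word=32):
    """Return the descriptor numbers set in an fd_set dumped as a word array.

    Assumes bit b of word w stands for descriptor w * bits_per_word + b,
    and that the words are 32 bits wide (an assumption, not verified here).
    """
    fds = []
    for word_index, word in enumerate(fds_bits):
        for bit in range(bits_per_word):
            if word & (1 << bit):
                fds.append(word_index * bits_per_word + bit)
    return fds


# The w_fds value from gdb: {fds_bits = {64, 0 <repeats 31 times>}}
w_fds = [64] + [0] * 31
print(fds_from_bits(w_fds))  # [6]
```

So the only descriptor in the write set is fd 6.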





server-side, parent process:

top of stack:

#0  0xff19db44 in _poll () from /usr/lib/libc.so.1
#1  0xff15236c in _select () from /usr/lib/libc.so.1
#2  0x00025ee8 in writefd_unbuffered ()
#3  0x00025068 in io_flush ()
#4  0x00026a0c in writefd ()
#5  0x00026a80 in write_int ()
#6  0x00013570 in recv_generator ()


poll fds:

poll(0xFFBFC570, 2, 60000)      (sleeping...)
        fd=0  ev=POLLOUT rev=0
        fd=7  ev=POLLRDNORM rev=0

Those sockets are:

   0: S_IFSOCK mode:0666 dev:303,0 ino:26856 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
        sockname: AF_INET 127.0.0.1  port: 33305
        peername: AF_INET 127.0.0.1  port: 33304
	(presumably the connection to stunnel)

   7: S_IFSOCK mode:0666 dev:303,0 ino:15392 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
        sockname: AF_UNIX




server side client:

top of stack:

#0  0xff19db44 in _poll () from /usr/lib/libc.so.1
#1  0xff15236c in _select () from /usr/lib/libc.so.1
#2  0x000251e0 in read_timeout ()
#3  0x00025cc4 in readfd_unbuffered ()
#4  0x0002651c in read_buf ()
#5  0x00028058 in recv_token ()
#6  0x0001418c in receive_data ()



poll fds:

poll(0xFFBFCE50, 1, 60000)      (sleeping...)
        fd=0  ev=POLLRDNORM rev=0


fds:
   0: S_IFSOCK mode:0666 dev:303,0 ino:26856 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
        sockname: AF_INET 127.0.0.1  port: 33305
        peername: AF_INET 127.0.0.1  port: 33304




... so it looks like everyone is waiting for everyone else (best I can tell).
What isn't clear is /why/. It could be getting caught up in stunnel, but
I'm not sure why it would be -- I imagine if there were a problem with
buffering in the OpenSSL code it uses, someone would have noticed by now...
and the problem happens using TCP tunneling over ssh, too, so it's not
stunnel-specific.
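
For reference, the writer-stuck-in-select state in the traces above can be reproduced in isolation. This is only a sketch of the general mechanism, not a diagnosis of this hang: it uses a local socketpair rather than rsync or stunnel, and the function name and the 32768-byte chunk size (borrowed from the len in the client stack) are my own choices. It fills a non-blocking socket until the kernel buffer is full, shows that select no longer reports it writable, then drains the other end and shows it becoming writable again.

```python
import select
import socket


def demo_stalled_writer(chunk_size=32768):
    """Fill a socket whose peer is not reading, then drain it.

    Returns (stalled, resumed, sent, drained).
    """
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)
    chunk = b"x" * chunk_size

    # Write until the kernel refuses more data; nobody is reading from b.
    sent = 0
    try:
        while True:
            sent += a.send(chunk)
    except BlockingIOError:
        pass

    # A select() for writing now times out: this is where a sender would sit.
    _, writable, _ = select.select([], [a], [], 0.1)
    stalled = not writable

    # Drain everything the peer has queued up.
    drained = 0
    try:
        while True:
            data = b.recv(65536)
            if not data:
                break
            drained += len(data)
    except BlockingIOError:
        pass

    # With the buffer empty again, the writer is reported writable.
    _, writable, _ = select.select([], [a], [], 0.1)
    resumed = bool(writable)

    a.close()
    b.close()
    return stalled, resumed, sent, drained


stalled, resumed, sent, drained = demo_stalled_writer()
print(stalled, resumed, sent == drained)
```

If something in the tunnel stops draining one direction while it waits to write in the other, each end could see this same picture.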

Thoughts?

-jay

