rsync hang, more details [LONG]

Eric Whiting ewhiting at amis.com
Tue Dec 18 03:38:34 EST 2001


I'm running 2.5.1pre3 and seeing lots of hangs as well. Under
2.4.6+Waynes_nohang I never had trouble this bad.

SRC: solaris 2.7, netapps nfs tree
DST: solaris 2.8, linux 2.[2,4].*
TRANSPORT: ssh

This setup worked well for months before the upgrade to 2.5.1pre3.

I have not tried -vvv yet. I'll try that and see what it does.

Given that Ed's hang shows up with -vv but never with -vvv, it sure
seems like a timing problem.
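
When comparing -vv runs against -vvv runs, a small watchdog wrapper may
help tell a real hang from a merely slow transfer. This is only a sketch:
run_with_watchdog is a made-up helper name, the return code 124 is just a
convention borrowed from timeout(1), and the host and paths in the usage
line further down are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): run a command in the
# background and kill it if it is still alive after LIMIT seconds.
# Returns the command's own exit status, or 124 if it had to be killed.
run_with_watchdog() {
    limit=$1
    shift
    "$@" &
    pid=$!
    elapsed=0
    # Poll once a second while the child is still running.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            kill "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # Child finished by itself: hand back its exit status.
    wait "$pid"
}
```

Usage would be something like

    run_with_watchdog 900 rsync -az -vv -e ssh srchost:/some/tree/ ./results
    echo "exit: $?"

and then the same again with -vvv. The limit should be comfortably longer
than a normal run so that only a genuine hang trips it.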

eric





Ed Santiago wrote:
> 
> rsync 2.5.0 still has a bug where it hangs under some circumstances.
> 
> The hang is beyond my abilities to track down.  I'll keep trying,
> but here are details in case they're of use to anyone else:
> 
>   - Code configured & built on Solaris 2.5.1.
>   - Same binary run on Solaris 2.5.1 (client) and 2.8 (server).
>   - Using rsh transport, but also fails with ssh
>   - Does not fail with local-local rsync
> 
>   - Source directory (on server) is NFS-mounted, from NetApp filer
>   - Destination directory (on client) is local (tested NFS, also hangs)
> 
>   - Consistently hangs with -vv, never (so far) with -vvv
> 
> Included below are three stack traces, one on the server and two
> on the client.  This is a pretty consistent feature: The client
> and server appear to be deadlocked waiting for each other.
> 
> Also attached below are a script for populating a sample hierarchy,
> and the rsync invocation.
> 
> Backtrace on server:
> 
>   #0  0xff218224 in _poll ()
>   #1  0xff1cb808 in _select ()
>   #2  0x24bec in writefd_unbuffered (fd=1, buf=0xffbe5ed0 ">", len=66)
>       at io.c:406
>   #3  0x24eac in mplex_write (fd=1, code=62, buf=0x591d8 "\a\020", len=62)
>       at io.c:498
>   #4  0x24f24 in io_flush () at io.c:518
>   #5  0x24940 in readfd (fd=0, buffer=0xffbe7020 "ï\002r\215©/\201R", N=4)
>       at io.c:314
>   #6  0x24998 in read_int (f=0) at io.c:329
>   #7  0x199a4 in send_files (flist=0x574e8, f_out=1, f_in=0) at sender.c:110
>   #8  0x1d1e8 in do_server_sender (f_in=0, f_out=1, argc=1, argv=0x56f74)
>       at main.c:300
>   #9  0x1d708 in start_server (f_in=0, f_out=1, argc=2, argv=0x56f70)
>       at main.c:476
>   #10 0x1e08c in main (argc=2, argv=0x56f70) at main.c:838
> 
> Backtrace #1 on client (the parent):
> 
>   #0  0xef5b7904 in _poll ()
>   #1  0xef5d3d40 in _select ()
>   #2  0x24644 in read_timeout (fd=6, buf=0xeffff348 "ÿÿÿÿ", len=4) at io.c:191
>   #3  0x247dc in read_unbuffered (fd=6, buf=0xeffff348 "ÿÿÿÿ", len=4) at io.c:263
>   #4  0x24950 in readfd (fd=6, buffer=0xeffff348 "ÿÿÿÿ", N=4) at io.c:316
>   #5  0x24998 in read_int (f=6) at io.c:329
>   #6  0x184e8 in generate_files (f=5, flist=0x57520, local_name=0x0, f_recv=6)
>       at generator.c:471
>   #7  0x1d3fc in do_recv (f_in=4, f_out=5, flist=0x57520, local_name=0x0)
>       at main.c:379
>   #8  0x1d958 in client_run (f_in=4, f_out=5, pid=22226, argc=1, argv=0x56f74)
>       at main.c:558
>   #9  0x1ddc0 in start_client (argc=1, argv=0x56f74) at main.c:731
>   #10 0x1e098 in main (argc=2, argv=0x56f70) at main.c:841
> 
> Backtrace #2 on client (child):
> 
>   #0  0xef5b7904 in _poll ()
>   #1  0xef5d3d40 in _select ()
>   #2  0x24644 in read_timeout (fd=4, buf=0xefffe680 "", len=4) at io.c:191
>   #3  0x24788 in read_loop (fd=4, buf=0xefffe680 "", len=4) at io.c:242
>   #4  0x24824 in read_unbuffered (fd=4, buf=0xefffe680 "", len=4) at io.c:268
>   #5  0x24950 in readfd (fd=4, buffer=0xefffe680 "", N=4) at io.c:316
>   #6  0x24998 in read_int (f=4) at io.c:329
>   #7  0x18eec in recv_files (f_in=4, flist=0x57520, local_name=0x0, f_gen=8)
>       at receiver.c:328
>   #8  0x1d374 in do_recv (f_in=4, f_out=5, flist=0x57520, local_name=0x0)
>       at main.c:357
>   #9  0x1d958 in client_run (f_in=4, f_out=5, pid=22226, argc=1, argv=0x56f74)
>       at main.c:558
>   #10 0x1ddc0 in start_client (argc=1, argv=0x56f74) at main.c:731
>   #11 0x1e098 in main (argc=2, argv=0x56f70) at main.c:841
> 
> --------
> 
> The above rsync was compiled from the CVS repository on Thursday 13 Dec,
> early AM.  I've just now (Mon 17 Dec 08:19 Mountain Time) updated,
> rebuilt, and rerun the tests.  Same hang.
> 
> The script below can be used to populate a directory hierarchy.  It
> creates subdirectories 00 through 99 under "src-test/CVSROOT"
> (up to you to mkdir that), then 99 small files in each subdir:
> 
>   ------------------------------------------------------------------------
> #!/sw/tools/bin/Perl -w
> 
> use strict;
> 
> # up to caller to do:   mkdir -p src-test/CVSROOT
> my $sub = 'src-test/CVSROOT';
> -d $sub
>   or die "You're cd'ed to the wrong directory\n";
> 
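> # Create subdirectories 00 .. 99 (skipping any that already exist)
> foreach my $i (0..99) {
>   my $d = sprintf("%02d", $i);
>   -d "$sub/$d"
>     or mkdir "$sub/$d", 02775
>     or die "cannot mkdir $sub/$d: $!\n";
> 
>   # ... each holding 99 files, each containing its own path
>   foreach my $j (1..99) {
>     my $f = "$sub/$d/$j$d";
>     open  OUT, '>', $f
>       or die "cannot open $f: $!\n";
>     print OUT $f, "\n";
>     close OUT or die "error writing $f: $!\n";
>   }
> }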
> 
>   ------------------------------------------------------------------------
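
Before blaming rsync it may be worth confirming that the tree was
populated completely: the script above should leave 100 subdirectories
holding 100 * 99 = 9900 files. A sketch of such a check follows;
count_tree is a hypothetical helper and not part of Ed's script.

```shell
#!/bin/sh
# Hypothetical helper: print "<subdirs> <files>" for the tree under $1.
count_tree() {
    root=$1
    # find lists the root directory itself, so subtract one.
    dirs=$(( $(find "$root" -type d | wc -l) - 1 ))
    files=$(( $(find "$root" -type f | wc -l) + 0 ))
    echo "$dirs $files"
}
```

Assuming nothing else has been put under src-test/CVSROOT, running
"count_tree src-test/CVSROOT" on a fully populated tree should print
"100 9900". The same check on ./results after a transfer shows how far
rsync got before it stopped.
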
> This is the rsync invocation:
> 
>   ------------------------------------------------------------------------
> #!/bin/sh
> 
> CMD=/home/santiago/src/rsync/rsync/rsync.solaris
> 
> $CMD -z -avv --stats --delete \
>         --rsync-path="$CMD.nopur" \
>         --timeout=600 \
>         "cvsroot.eng.ascend.com:/home/santiago/tmp/rsync-test/src-test/CVSROOT" ./results
> 
>   ------------------------------------------------------------------------
> Thanks in advance for any help,
> ^E
> --
> Ed Santiago                 Toolsmith                 santiago at ascend.com



