Rsync 2.5.5 hangs for no apparent reason

John D Stephens jstephens at ti.com
Tue Aug 13 08:40:06 EST 2002


Thanks in advance for taking the time to read this email.

I'm using rsync to make a copy of my NetBackup catalogs
to an offsite server for DRP.  I have narrowed the problem
down to a few sub-directories that each contain 16 directories,
with about 37 files in each directory.  None
of the files are larger than 300MB.  In fact, most of them
are between 1K and 5MB.  The problem is that rsync hangs, with no output
to the offsite server, within 15 seconds of this command being issued
from inside a /bin/sh script.  The script is running as root.

/usr/local/bin/rsync -avz --bwlimit=3096 --stats --timeout=600 \
  --delete-after --no-detach --numeric-ids --rsync-path=/usr/local/bin/rsync \
  /usr/openv/netbackup/db/images/cave/ \
  drphost.ti.com:/export/d2/nbumaster/openv/netbackup/db/images/cave/ \
  > /tmp/log 2>&1

When I use rsync to sync other image directories, I don't have any
problems.  I have taken the catalogs off-line (unmounted) and run
fsck on the RAID with no problems, so I don't believe it is an inode problem.
Since I added --timeout=600, rsync now dies on its own, but before I did this,
rsync would hang for days on both servers until I killed it.
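
One way to narrow this down further is to run one rsync per sub-directory and note which runs end with exit code 30, the timeout code that appears in the log at the end of this message. The following is a minimal sketch under stated assumptions, not a tested procedure: the function name sync_each_subdir and the LOGDIR variable are hypothetical, the paths are copied from the command above, and the reduced option set is only an illustration.

```shell
#!/bin/sh
# Sketch only: run one rsync per sub-directory and report which ones time out.
# RSYNC, SRC and DEST default to the values in the command quoted above;
# LOGDIR and the function name are hypothetical, chosen for illustration.
RSYNC=${RSYNC:-/usr/local/bin/rsync}
SRC=${SRC:-/usr/openv/netbackup/db/images/cave}
DEST=${DEST:-drphost.ti.com:/export/d2/nbumaster/openv/netbackup/db/images/cave}
LOGDIR=${LOGDIR:-/tmp}

sync_each_subdir() {
    for dir in "$SRC"/*/; do
        [ -d "$dir" ] || continue            # glob matched nothing
        name=$(basename "$dir")
        rc=0
        "$RSYNC" -az --timeout=600 --numeric-ids \
            --rsync-path=/usr/local/bin/rsync \
            "$dir" "$DEST/$name/" >"$LOGDIR/log.$name" 2>&1 || rc=$?
        if [ "$rc" -eq 30 ]; then
            echo "TIMEOUT $name"             # exit code 30: io timeout
        elif [ "$rc" -ne 0 ]; then
            echo "FAILED($rc) $name"
        else
            echo "OK $name"
        fi
    done
}
```

Calling sync_each_subdir prints one status line per sub-directory and leaves a separate log for each in LOGDIR, so a hanging directory can be examined on its own.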

Have any of you seen this problem before?  
Do you have any suggestions that could help me debug further?
Would using rsync as a daemon help? (instead of rsh)
How do you set up rsync as a daemon? (rsync.conf ?)


Here is the last output of a truss command showing how rsync hangs:
------------------------------------------
22436:  poll(0xFFBE6C78, 1, 600000)                     = 1
22436:  read(7, " 2\0\0\t", 4)                          = 4
22436:  time()                                          = 1029194645
22436:  poll(0xFFBE6C78, 1, 600000)                     = 1
22436:  read(7, " 1 0 2 0 0 0 0 0 0 0 / C".., 50)       = 50
22436:  time()                                          = 1029194645
22436:  write(1, " 1 0 2 0 0 0 0 0 0 0 / C".., 50)      = 50
22436:  poll(0xFFBE6C78, 1, 600000)                     = 1
22436:  read(7, " =\0\0\t", 4)                          = 4
22436:  time()                                          = 1029194645
22436:  poll(0xFFBE6C78, 1, 600000)                     = 1
22436:  read(7, " r e c v _ g e n e r a t".., 61)       = 61
22436:  time()                                          = 1029194645
22436:  write(1, " r e c v _ g e n e r a t".., 61)      = 61
22436:  poll(0xFFBE6C78, 1, 600000)     (sleeping...)
22438:  poll(0xFFBEF468, 2, -1)         (sleeping...)
------------------------------------------
And it just keeps on sleeping...........
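
A note on reading the trace: the 4-byte reads on descriptor 7 look like rsync's multiplexed I/O headers, each followed by a read of the payload. The sketch below decodes them under an assumed layout (a little-endian 32-bit word whose low 24 bits are the payload length and whose top byte is a tag equal to 7 plus a message code, with 1 for error text and 2 for info text). That layout is an assumption drawn from rsync's io.c rather than anything stated in this post, and the function name decode_mplex_header is hypothetical.

```shell
#!/bin/sh
# Sketch: decode a 4-byte header of the kind seen in the read(7, ..., 4) calls.
# Assumed layout: low 24 bits = payload length (little-endian),
# top byte = tag, and tag - 7 = message code (0 data, 1 error, 2 info).
decode_mplex_header() {
    # $1..$4: the four header bytes in wire order, decimal or 0x-prefixed hex
    len=$(( $1 | ($2 << 8) | ($3 << 16) ))
    tag=$(( $4 ))
    code=$(( tag - 7 ))
    case "$code" in
        0) kind=data ;;
        1) kind=error ;;
        2) kind=info ;;
        *) kind=unknown ;;
    esac
    echo "len=$len tag=$tag kind=$kind"
}

decode_mplex_header 0x32 0 0 0x09   # " 2\0\0\t" -> len=50 tag=9 kind=info
decode_mplex_header 0x3D 0 0 0x09   # " =\0\0\t" -> len=61 tag=9 kind=info
```

The decoded lengths, 50 and 61, match the sizes of the reads that follow each header in the trace. If the assumed layout is right, the last thing the client received was a complete info message and it is then waiting for the next header.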


Here is the output of a snoop command listening for drphost.
------------------------------------------
nbumaster -> drphost.ti.com RSHELL C port=1017 
drphost.ti.com -> nbumaster RSHELL R port=1017 \27\0\0\trecv_generator(.
nbumaster -> drphost.ti.com RSHELL C port=1017 
drphost.ti.com -> nbumaster RSHELL R port=1017 6\0\0\t1019000000/Cave-
nbumaster -> drphost.ti.com RSHELL C port=1017 
drphost.ti.com -> nbumaster RSHELL R port=1017 9\0\0\trecv_generator(1
nbumaster -> drphost.ti.com RSHELL C port=1017 
nbumaster -> drphost.ti.com RSHELL C port=1023 
drphost.ti.com -> nbumaster RSHELL R port=1023 
drphost.ti.com -> nbumaster RSHELL R port=1023 a
nbumaster -> drphost.ti.com RSHELL C port=1023 
-----------------------------------------
Why did it switch ports here?  Is it supposed to?


Here is the output of the last few lines of /tmp/log.
-------------------------------
1020000000/Cave-v2-v3_1020169327_FULL is uptodate
recv_generator(1020000000/Cave-v2-v3_1020169327_FULL.f.Z,10)
1020000000/Cave-v2-v3_1020169327_FULL.f.Z is uptodate
recv_generator(1020000000/Cave-v2-v3_1020651001_FULL,11)
1020000000/Cave-v2-v3_1020651001_FULL is uptodate
recv_generator(1020000000/Cave-v2-v3_1020651001_FULL.f.Z,12)
1020000000/Cave-v2-v3_1020651001_FULL.f.Z is uptodate
recv_generator(1020000000/Cave-v4-v5_1020737309_FULL,13)
1020000000/Cave-v4-v5_1020737309_FULL is uptodate
recv_generator(1020000000/Cave-v4-v5_1020737309_FULL.f.Z,14)
io timeout after 600 seconds - exiting
rsync error: timeout in data send/receive (code 30) at io.c(85)
_exit_cleanup(code=30, file=io.c, line=85): about to call exit(30)



Thanks for any help you can give me.

Regards
John Stephens
-- 
  +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
 + John D Stephens    ITS Design Systems   +
+ Texas Instruments  12500 TI BLVD, Dallas  +
 + jstephens at ti.com     214-480-6229       +
  +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+


