rsync Failure

Matt McCutchen matt at mattmccutchen.net
Sat Dec 6 05:07:10 GMT 2008


On Fri, 2008-12-05 at 11:42 -0500, Shawn Geraghty wrote: 
> Upgrading will be done, but not really an option at this point, unless  
> of course, that is the fix.  I'm a limited resource and other stuff is  
> backing up.

I understand, but if the strace does indicate that you're hitting a hang
bug in rsync 2.6.9, I don't know that anyone will care to investigate
the bug unless it can be reproduced with the latest stable rsync.

> Here is an strace from server A.
> 
> 
> 11242 stat64("/etc/localtime", {st_mode=S_IFREG|0644,  
> st_size=3519, ...}) = 0
> 11242 write(3, "2008/12/05 08:04:55 [11242] recv"..., 92) = 92
> 11242 write(1, "recv_file_name(usr/share/icons/L"..., 64) = 64
> 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 11242 read(6, "9\0\0\t", 4)             = 4
> 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 11242 read(6, "make_file(1,usr/share/icons/Loco"..., 57) = 57
> 11242 time(NULL)                        = 1228482295
> 11242 stat64("/etc/localtime", {st_mode=S_IFREG|0644,  
> st_size=3519, ...}) = 0
> 11242 write(3, "2008/12/05 08:04:55 [11242] make"..., 85) = 85
> 11242 write(1, "make_file(1,usr/share/icons/Loco"..., 57) = 57
> 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 11242 read(6, "\f\0\0\7", 4)            = 4
> 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 11242 read(6, "\276&\5p.pngt\1\0\0", 12) = 12
> 11242 time(NULL)                        = 1228482295
> 11242 stat64("/etc/localtime", {st_mode=S_IFREG|0644,  
> st_size=3519, ...}) = 0
> 11242 write(3, "2008/12/05 08:04:55 [11242] recv"..., 88) = 88
> 11242 write(1, "recv_file_name(usr/share/icons/L"..., 60) = 60
> 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 11242 read(6, ":\0\0\t", 4)             = 4
> 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 11242 read(6, "make_file(1,usr/share/icons/Loco"..., 58) = 36
> 11242 select(7, [6], [], NULL, {60, 0}) = 0 (Timeout)
> 11242 select(7, [6], [], NULL, {60, 0}) = 0 (Timeout)
> 11242 select(7, [6], [], NULL, {60, 0}) = 0 (Timeout)
> [...]

So the receiver was waiting for the sender to send it the rest of the
(non-incremental) file list.  That's what it's supposed to do, so I need
to know what the sender was doing to diagnose the hang.

> The server being backed up doesn't run rsync as a daemon.  I've read  
> in the rsync docs that backups can be done initiated from the server  
> with out the client running rsync as a daemon.  For this case, I don't  
> think I can get an strace.  But, if possible, let me know and I will  
> track down how to do that.

You can, and you'll need to if we are to get to the root of the problem.
See the "rsync-debug" script described at:

http://rsync.samba.org/issues.html

> I also tried to back-up another server, and captured an strace from  
> Server A.  Here is that trace.
> 
> The big difference here was that I had to ^C the process after about  
> an hour.   Usually, it looks like ssh shuts down the tunnel after a   
> period of 15 minutes.
> 
> 4872  <... select resumed> )            = 2 (in [4], out [6])
> 4891  select(7, [6], [], NULL, {60, 0} <unfinished ...>
> 4872  write(6, "\374\17\0\7glade2\0\f.xpm\0\347images\0\6/yelp\0"...,  
> 16384 <unfinished ...>
> 4891  <... select resumed> )            = 1 (in [6], left {60, 0})
> 4872  <... write resumed> )             = 16384
> 4891  read(6,  <unfinished ...>
> 4872  read(4,  <unfinished ...>
> 4891  <... read resumed> "urrent.desktop\0\5win95.desktop\0\3756"...,  
> 4092) = 4092
> 4872  <... read resumed> "\243\36c\274aVv\324\16\307z 
> \347\17\274\247\34\331\327\333X\256\177\222\360\246\"\v\301\"pA%"...,  
> 8192) = 8192
> 4891  select(7, [6], [], NULL, {60, 0} <unfinished ...>
> 4872  select(8, [4 5], [], NULL, NULL <unfinished ...>
> 4891  <... select resumed> )            = 1 (in [6], left {60, 0})
> 4872  <... select resumed> )            = 1 (in [4])
> 4891  read(6,  <unfinished ...>
> 4872  read(4,  <unfinished ...>
> 4891  <... read resumed> "\374\17\0\7", 4) = 4
> 4872  <... read resumed> "\270\335\204#\373QM\217U\225K 
> \241\22\2750\357\256\2504\215\206L\255M\264\364\362\267\366D 
> \335\350"..., 8192) = 3040
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891  read(6, "t-preferences.glade2\0\1asciitable"..., 4092) = 4092
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891  read(6, "\374\17\0\7", 4)         = 4
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4872  select(8, [4 5], [6], NULL, NULL <unfinished ...>
> 4891  read(6,  <unfinished ...>
> 4872  <... select resumed> )            = 1 (out [6])
> 4891  <... read resumed> "glade2\0\f.xpm\0\347images\0\6/yelp\0\5/ 
> em"..., 4092) = 4092
> 4872  write(6, "\374\17\0\7ORWARDING_DENIED\0\1TP_DISABLE"..., 16384  
> <unfinished ...>
> 4891  select(7, [6], [], NULL, {60, 0} <unfinished ...>
> 4872  <... write resumed> )             = 16384
> 4891  <... select resumed> )            = 1 (in [6], left {60, 0})
> 4872  select(8, [4 5], [], NULL, NULL <unfinished ...>
> 4891  read(6, "\374\17\0\7", 4)         = 4
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891  read(6, "t-config-nfs.png\0\371test.py\0\0mainW"..., 4092) = 4092
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891  read(6, "\374\17\0\7", 4)         = 4
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891  read(6, "RWARDING_DENIED\0\1TP_DISABLED\0\3FA"..., 4092) = 4092
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891  read(6, "\374\17\0\7", 4)         = 4
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891  read(6, "NG_DENIED\0\1TP_DISABLED\0\3FAILURE\0"..., 4092) = 4092
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891  read(6, "\374\17\0\7", 4)         = 4
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891  read(6, "ORWARDING_DENIED\0\1TP_DISABLED\0\3F"..., 4092) = 4092
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891  read(6, "\374\17\0\7", 4)         = 4
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891  read(6, "elogs\0\377suexec\0\0certwatch\0\0aepdae"..., 4092) =  
> 4092
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891  read(6, "\374\17\0\7", 4)         = 4
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891  read(6, "slinux\0\0talk\0\1elnet\0\0ime\0\377wget\0\0"..., 4092)  
> = 4092
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891  read(6, "\374\17\0\7", 4)         = 4
> 4891  select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891  read(6, "changethehostname\0\0einit_wrapper"..., 4092) = 4092
> 4891  select(7, [6], [], NULL, {60, 0} <unfinished ...>
> 4871  <... select resumed> )            = 0 (Timeout)
> 4871  select(5, [4], [], NULL, {60, 0} <unfinished ...>
> 4891  <... select resumed> )            = 0 (Timeout)
> 4891  select(7, [6], [], NULL, {60, 0} <unfinished ...>
> 4871  <... select resumed> )            = 0 (Timeout)
> 4871  select(5, [4], [], NULL, {60, 0} <unfinished ...>
> 4891  <... select resumed> )            = 0 (Timeout)
> 4891  select(7, [6], [], NULL, {60, 0} <unfinished ...>
> 4871  <... select resumed> )            = 0 (Timeout)
> 
> .
> .
> same message another 100 times

Here it looks like a file transfer was in progress and the receiver
(4891) was waiting for more file data from the sender.  Again, I need to
see what the sender was doing.

Matt



More information about the rsync mailing list