rsync Failure
Matt McCutchen
matt at mattmccutchen.net
Sat Dec 6 05:07:10 GMT 2008
On Fri, 2008-12-05 at 11:42 -0500, Shawn Geraghty wrote:
> Upgrading will be done, but not really an option at this point, unless
> of course, that is the fix. I'm a limited resource and other stuff is
> backing up.
I understand, but if the strace does indicate that you're hitting a hang
bug in rsync 2.6.9, I don't know that anyone will care to investigate
the bug unless it can be reproduced with the latest stable rsync.
> Here is an strace from server A.
>
>
> 11242 stat64("/etc/localtime", {st_mode=S_IFREG|0644,
> st_size=3519, ...}) = 0
> 11242 write(3, "2008/12/05 08:04:55 [11242] recv"..., 92) = 92
> 11242 write(1, "recv_file_name(usr/share/icons/L"..., 64) = 64
> 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 11242 read(6, "9\0\0\t", 4) = 4
> 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 11242 read(6, "make_file(1,usr/share/icons/Loco"..., 57) = 57
> 11242 time(NULL) = 1228482295
> 11242 stat64("/etc/localtime", {st_mode=S_IFREG|0644,
> st_size=3519, ...}) = 0
> 11242 write(3, "2008/12/05 08:04:55 [11242] make"..., 85) = 85
> 11242 write(1, "make_file(1,usr/share/icons/Loco"..., 57) = 57
> 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 11242 read(6, "\f\0\0\7", 4) = 4
> 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 11242 read(6, "\276&\5p.pngt\1\0\0", 12) = 12
> 11242 time(NULL) = 1228482295
> 11242 stat64("/etc/localtime", {st_mode=S_IFREG|0644,
> st_size=3519, ...}) = 0
> 11242 write(3, "2008/12/05 08:04:55 [11242] recv"..., 88) = 88
> 11242 write(1, "recv_file_name(usr/share/icons/L"..., 60) = 60
> 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 11242 read(6, ":\0\0\t", 4) = 4
> 11242 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 11242 read(6, "make_file(1,usr/share/icons/Loco"..., 58) = 36
> 11242 select(7, [6], [], NULL, {60, 0}) = 0 (Timeout)
> 11242 select(7, [6], [], NULL, {60, 0}) = 0 (Timeout)
> 11242 select(7, [6], [], NULL, {60, 0}) = 0 (Timeout)
> [...]
So the receiver was waiting for the sender to send it the rest of the
(non-incremental) file list. That's what it's supposed to do, so I need
to know what the sender was doing to diagnose the hang.
> The server being backed up doesn't run rsync as a daemon. I've read
> in the rsync docs that backups can be done initiated from the server
> with out the client running rsync as a daemon. For this case, I don't
> think I can get an strace. But, if possible, let me know and I will
> track down how to do that.
You can, and you'll need to if we are to get to the root of the problem.
See the "rsync-debug" script described at:
http://rsync.samba.org/issues.html
> I also tried to back-up another server, and captured an strace from
> Server A. Here is that trace.
>
> The big difference here was that I had to ^C the process after about
> an hour. Usually, it looks like ssh shuts down the tunnel after a
> period of 15 minutes.
>
> 4872 <... select resumed> ) = 2 (in [4], out [6])
> 4891 select(7, [6], [], NULL, {60, 0} <unfinished ...>
> 4872 write(6, "\374\17\0\7glade2\0\f.xpm\0\347images\0\6/yelp\0"...,
> 16384 <unfinished ...>
> 4891 <... select resumed> ) = 1 (in [6], left {60, 0})
> 4872 <... write resumed> ) = 16384
> 4891 read(6, <unfinished ...>
> 4872 read(4, <unfinished ...>
> 4891 <... read resumed> "urrent.desktop\0\5win95.desktop\0\3756"...,
> 4092) = 4092
> 4872 <... read resumed> "\243\36c\274aVv\324\16\307z
> \347\17\274\247\34\331\327\333X\256\177\222\360\246\"\v\301\"pA%"...,
> 8192) = 8192
> 4891 select(7, [6], [], NULL, {60, 0} <unfinished ...>
> 4872 select(8, [4 5], [], NULL, NULL <unfinished ...>
> 4891 <... select resumed> ) = 1 (in [6], left {60, 0})
> 4872 <... select resumed> ) = 1 (in [4])
> 4891 read(6, <unfinished ...>
> 4872 read(4, <unfinished ...>
> 4891 <... read resumed> "\374\17\0\7", 4) = 4
> 4872 <... read resumed> "\270\335\204#\373QM\217U\225K
> \241\22\2750\357\256\2504\215\206L\255M\264\364\362\267\366D
> \335\350"..., 8192) = 3040
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891 read(6, "t-preferences.glade2\0\1asciitable"..., 4092) = 4092
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891 read(6, "\374\17\0\7", 4) = 4
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4872 select(8, [4 5], [6], NULL, NULL <unfinished ...>
> 4891 read(6, <unfinished ...>
> 4872 <... select resumed> ) = 1 (out [6])
> 4891 <... read resumed> "glade2\0\f.xpm\0\347images\0\6/yelp\0\5/
> em"..., 4092) = 4092
> 4872 write(6, "\374\17\0\7ORWARDING_DENIED\0\1TP_DISABLE"..., 16384
> <unfinished ...>
> 4891 select(7, [6], [], NULL, {60, 0} <unfinished ...>
> 4872 <... write resumed> ) = 16384
> 4891 <... select resumed> ) = 1 (in [6], left {60, 0})
> 4872 select(8, [4 5], [], NULL, NULL <unfinished ...>
> 4891 read(6, "\374\17\0\7", 4) = 4
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891 read(6, "t-config-nfs.png\0\371test.py\0\0mainW"..., 4092) = 4092
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891 read(6, "\374\17\0\7", 4) = 4
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891 read(6, "RWARDING_DENIED\0\1TP_DISABLED\0\3FA"..., 4092) = 4092
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891 read(6, "\374\17\0\7", 4) = 4
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891 read(6, "NG_DENIED\0\1TP_DISABLED\0\3FAILURE\0"..., 4092) = 4092
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891 read(6, "\374\17\0\7", 4) = 4
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891 read(6, "ORWARDING_DENIED\0\1TP_DISABLED\0\3F"..., 4092) = 4092
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891 read(6, "\374\17\0\7", 4) = 4
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891 read(6, "elogs\0\377suexec\0\0certwatch\0\0aepdae"..., 4092) =
> 4092
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891 read(6, "\374\17\0\7", 4) = 4
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891 read(6, "slinux\0\0talk\0\1elnet\0\0ime\0\377wget\0\0"..., 4092)
> = 4092
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891 read(6, "\374\17\0\7", 4) = 4
> 4891 select(7, [6], [], NULL, {60, 0}) = 1 (in [6], left {60, 0})
> 4891 read(6, "changethehostname\0\0einit_wrapper"..., 4092) = 4092
> 4891 select(7, [6], [], NULL, {60, 0} <unfinished ...>
> 4871 <... select resumed> ) = 0 (Timeout)
> 4871 select(5, [4], [], NULL, {60, 0} <unfinished ...>
> 4891 <... select resumed> ) = 0 (Timeout)
> 4891 select(7, [6], [], NULL, {60, 0} <unfinished ...>
> 4871 <... select resumed> ) = 0 (Timeout)
> 4871 select(5, [4], [], NULL, {60, 0} <unfinished ...>
> 4891 <... select resumed> ) = 0 (Timeout)
> 4891 select(7, [6], [], NULL, {60, 0} <unfinished ...>
> 4871 <... select resumed> ) = 0 (Timeout)
>
> .
> .
> same message another 100 times
Here it looks like a file transfer was in progress and the receiver
(4891) was waiting for more file data from the sender. Again, I need to
see what the sender was doing.
Matt
More information about the rsync
mailing list