Rsync hung when applying batch file

Kolya n.krat at carbonsoft.ru
Wed Jun 29 10:22:23 UTC 2016


Hello.

On rsync, version 3.0.9 got this trouble, rsync hung when applying batch 
file:

Sync all files from batch to asr_billing
install:   - skelet/cfg/
install:   - skelet/etc/
install: deleting usr/lib/python2.6/site-packages/IPython/external/Itpl.pyc - deleting
install: 100% 9.83MB/s - 0
install: 100% 51.39kB/s - usr/lib/python2.6/site-packages/dbfpy/utils.pyc
install: 100% 86.10kB/s - usr/lib/python2.6/site-packages/django/contrib/admin/widgets
install: 100% 25.41kB/s - usr/lib/python2.6/site-packages/django/contrib/auth/views.py
install: 100% 5.99kB/s - usr/lib/python2.6/site-packages/django/contrib/contenttypes/
install: 100% 2.66kB/s - usr/lib/python2.6/site-packages/django/contrib/formtools/wiz
install: 100% 4.77kB/s - usr/lib/python2.6/site-packages/django/contrib/sessions/back
install: 100% 16.92kB/s - usr/lib/python2.6/site-packages/django/core/management/valid
install: 100% 3.54kB/s - usr/lib/python2.6/site-packages/johnny/backends/memcached.py
install: 100% 45.51kB/s - usr/lib/python2.6/site-packages/openpyxl/comments/comments.p
install:   - usr/lib/python2.6/site-packages/phonenumbers/data/
install: 100% 73.82kB/s - usr/lib/python2.6/site-packages/pytz/tzinfo.pyc
install: 100% 4.21kB/s - usr/lib/python2.6/site-packages/requests/packages/__init__.p
install: 100% 9.61kB/s - usr/lib/python2.6/site-packages/suds/umx/typed.pyc
install: 100% 4.13kB/s - usr/lib/python2.6/site-packages/unittest2/util.pyc
WARNING: usr/lib64/python2.6/locale.pyc failed verification -- update discarded (may try again).
WARNING: usr/lib64/python2.6/optparse.pyc failed verification -- update discarded (may try again).
WARNING: usr/lib64/python2.6/urllib2.pyc failed verification -- update discarded (may try again).
install: 100% 23.09kB/s - usr/lib64/python2.6/curses/wrapper.pyc
install: 100% 16.61kB/s - usr/lib64/python2.6/json/scanner.pyc
install: 100% 42.08kB/s - usr/lib64/python2.6/multiprocessing/util.pyc

and rsync freeze here, forewer (wait 2-3 hours).

This two rsync process (from pstree) with arguments:
   │   │               └─rsync,2142 --filter=._- -z --compress-level=9 --block-size=40507 -c --port=873 --progress --delete -a --read-batch=/mnt/var/cache/carbon_update//asr_billing/devel/update.batch /mnt/apptmp/asr_billing/
   │   │                   └─rsync,2145 --filter=._- -z --compress-level=9 --block-size=40507 -c --port=873 --progress --delete -a --read-batch=/mnt/var/cache/carbon_update//asr_billing/devel/update.batch /mnt/apptmp/asr_billing/


File descriptions from proc:
# ll /proc/2145/fd
итого 0
lr-x------ 1 root root 64 Июн 29 03:00 0 -> /mnt/var/cache/carbon_update/asr_billing/devel/update.batch
l-wx------ 1 root root 64 Июн 29 03:00 1 -> /tmp/31178_asr_billing.tmp
lrwx------ 1 root root 64 Июн 29 03:00 2 -> /dev/pts/0
lrwx------ 1 root root 64 Июн 29 03:00 3 -> socket:[612268397]
lrwx------ 1 root root 64 Июн 29 03:00 6 -> socket:[612268400]
# ll /proc/2142/fd
итого 0
l-wx------ 1 root root 64 Июн 29 03:00 1 -> /tmp/31178_asr_billing.tmp
lrwx------ 1 root root 64 Июн 29 03:00 2 -> /dev/pts/0
lrwx------ 1 root root 64 Июн 29 03:00 3 -> socket:[612268397]
lrwx------ 1 root root 64 Июн 29 03:00 4 -> socket:[612268398]
lrwx------ 1 root root 64 Июн 29 03:00 5 -> socket:[612268399]



Strace:
# strace -Ff -p 2142 -p 2145
Process 2142 attached - interrupt to quit
Process 2145 attached - interrupt to quit
[pid  2145] select(4, [3], [], NULL, {14, 509436} <unfinished ...>
[pid  2142] select(6, [5], [], NULL, {45, 724560}


  <unfinished ...>
[pid  2145] <... select resumed> )      = 0 (Timeout)
[pid  2145] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  2142] <... select resumed> )      = 0 (Timeout)
[pid  2142] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid  2145] <... select resumed> )      = 0 (Timeout)
[pid  2145] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  2142] <... select resumed> )      = 0 (Timeout)
[pid  2142] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid  2145] <... select resumed> )      = 0 (Timeout)
[pid  2145] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  2142] <... select resumed> )      = 0 (Timeout)
[pid  2142] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid  2145] <... select resumed> )      = 0 (Timeout)
[pid  2145] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  2142] <... select resumed> )      = 0 (Timeout)
[pid  2142] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid  2145] <... select resumed> )      = 0 (Timeout)
[pid  2145] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  2142] <... select resumed> )      = 0 (Timeout)
[pid  2142] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid  2145] <... select resumed> )      = 0 (Timeout)
[pid  2145] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  2142] <... select resumed> )      = 0 (Timeout)
[pid  2142] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid  2145] <... select resumed> )      = 0 (Timeout)
[pid  2145] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  2142] <... select resumed> )      = 0 (Timeout)
[pid  2142] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid  2145] <... select resumed> )      = 0 (Timeout)
[pid  2145] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  2142] <... select resumed> )      = 0 (Timeout)
[pid  2142] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid  2145] <... select resumed> )      = 0 (Timeout)
[pid  2145] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  2142] <... select resumed> )      = 0 (Timeout)
[pid  2142] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid  2145] <... select resumed> )      = 0 (Timeout)
[pid  2145] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  2142] <... select resumed> )      = 0 (Timeout)
[pid  2142] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid  2145] <... select resumed> )      = 0 (Timeout)
[pid  2145] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  2142] <... select resumed> )      = 0 (Timeout)
[pid  2142] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid  2145] <... select resumed> )      = 0 (Timeout)
[pid  2145] select(4, [3], [], NULL, {60, 0} <unfinished ...>



Stack from gdb:
(gdb) backtrace
#0  0x00007f14cf42c243 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x00000000004277a3 in read_timeout (fd=5, buf=0xcbbbd0 "\004", len=8184) at io.c:662
#2  0x000000000042822e in readfd_unbuffered (fd=5, buf=0x7fffac015840 "\201i", len=4) at io.c:1020
#3  0x00000000004290c3 in readfd (fd=5, buffer=0x7fffac015840 "\201i", N=4) at io.c:1163
#4  0x00000000004294b7 in read_msg_fd () at io.c:346
#5  0x000000000041250f in generate_files (f_out=4, local_name=<value optimized out>) at generator.c:2366
#6  0x000000000041d24c in do_recv (f_in=<value optimized out>, f_out=4, local_name=0x0) at main.c:843
#7  0x000000000041d4a2 in client_run (f_in=0, f_out=4, pid=-1, argc=<value optimized out>, argv=0xcba240) at main.c:1080
#8  0x000000000041e2fd in start_client (argc=1, argv=0xcba240) at main.c:1287
#9  main (argc=1, argv=0xcba240) at main.c:1514

(gdb) backtrace
#0  0x00007f14cf42c243 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x00000000004277a3 in read_timeout (fd=3, buf=0x7fffac012f00 "\300O\001\254\377\177", len=4) at io.c:662
#2  0x0000000000427cd4 in readfd_unbuffered (fd=3, buf=0x7fffac012f00 "\300O\001\254\377\177", len=4) at io.c:1017
#3  0x00000000004290c3 in readfd (fd=3, buffer=0x7fffac012f00 "\300O\001\254\377\177", N=4) at io.c:1163
#4  0x0000000000429a71 in read_int (f=<value optimized out>) at io.c:1191
#5  0x0000000000412d96 in gen_wants_ndx (desired_ndx=27012) at receiver.c:422
#6  0x00000000004145fd in recv_files (f_in=0, local_name=0x0) at receiver.c:481
#7  0x000000000041d150 in do_recv (f_in=0, f_out=4, local_name=0x0) at main.c:787
#8  0x000000000041d4a2 in client_run (f_in=0, f_out=4, pid=-1, argc=<value optimized out>, argv=0xcba240) at main.c:1080
#9  0x000000000041e2fd in start_client (argc=1, argv=0xcba240) at main.c:1287
#10 main (argc=1, argv=0xcba240) at main.c:1514


So, two process try get messages from each other, forewer.
I try trace process 2145 (other process in generate_files can setignore_timeout =1; insidegenerate_files(), i thoght, but i faul deattach from 2145 and process closed)

(gdb)
check_timeout () at io.c:124
124	in io.c
(gdb) print io_timeout
$11 = 0
(gdb) print ignore_timeout
$2 = 0


So, timeout newer happen.

My solution is use --timeout parameter (now, always) in rsync. But i think, this is bug, when rsync hung on applying batch file.

I try, but unable to find other solution to fix this, soi will be glad if you pay attention to it.

Thanks.

-- 
Спасибо, что используете продукты компании Carbon Soft.

Успехов и процветания!
Крать Николай.
Отдел разработки компании «Carbon Soft».
--------------------------------------------
Тел.: +7 (495) 668-12-00, +7 (495) 668-07-88
E-mail:n.krat at carbonsoft.ru
ICQ: 672359629
http://www.carbonsoft.ru

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.samba.org/pipermail/rsync/attachments/20160629/88d7ce4c/attachment.html>


More information about the rsync mailing list