[Bug 14315] New: rsync hangs when many errors
samba-bugs at samba.org
Thu Mar 5 22:31:33 UTC 2020
Bug ID: 14315
Summary: rsync hangs when many errors
Assignee: wayne at opencoder.net
Reporter: mvitale at sinenomine.net
QA Contact: rsync-qa at samba.org
Created attachment 15843
test program to aid in reproducing the issue
When performing a local rsync of a large directory (over 10000 files), rsync
hangs if a large number of errors occur on the target (destination) directory.
I am a support engineer for OpenAFS (openafs.org), and this issue was
originally reported by a customer as a possible OpenAFS problem. This customer
observed a hang when rsyncing a large directory into AFS. I was able to
reproduce the problem and demonstrate that the hang is triggered when the
chown() calls that rsync issues to restore the group of the destination files
fail because of an AFS security feature that prohibits the owner of a file from
changing its group ownership. The large number of resulting errors caused all
three rsync processes (sender, generator, and receiver) to stall.
With the help of a colleague, we were able to devise a way to reproduce this
hang without requiring an AFS filesystem. In order to recreate the rsync hang,
we need a way to get a large number of errors while performing the rsync from a
normal ext4 filesystem. In our procedure, we simulate these errors with a small
Linux seccomp program that makes chown() and related system calls fail.
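For readers without the attachment, here is a minimal sketch of the idea. It is NOT the attached seccomp-chown.c: instead of libseccomp it installs a raw seccomp BPF filter through prctl(), and it makes chown, lchown, fchown, and fchownat return EPERM rather than killing the process. The function name block_chown_syscalls() is hypothetical, and the sketch assumes a native x86_64 or aarch64 process.

```c
/* Hypothetical sketch, not the attached seccomp-chown.c: a raw seccomp BPF
 * filter (no libseccomp) that makes the chown family of system calls fail
 * with EPERM. Only native x86_64/aarch64 syscalls are filtered; 32-bit
 * compat and x32 calls fall through to ALLOW. */
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only handles x86_64 and aarch64"
#endif

/* Filter action: fail the current syscall with errno = EPERM. */
#define DENY_EPERM \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA))

/* If the syscall number equals nr, run the next instruction (DENY_EPERM);
 * otherwise skip over it. */
#define IF_NR(nr) BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1)

/* Returns 0 on success, -1 (with errno set) on failure. */
int block_chown_syscalls(void)
{
    struct sock_filter filter[] = {
        /* Check the ABI; allow everything for an unexpected architecture. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        /* Load the syscall number and compare it with each chown variant. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
#ifdef __NR_chown /* not present on aarch64 */
        IF_NR(__NR_chown), DENY_EPERM,
#endif
#ifdef __NR_lchown /* not present on aarch64 */
        IF_NR(__NR_lchown), DENY_EPERM,
#endif
        IF_NR(__NR_fchown), DENY_EPERM,
        IF_NR(__NR_fchownat), DENY_EPERM,
        /* Every other syscall is allowed. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required before installing a filter without CAP_SYS_ADMIN. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

To mimic the attached tool, a wrapper could call block_chown_syscalls() and then exec a shell, e.g. execl("/bin/sh", "sh", (char *)NULL). Seccomp filters are inherited across fork() and preserved across execve(), so the shell and every rsync process started from it would see the failing chown() calls.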
1. Log in to a Linux account that belongs to at least two groups.
uid=1000(mvitale) gid=1000(mvitale) groups=1000(mvitale),10(wheel)
2. Build a program to simulate chown/chgrp errors:
$ sudo yum install libseccomp libseccomp-devel
$ cc seccomp-chown.c -o sec-kill-chown -lseccomp
The source code for seccomp-chown.c is attached to this ticket.
3. Create a large source directory with over 10000 files.
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
These files will all have the group ownership of the user's current group.
Any sufficiently large directory should work; it doesn't have to be a git repo.
4. Switch to the alternate group (this starts a new shell):
$ newgrp wheel
uid=1000(mvitale) gid=10(wheel) groups=10(wheel),1000(mvitale)
5. Enable the error generator by running the sec-kill-chown program built in
step 2 (this also starts a new shell):
Running shell. chown() and friends are now unavailable.
6. Create a target directory and run rsync to reproduce the hang:
$ mkdir target
$ cd target
$ rsync -av --delete --log-file=/tmp/rlog.$$ /home/mvitale/linux ./
This should hang after a few seconds.
7. Exit the two shells (seccomp and newgrp)
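Step 3 notes that any sufficiently large directory will do. As a hedged alternative to cloning the kernel tree, the hypothetical helper below creates ndirs subdirectories with files_per_dir small files each. Run it before step 4 (newgrp) so that, as with the git clone, the files take the user's original group (assuming the parent directory is not setgid).

```c
/* Hypothetical helper: build a synthetic source tree of small files. */
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates root/dNNNN/fNNNN.txt for ndirs directories and files_per_dir
 * files per directory. Returns the number of files created, or -1. */
long make_test_tree(const char *root, int ndirs, int files_per_dir)
{
    char path[4096];
    long created = 0;

    if (mkdir(root, 0755) != 0 && errno != EEXIST)
        return -1;
    for (int d = 0; d < ndirs; d++) {
        snprintf(path, sizeof path, "%s/d%04d", root, d);
        if (mkdir(path, 0755) != 0 && errno != EEXIST)
            return -1;
        for (int f = 0; f < files_per_dir; f++) {
            snprintf(path, sizeof path, "%s/d%04d/f%04d.txt", root, d, f);
            int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd < 0)
                return -1;
            /* Write a short line so each file has some content to copy. */
            char buf[64];
            int n = snprintf(buf, sizeof buf, "dir %d file %d\n", d, f);
            if (write(fd, buf, (size_t)n) != n) {
                close(fd);
                return -1;
            }
            close(fd);
            created++;
        }
    }
    return created;
}
```

For example, make_test_tree("srcdir", 100, 120) would create 12000 files, above the 10000-file size mentioned in the report.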
Using git bisect, I isolated the commit that introduced this problem:
d8587b4 Change the msg pipe to use a real multiplexed IO mode for the data that
goes from the receiver to the generator.
The following releases show the problem: master, 3.1.3, 3.1.2, 3.1.0
Release 3.0.9 and older do not exhibit the problem.
Each of the following workarounds was successful for my customer and in my
own testing:
- use an older version of rsync (3.0.9 or older)
- specify rsync option --msgs2stderr
- perform the rsync under a userid with the same group as the source files
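The third workaround suggests the error count is driven by files whose group differs from the group rsync runs under. The hypothetical diagnostic below is a rough estimate, not rsync's own logic: it counts entries under a source tree whose group differs from the calling process's effective gid, assuming newly created destination files take that gid (no setgid directories) so that rsync -a would attempt a chown for each of them.

```c
/* Hypothetical diagnostic: estimate how many group changes rsync -a would
 * attempt, by counting entries whose gid differs from our effective gid. */
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <sys/stat.h>
#include <unistd.h>

static long mismatched;
static gid_t my_gid;

/* nftw() callback: count regular files, directories, and symlinks whose
 * group differs from the effective gid. Returning 0 continues the walk. */
static int count_cb(const char *path, const struct stat *st, int type,
                    struct FTW *ftwbuf)
{
    (void)path;
    (void)ftwbuf;
    if ((type == FTW_F || type == FTW_D || type == FTW_SL) &&
        st->st_gid != my_gid)
        mismatched++;
    return 0;
}

/* Returns the mismatch count, or -1 if the tree cannot be walked. */
long count_group_mismatches(const char *root)
{
    mismatched = 0;
    my_gid = getegid();
    /* FTW_PHYS: do not follow symlinks (report them as FTW_SL). */
    if (nftw(root, count_cb, 64, FTW_PHYS) != 0)
        return -1;
    return mismatched;
}
```

Run from the newgrp shell in step 4 against the source directory, a result close to the total file count would indicate that nearly every file will produce a failing chown under the seccomp filter.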
Thanks for your consideration, and please let me know if there's anything else
I can provide to help.
mvitale at sinenomine.net