rsync: race condition can cause loss of diagnostic output

Charles C. Fu =?BIG5?B?s8Wr2KTl?= ccwf at
Wed May 22 19:04:02 EST 2002

[This is a copy of the contents of Debian bug report #147842.]

Package: rsync
Version: 2.5.5-0.2
Severity: normal


- rsync forks a child which in turn forks a grandchild in
- Diagnostics written by the grandchild need to be read by the child
  using read_error_fd() to be handled properly (with the end result
  being that they are seen by the user running rsync).
- select() is used to call read_error_fd() only if there is stuff
  waiting to be read by it.
- The select()/read_error_fd() combo is checked when the child reads
  and writes.
- read_error_fd() only reads a single message from io_error_fd.

The end result is that if multiple messages have accumulated, waiting
and unread, on io_error_fd at the time the write_int() and io_flush()
are called at the end of do_recv(), then only the first is handled by
read_error_fd().  The rest get discarded.


Test Case

A manifestation of this race condition is that using the combo of
options '-v --delete-after' does not always report all files deleted.
Using strace shows that the grandchild correctly writes messages about
all deletions, but the child does not necessarily read all these
deletion messages.

Here is a simple test case.

  $ rm -rf test
  $ mkdir -p test/1/2/3 test/a/b/c
  $ rsync -a --delete-after --delete test/ /tmp/test
  $ rm -rf test/a
  $ rsync -av --delete-after --delete test/ /tmp/test
  building file list ... done
  deleting directory a/b/c
  deleting directory a/b
  wrote 99 bytes  read 20 bytes  238.00 bytes/sec
  total size is 0  speedup is 0.00

Above, the directory /tmp/test/a was also deleted but is not listed.

On my system, this is the most common result from running the test
case.  However, since it is a race condition, sometimes more and
sometimes fewer deletions manage to get read by the child and then
listed depending upon other load on the system, exact option flags
used (e.g., '-vv' instead of just '-v'), ....

Note: The test case relies upon --delete-after to show the bug.
Without --delete-after, the deletion messages are written before
the fork of the grandchild, so the bug does not manifest.



Looking at the code, I would suggest that main.c:do_recv() be modified
as follows:

- Have the grandchild close error_pipe[1] before its final sleep loop.

  (Why doesn't the grandchild just exit instead of sleeping?  Is it
  only to try to avoid having to deal with a SIGCHLD in the middle of
  an I/O operation in the child?)

- In the child, prior to the kill(), loop and call read_error_fd()
  until that descriptor is closed.

However, I have not written code to test these suggested modifications
nor am I sufficiently well-versed with the rsync code to know whether
or not my suggested modifications are the best fix.


I am not a member of the rsync mailing list, so please email me
directly if further details are needed.

-- System Information
Debian Release: 3.0
Kernel Version: Linux thanatar 2.4.17-686-smp #2 SMP Sat Dec 22 22:00:42 EST 2001 i686 unknown

Versions of the packages rsync depends on:
ii  libc6          2.2.5-4        GNU C Library: Shared libraries and Timezone
ii  libpopt0       1.6.2-7        lib for parsing cmdline parameters

Charles C. Fu                           ,--
Vice President         ___  __ __. . ,-/--      ccwf at
Bacchus, Inc.              (_,(_,|/|/ /
Tel/Fax 310-455-2396             ----'

More information about the rsync mailing list