Truncated output from "rsync -e ssh ... 2>&1 | tee"

David Evers dme49 at hotmail.com
Tue Sep 28 16:10:13 GMT 2004


(Versions: OpenSSH_3.7.1p2, rsync version 2.6.2)

I've just encountered a situation where "rsync -v -n" appears to run normally
but reports far fewer file transfers than are actually performed when you
remove the -n. (This is not one of the usual "-n" corner cases.)

It turns out that this only happens when you're doing a remote
rsync over ssh AND you redirect stderr into a pipe that fills up, as in

     rsync -e ssh -avn host:/path /local/path 2>&1 | tee LOG

I can get the right answer by just not capturing stderr;
i.e. removing the "2>&1" and just saying

     rsync -avn host:/path /local/path | tee LOG

works.

The data loss occurs when the pipe (to tee here) fills, so in principle
you could lose output even without the "-n"; it's just less
likely when the output is generated more slowly.

After poking around with strace, it seems that rsync's child ssh sets its
stdERR non-blocking, and that this stderr was inherited unchanged
from the top-level rsync.  (rsync supplies pipes for its child's
stdin and stdout, but leaves stderr alone;
see rsync-2.6.2/pipe.c::piped_child().)
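
To illustrate the inheritance effect in isolation, here is a minimal sketch
in Python (not rsync or ssh code; the function name is made up for this
example). A child process is handed a pipe as its stderr and switches it to
non-blocking, standing in for what ssh appears to do. The parent then checks
its own descriptor for the same pipe. This assumes a Unix-like system.

```python
import os
import subprocess
import sys


def stderr_flag_leaks_from_child():
    """Return (blocking_before, blocking_after) as seen by the parent.

    Hypothetical helper for illustration only: the child stands in for ssh,
    the pipe's write end stands in for the stderr it inherits.
    """
    read_fd, write_fd = os.pipe()
    before = os.get_blocking(write_fd)

    # The child only changes the mode of ITS file descriptor 2.
    child_code = "import os; os.set_blocking(2, False)"
    subprocess.run([sys.executable, "-c", child_code],
                   stderr=write_fd, check=True)

    # The parent never touched write_fd, yet its mode may have changed,
    # because parent and child share one open file description.
    after = os.get_blocking(write_fd)

    os.close(read_fd)
    os.close(write_fd)
    return before, after


if __name__ == "__main__":
    print(stderr_flag_leaks_from_child())
```

On Linux I would expect the second value to be False: the non-blocking
flag belongs to the shared open file description, not to the individual
descriptor, so it outlives the child.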

Because of the "2>&1", the top-level stderr is a dup of
the top-level stdout, so ssh has inadvertently made rsync's
stdOUT non-blocking.  Rsync is not expecting that: a write to a full
non-blocking pipe fails with EAGAIN instead of waiting, and rsync does
not check the return value from fflush(stdout), so it can
silently drop lines from stdout.
(See the end of rsync-2.6.2/log.c::rwrite().)
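
The failure mode can be reproduced without rsync or ssh. The following is a
rough sketch in Python under stated assumptions: a pipe that nobody reads
models a stalled tee, os.dup() models the "2>&1", and the writer ignores
write errors the way an unchecked fflush() would. The function name and the
line contents are invented for the example.

```python
import os


def count_dropped_lines(total_lines=20000):
    """Write lines to a stalled pipe, ignoring errors; return how many were lost."""
    read_fd, stdout_fd = os.pipe()       # nobody reads: models a stalled tee
    stderr_fd = os.dup(stdout_fd)        # models "2>&1"
    os.set_blocking(stderr_fd, False)    # models what ssh does to its stderr

    dropped = 0
    line = b"some/file/name\n"           # short line, well under PIPE_BUF
    for _ in range(total_lines):
        try:
            # stdout_fd was never explicitly made non-blocking, but it
            # shares its open file description with stderr_fd.
            os.write(stdout_fd, line)
        except BlockingIOError:          # EAGAIN: the pipe is full
            dropped += 1                 # an unchecked writer just loses the line

    for fd in (read_fd, stdout_fd, stderr_fd):
        os.close(fd)
    return dropped


if __name__ == "__main__":
    print("lines silently dropped:", count_dropped_lines())
```

Once the pipe buffer is full, every further line is lost. With a blocking
stdout the same loop would simply wait for the reader instead.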

CVS has basically the same problem, as discussed at
http://groups.google.com/groups?th=e4df2fdc1f4f4950,
which mentions some workarounds that the CVS people
considered.
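
For what it's worth, one generic way for a writer to tolerate a descriptor
that has unexpectedly become non-blocking is to retry after EAGAIN. The
sketch below is in Python and is only an illustration; it is not
necessarily one of the workarounds discussed in that thread, and write_all
is a hypothetical helper name.

```python
import os
import select


def write_all(fd, data):
    """Write all of data to fd, waiting whenever a non-blocking fd is full."""
    view = memoryview(data)
    while view:
        try:
            written = os.write(fd, view)   # may be a partial write
            view = view[written:]
        except BlockingIOError:
            # EAGAIN: block in select() until fd is writable, then retry.
            select.select([], [fd], [])
```

The cost is that the writer now stalls whenever the reader stalls, which is
the behaviour it would have had with a blocking stdout in the first place.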

It's not clear whether the problem should really be fixed in rsync,
ssh, or glibc, but in the meantime, would it be worth adding a
warning to the docs/FAQ/known-issues/wherever?



