Quick analysis of 'Broken pipe' error at io.c(463)
smithja at cs.unc.edu
Thu Jul 11 18:37:05 EST 2002
This one has been biting me recently, and it seems to be completely
deterministic: it works on most directories, but when it fails
on one, it does so reproducibly. Also, I'm doing a simple copy from one
directory on my machine to another - there is no remote server or
network to fail. The drive is only 24% full.
Here's my take on it:
1) The error is printed as occurring in io.c: writefd_unbuffered, when a
SIGPIPE is being thrown by the system from (I'm guessing) a lack of pipe
readers. main.c (907) sets this up to be ignored, claiming "we'll see
the EPIPE"... but EPIPE is never explicitly checked for in io.c, just
assumed as the last case. Perhaps this should be handled more cleanly?
There may be a timing issue here that could be handled with a sleep().
2) The core file being thrown by my setup is for the parent rsync only -
during flist.c: send_file_name, which would be for the writer task...
but it's the reader task forked off that's apparently going into lala
land and tanking the enterprise by triggering a SIGPIPE/EPIPE on write.
So, advice to all seeing this error: have gdb handy to attach to the
*second* rsync process, as it is the reader process that seems to be
failing and throwing a monkey wrench in the works.
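On point 1, here is a minimal sketch of what "checking EPIPE explicitly" could look like once SIGPIPE is ignored. This is not rsync's code: write_all_checked() and demo_write_to_closed_pipe() are hypothetical names made up for illustration, and the pipe is a stand-in for the parent/child connection. It only shows that with SIGPIPE set to SIG_IGN, a write to a pipe with no readers returns -1 with errno set to EPIPE, which can then be reported as its own case.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not rsync's writefd_unbuffered): write the whole
 * buffer, retry on EINTR, and report EPIPE as a distinct case.
 * Returns 0 on success, or the errno value of the failed write. */
static int write_all_checked(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            int e = errno;              /* save before calling stdio */
            if (e == EINTR)
                continue;               /* interrupted, just retry */
            if (e == EPIPE)
                fprintf(stderr, "write: reader has gone away (EPIPE)\n");
            else
                fprintf(stderr, "write: %s\n", strerror(e));
            return e;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Demo: ignore SIGPIPE, close the read end of a pipe, then write to it.
 * Returns the result of the write, or -1 if the pipe could not be made. */
int demo_write_to_closed_pipe(void)
{
    int fds[2];
    int rc;

    signal(SIGPIPE, SIG_IGN);           /* so write() fails instead of killing us */
    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);                      /* no readers left on this pipe */
    rc = write_all_checked(fds[1], "x", 1);
    close(fds[1]);
    return rc;
}
```

Calling demo_write_to_closed_pipe() from a small main() should return EPIPE. That tells you the reader died, but not why, so the reader process still has to be debugged separately.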
Someone correct me if I'm wrong, but this seems like the best
approach. Also, if anyone has a good slick way of attaching gdb to the
second rsync process, I'd love to hear it. Right now it's a mad
scramble with ps -aux | grep rsync and gdb.
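One trick that might help, assuming you can rebuild with a temporary patch: have the child announce its PID and wait until a debugger releases it. The sketch below is generic, not anything that exists in rsync. wait_for_debugger() and debugger_attached are hypothetical names, and where exactly to call it in the forked reader is left open.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical debugging aid. From gdb, after attaching, run
 *     set var debugger_attached = 1
 * to let the process continue. */
volatile int debugger_attached = 0;

/* Print our PID, then poll once per second until the flag is set or
 * max_seconds have elapsed. Returns the final value of the flag. */
int wait_for_debugger(int max_seconds)
{
    int i;

    fprintf(stderr, "pid %ld waiting for debugger (try: gdb -p %ld)\n",
            (long)getpid(), (long)getpid());
    for (i = 0; i < max_seconds && !debugger_attached; i++)
        sleep(1);
    return debugger_attached;
}
```

Called early in the child, for example as wait_for_debugger(60), this gives a minute to run gdb -p with the printed PID instead of racing ps. If patching is not an option, gdb's "set follow-fork-mode child" may be worth trying when starting rsync under gdb directly.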
A few details:
rsync --archive --update --relative --exclude-from=exlist is the
option string. It fails on any set of options, however, including none.
Adding -vvv causes it to *hang*, not die. Is it waiting for information
from the hung reader process?
I set up a series of directories to be synced, including the 'bad'
one. The point at which it choked differed depending on where in the
list the 'bad' directory was placed. The 'bad' one isn't extremely
huge, certainly smaller than many that are successfully syncing. This
may point to a memory allocation problem?
Hope this helps.