No error messages in rsyncd log in 2.6.1pre-1

John Van Essen vanes002 at umn.edu
Tue Apr 27 10:15:25 GMT 2004


(As I was composing this, the 2.6.1 release notice on the rsync
 list rolled in.  The quoted source, below, hasn't changed, so
 I'll leave the 'pre-1' references unchanged...)

I have a situation where an error message seems to be sent from
the daemon to the client, but none is logged in the daemon's log.

Daemon is 2.6.1pre-1, with --timeout=3600, light CPU load.

Client is 2.5.5, with --timeout=3600, very heavy CPU load.

The client initiates a pull from the daemon.  I believe the file
list is received by the client pretty quickly.  The client machine
is very busy and it take well over 3600 seconds for it to work its
way down the list and finally get to a file that it needs to sync.
At that point, it errors out.  Here's the final output:


receiving file list ... done
games/empireearth2/
io timeout after 3600 seconds - exiting
rsync error: timeout in data send/receive (code 30) at io.c(85)
games/sacred/
rsync: connection unexpectedly closed (261664 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(150)


I've run into this situation prior to 2.6.1, and as I recall, the
'io timeout' message did not appear (and possibly the line after
it, also).  I recall just the 'connection unexpectedly closed'
message appearing, and the appropriate timeout error message would
be in the rsyncd log.

But now, beyond the initial 'rsync on [module] from [host]' message,
there are no further entries in the rsyncd log for this timeout case.
There should be an error message to indicate what happened, and when.

There was a change to rwrite() that seems to be preventing the error
logging.  This 2.6.1 code:

        if (am_server) {
                /* Pass it to non-server side, perhaps through our sibling. */
                if (msg_fd_out >= 0) {
                        send_msg((enum msgcode)code, buf, len);
                        return;
                }
                if (io_multiplex_write((enum msgcode)code, buf, len))
                        return;
        }

will send all errors to the client and none to the log when the
daemon is the server.  That code replaced this code in 2.6.0:

        /* first try to pass it off to our sibling */
        if (am_server && log_error_fd != -1) {
                err_list_add(code, buf, len);
                err_list_push();
                return;
        }

        /* next, if we are a server and multiplexing is enabled,
         * pass it to the other side.  */
        if (am_server && io_multiplex_write(code, buf, len)) {
                return;
        }

which had replaced even older code in 2.5.6:

        /* first try to pass it off to our sibling */
        if (am_server && log_error_fd != -1) {
                err_list_add(code, buf, len);
                err_list_push();
                return;
        }

        /* next, if we are a server but not in daemon mode, and multiplexing
         *  is enabled, pass it to the other side.  */
        if (am_server && !am_daemon && io_multiplex_write(code, buf, len)) {
                return;
        }

so it looks like there's a missing !am_daemon in the newer code,
and this problem probably existed in 2.6.0, but I hadn't run into
this timeout problem in the past few months to notice it.

In any case, shouldn't errors (FERROR) *always* be logged via logit()
if running as a daemon even if they are also sent to the sender?
I'd sure like to see that.  So there's perhaps more to be adjusted
here than just adding back a "!am_daemon" test?

Also - I'd sure like to see a message in the rsyncd log when the
file list transfer has been completed.  That would help determine
at what stage things are failing (and how long it took for the
file transfer to complete, in case that helps).
-- 
        John Van Essen  Univ of MN Alumnus  <vanes002 at umn.edu>



More information about the rsync mailing list