rsync 3.0.7 network errors on MS-Windows
andrew.marlow at uk.bnpparibas.com
Wed Jun 2 07:42:49 MDT 2010
I apologise for top-posting; the email client I am required to use leaves
me no choice.
What Jamie says below seems to imply that the rsync protocol needs
to be extended to include a clean shutdown. AFAIK WinSock still behaves
the way it always did, so it cannot do the same kind of cleanup that
you can on Unix. But Jamie seems to be saying that on unix a shutdown
would trigger a TCP RST anyway. So I am really not sure where this
leaves me.
Surely other people have been seeing this problem when doing an rsync to
hundreds of machines at the same time. Out of around 200 machines I got
these errors from over a dozen today. That's more than usual but it's all
random when it comes down to it. I wonder who else is rsyncing on this
scale.
Regards,
Andrew Marlow
jamie at shareable.org
02/06/2010 13:55
To: Andrew MARLOW
cc: rsync at lists.samba.org
Subject: Re: rsync 3.0.7 network errors on MS-Windows
andrew.marlow at uk.bnpparibas.com wrote:
>
> I am experiencing intermittent network failures on rsync 3.0.7 built
> using cygwin for Windows-XP (SP2). I am using GCC v4.4.2 and the
> latest version of cygwin.
> The rsync error log indicates things like:
> rsync: writefd_unbuffered failed to write 4092 bytes to socket
> [generator]:
> Connection reset by peer (104)rsync: read error: Connection reset by
> peer (104)
> rsync error: error in rsync protocol data stream (code 12) at
> io.c(1530) [generator=3.0.7]
> rsync error: error in rsync protocol data stream (code 12) at
> io.c(760) [receiver=3.0.7]
> Googling, I see that these problems were put down to the way sockets
> are cleaned up in Windows, and a fix was put in place in cleanup.c, in
> close_all(). But the fix is surrounded by conditional compilation:
> #ifdef SHUTDOWN_ALL_SOCKETS
> :
> :
> #endif
> Can someone please explain why that is? Shouldn't the fix just be
> there always, and regardless of which operating system?
It's not needed on most operating systems, as the comment there implies.
According to the notes copied below, SO_LINGER is off by default on
unix sockets, which means close() will gracefully send the remaining
data in the background rather than sending a TCP RST. You can assume
that program exit has the same effect as close().
If SO_LINGER is turned on with a zero timeout, the notes below say TCP
RST is sent on close, which is much like what the comment for
SHUTDOWN_ALL_SOCKETS says is happening on Windows without SO_LINGER.
Presumably Windows sockets - or at least some version of them (there
are several versions of Winsock) - behaves differently from unix
sockets in this area. It wouldn't be surprising, as historically
Winsock ran inside the process not the kernel, so an exiting process
couldn't implement the unix graceful close behaviour, and maybe they
kept that behaviour the same in later versions.
That said, I still don't see why SHUTDOWN_ALL_SOCKETS would fix it.
Calling shutdown(fd,2) closes it in both directions, and at least with
usual unix sockets, that would trigger TCP RST anyway if the other end
sends any data after the shutdown.
Which it seems to be doing: "writefd_unbuffered failed to write 4092
bytes to socket" implies the other end has closed, or called
shutdown(fd,1) or shutdown(fd,2), and data was then sent to it which
could not be accepted, so the other end sent back a TCP RST anyway.
If rsync is doing that in normal operation, that ought to be a problem
on unix just as much as Windows - and SHUTDOWN_ALL_SOCKETS ought to be
insufficient to prevent the reset.
Which suggests to me that "writefd_unbuffered failed to write 4092
bytes to socket" is a symptom of a different problem.
Here are the notes I referred to above, which explain SO_LINGER's
behaviour:
Unix Socket FAQ
http://www.developerweb.net/forum/archive/index.php/t-2982.html
4.6 - What exactly does SO_LINGER do?
Contributed by Cyrus Patel
SO_LINGER affects the behaviour of the close() operation as described
below. No socket operation other than close() is affected by SO_LINGER.
The following description of the effect of SO_LINGER has been culled
from the setsockopt() and close() man pages for several systems, but may
still not be applicable to your system. The differences in
implementation range from not supporting SO_LINGER at all, to only
supporting it partially, to "peculiarities" in a particular
implementation (see portability notes at end).
Moreover, the purpose of SO_LINGER is very, very specific and only a
tiny minority of socket applications actually need it. Unless you are
extremely familiar with the intricacies of TCP and the BSD socket API,
you could very easily end up using SO_LINGER in a way for which it was
not designed.
The effect of a setsockopt(..., SO_LINGER, ...) call depends on the
values in the linger structure (the third parameter passed to
setsockopt()):
Case 1: linger->l_onoff is zero (linger->l_linger has no meaning):
This is the default.
On close(), the underlying stack attempts to gracefully shut down the
connection after ensuring all unsent data is sent. In the case of
connection-oriented protocols such as TCP, the stack also ensures that
sent data is acknowledged by the peer. The stack will perform the
above-mentioned graceful shutdown in the background (after the call to
close() returns), regardless of whether the socket is blocking or
non-blocking.
Case 2: linger->l_onoff is non-zero and linger->l_linger is zero:
A close() returns immediately. The underlying stack discards any unsent
data, and, in the case of connection-oriented protocols such as TCP,
sends a RST (reset) to the peer (this is termed a hard or abortive
close). All subsequent attempts by the peer's application to
read()/recv() data will result in an ECONNRESET.
Case 3: linger->l_onoff is non-zero and linger->l_linger is non-zero:
A close() will either block (if a blocking socket) or fail with
EWOULDBLOCK (if non-blocking) until a graceful shutdown completes or the
time specified in linger->l_linger elapses (time-out). Upon time-out the
stack behaves as in case 2 above.
---------------------------------------------------------------
Portability note 1: Some implementations of the BSD socket API do not
implement SO_LINGER at all. On such systems, applying SO_LINGER either
fails with EINVAL or is (silently) ignored. Having SO_LINGER defined in
the headers is no guarantee that SO_LINGER is actually implemented.
Portability note 2: Since the BSD documentation on SO_LINGER is sparse
and inadequate, it is not surprising to find the various implementations
interpreting the effect of SO_LINGER differently. For instance, the
effect of SO_LINGER on non-blocking sockets is not mentioned at all in
BSD documentation, and is consequently treated differently on different
platforms. Taking case 3 for example: some implementations behave as
described above. With others, a non-blocking socket close() succeeds
immediately, leaving the rest to a background process. Others ignore
non-blockingness and behave as if the socket were blocking. Yet others
behave as if SO_LINGER weren't in effect [as if case 1, the default,
were in effect], or ignore linger->l_linger [case 3 is treated as case
2]. Given the lack of adequate documentation, such differences are not
(by themselves) indicative of an "incomplete" or "broken"
implementation. They are simply different, not incorrect.
Portability note 3: Some implementations of the BSD socket API do not
implement SO_LINGER completely. On such systems, the value of
linger->l_linger is ignored (always treated as if it were zero).
Technical/Developer note: SO_LINGER does (should) not affect a stack's
implementation of TIME_WAIT. In any event, SO_LINGER is not the way to
get around TIME_WAIT. If an application expects to open and close many
TCP sockets in quick succession, it should be written to use only a
fixed number and/or range of ports, and apply SO_REUSEPORT to sockets
that use those ports.
Related note: Many BSD sockets implementations also support a
SO_DONTLINGER socket option. This socket option has the exact opposite
meaning of SO_LINGER, and the two are treated (after inverting the value
of linger->l_onoff) as equivalent. In other words, SO_LINGER with a zero
linger->l_onoff is the same as SO_DONTLINGER with a non-zero
linger->l_onoff, and vice versa.
-- Jamie