writefd_unbuffered -- Connection reset by peer

Alun auj at aber.ac.uk
Tue Nov 15 13:58:54 GMT 2005


Dear all,

This is just a placeholder really - I've spent quite a lot of time
trying to track down the cause of a problem I'm seeing here, and
have eventually found an answer. I thought it might be useful to
make the solution googleable.

Obviously, if someone has better ideas, I'd welcome comments. Similarly,
if anyone wants detailed logs, tcpdump or strace output, I'd be happy 
to oblige. I'm pretty sure it's not rsync that's to blame though.

The problem:

I have a bunch of Linux servers, one of which is going to act as a backup
repository for the others. All are running Gentoo Linux with 2.6.x kernels.
The server is running 2.6.12-gentoo-r6 and my test client is running
2.6.11-gentoo-r9. I'm not really in a position to reboot any of these
machines to test other kernel versions.

The backup server is an old 450MHz Pentium with a 100Mbit/s Intel EEPro
card, while the clients are all of much higher spec (typically Xeons with
Intel E1000 Gigabit Ethernet).

I'm running rsync version 2.6.0 at both ends (though I have tested 2.6.6
with the same results).

I'm using rsync in daemon mode on the backup server (no ssh involved),
and rsync tends to die with an error of the form:

rsync: writefd_unbuffered failed to write 4096 bytes: phase "unknown": 
Connection reset by peer 
rsync error: error in rsync protocol data stream (code 12) at io.c(666) 

Both server and client logs report "Connection reset by peer". 
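
For reference, the transfer is a straightforward daemon-mode push from
each client. A command of roughly this shape is enough to reproduce the
failure (the module and path names here are placeholders, not my real
configuration):

rsync -av /var/data/ backupserver::backup/clientname/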

The backup always seems to break when handling a directory containing a
large number of files - one particular directory with 26,000 smallish
files reliably triggers the problem.

If I do the rsync to a local disk, the error doesn't occur.

When I look at tcpdump's output, I see the TCP window size dropping to
zero, indicating that the backup server is receiving data faster than it
can handle. Presumably the sending machine should then back off, but what
actually appears to happen is that the connection gets dropped, hence the
rsync errors at both ends. I've tried messing with --bwlimit, but the
problem occurs even when I drop it down to 80KBytes/second, which is
patently ridiculous (the backup server can handle 100 times that amount
of network I/O).
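
In case anyone wants to reproduce the capture, something along these
lines is enough to watch the advertised window (the interface name is
whatever your machine uses; daemon-mode rsync talks on TCP port 873 by
default):

tcpdump -ni eth0 'tcp port 873'

Each line of output includes a "win <n>" field - the advertised window -
and it's that value I see collapsing to zero.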

All my investigations seem to point to a TCP stack problem, but that's
about as far as I can get. I've found a solution that works for me - turn
off TCP window scaling on the backup server. This can be done using:

sysctl -w net.ipv4.tcp_window_scaling=0
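
If you want the setting to survive a reboot, the equivalent line can go
in /etc/sysctl.conf:

net.ipv4.tcp_window_scaling = 0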

There is some discussion of TCP window scaling problems at:

http://lwn.net/Articles/92727/

but that discussion concerns broken routers. My server and client are on
the same subnet, so it's not exactly the same issue.

Turning off window scaling is probably not a good solution in general,
but it's fine for me because the backup server doesn't have any other
high-throughput requirement. Even with window scaling disabled, I'm still
getting 7MBytes/second from rsync, which is pretty close to saturating
the server's card.

Cheers,
Alun.

-- 
Alun Jones                       auj at aber.ac.uk
Systems Support,                 (01970) 62 2494
Information Services,
University of Wales, Aberystwyth
