[distcc] (fwd from akpm@digeo.com) Fw: Re: fix TCP roundtrip time update code

Martin Pool mbp at samba.org
Wed Jun 4 04:02:23 GMT 2003


In case anyone else was having connections mysteriously hang on distcc
with Linux kernel 2.5: it may now be fixed.



----- Forwarded message from Andrew Morton <akpm at digeo.com> -----

From: Andrew Morton <akpm at digeo.com>
Subject: Fw: Re: fix TCP roundtrip time update code
Date: Tue, 3 Jun 2003 12:04:40 -0700
To: "tridge at samba.org" <tridge at samba.org>, Martin Pool <mbp at samba.org>


hallabloodylooya.  Looks like we finally nailed the distcc hang in 2.5.x


Begin forwarded message:

Date: Tue, 3 Jun 2003 11:45:24 -0700
From: David Mosberger <davidm at napali.hpl.hp.com>
To: Martin Josefsson <gandalf at wlug.westbo.se>
Cc: davidm at hpl.hp.com, kuznet at ms2.inr.ac.ru, linux-kernel at vger.kernel.org, linux-ia64 at linuxia64.org, netdev at oss.sgi.com
Subject: Re: fix TCP roundtrip time update code


>>>>> On 03 Jun 2003 19:41:11 +0200, Martin Josefsson <gandalf at wlug.westbo.se> said:

  Martin> (trimmed CC line and added netdev) On Tue, 2003-06-03 at
  Martin> 17:52, David Mosberger wrote:
  >> One of those very-hard-to-track-down, trivial-to-fix kind of
  >> problems: without this patch, TCP roundtrip time measurements
  >> will corrupt the routing cache's RTT estimates under heavy
  >> network load (the bug causes RTAX_RTT to go negative, but since
  >> its type is u32, you end up with a huge positive value...).  From
  >> there on, later TCP connections quickly will go south.

  >> The typo was introduced 8 months ago in v1.29 of the file by the
  >> patch entitled "Cleanup DST metrics and abstrct MSS/PMTU
  >> further".

  Martin> I tested this patch and it looks like it has cured my
  Martin> mysterious TCP stalls.

Yes, this sounds reasonable.  I wasn't very clear on this point, but
by "going south" I meant that TCP starts to misbehave.  In
particular, you'll likely end up with the kernel aborting ESTABLISHED
TCP connections with extreme prejudice (and in violation of the TCP
protocol), because it thinks that it has been unable to communicate
with the remote end for a _very_ long time.  The net effect typically
is that one end has a connection in the ESTABLISHED state and the
other end has no trace of that connection.

	--david

----- End forwarded message -----
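
For anyone wondering how a "negative" RTT turns into a huge positive
one: the sketch below is only an illustration of unsigned wraparound
in C, not the kernel code or the actual patch.  The names
update_cached_rtt and self_check are made up for this example; the
only thing taken from the message above is that the cached value is
a u32.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the kernel): subtract an adjustment
 * from a cached RTT estimate stored as an unsigned 32-bit value.
 * If delta is larger than cached, the mathematically negative result
 * wraps modulo 2^32 and shows up as a very large positive number.
 */
static uint32_t update_cached_rtt(uint32_t cached, uint32_t delta)
{
    return (uint32_t)(cached - delta);
}

/* Hypothetical self-check showing the normal and the wrapped case. */
static void self_check(void)
{
    /* Normal case: 300 - 100 stays a small, sensible value. */
    assert(update_cached_rtt(300u, 100u) == 200u);

    /* "Negative" case: 100 - 300 wraps to 2^32 - 200. */
    assert(update_cached_rtt(100u, 300u) == 4294967096u);
}
```

Anything that later derives a timeout from such a value would see an
enormous round-trip time, which is consistent with the stalls and
aborted connections described above.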
-- 
Martin 


