[clug] Re: bizare network behavior

Alex Satrapa grail at goldweb.com.au
Sat Apr 5 03:38:51 EST 2003


On Friday, April 4, 2003, at 10:51 , Martin Pool wrote:

> Anyhow the solution is to get tcpdump and Stevens and rub them
> together.

It could also be a dodgy Ethernet connection (NIC, cable or port on the
switch/hub).

Check whether you've got a card that's talking half-duplex to the 
switch, with the switch thinking it's talking full duplex to the card.  
Classic Ethernet switching problem.  If you've got a hub instead, it 
might be the other way around (hub half, NIC full). One symptom of the 
full/half duplex confusion will be that every time the TX/RX light on 
that switch port blinks, the collision light blinks too.
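
If you'd rather ask the machine than squint at LEDs, here's a rough
sketch that reads what the NIC believes it negotiated. It assumes a
Linux kernel that exposes /sys/class/net/<iface>/speed and
/sys/class/net/<iface>/duplex (on kernels without those files you'd
reach for mii-tool or ethtool instead), and "eth0" is only a placeholder
interface name. It shows the card's side only; you still have to compare
it with what the switch port says.

```python
from pathlib import Path


def read_link_settings(iface, sysfs_root="/sys/class/net"):
    """Return (speed, duplex) as the kernel reports them for iface.

    Either value is "unknown" if the file is missing or unreadable
    (reading these files can fail on an interface whose link is down).
    """
    base = Path(sysfs_root) / iface
    values = []
    for name in ("speed", "duplex"):
        try:
            values.append((base / name).read_text().strip())
        except OSError:
            values.append("unknown")
    return tuple(values)


if __name__ == "__main__":
    # "eth0" is a placeholder; substitute your own interface name.
    speed, duplex = read_link_settings("eth0")
    print(f"eth0: speed={speed} duplex={duplex}")
```

If this prints "half" while the switch port is forced to full duplex (or
the reverse), that's the confusion described above.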

Try ping-flooding (as root, "ping -f other.machine") and see how many 
packets are being dropped.  If you're suffering full/half duplex 
confusion, you'll find that most of the ping packets are dropped (your 
screen fills up very quickly with dots).
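
If you want a number rather than a screenful of dots, something like the
following sketch might do. It assumes a ping that accepts "-f" and "-c"
and ends with a summary line containing "N% packet loss" (the common
Linux and BSD pings do); the function names and the default count of
1000 are my own choices, and the flood itself still needs root.

```python
import re
import subprocess


def parse_packet_loss(ping_output):
    """Pull the 'N% packet loss' figure out of ping's summary line."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    if match is None:
        raise ValueError("no packet loss summary found in ping output")
    return float(match.group(1))


def flood_ping_loss(host, count=1000):
    """Flood-ping host (needs root) and return the loss percentage."""
    result = subprocess.run(
        ["ping", "-f", "-c", str(count), host],
        capture_output=True, text=True, check=False,
    )
    return parse_packet_loss(result.stdout)
```

Calling flood_ping_loss("other.machine") on a healthy switched LAN
should come back at or near zero; a figure well above that points at the
link rather than at TCP.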

Ethernet and TCP both back off exponentially: Ethernet when it sees
collisions, TCP when it sees packet loss.  So if an Ethernet frame
collides, Ethernet will back off before resending it, stalling the TCP
stream.  If a frame is lost outright (too many collisions in a row, or
electrical failure) and you're sending data fast enough, the TCP stack
on one machine will think that some packets have been dropped (because
they have, strangely enough, been dropped), and back off as well.
Continue this behaviour often enough and the whole connection will come
to a screaming halt (well... more like a squelching-through-molasses
crawl than a total halt).
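
To put rough numbers on the two kinds of backoff, here's a small sketch.
The Ethernet constants are the standard CSMA/CD ones for 10 and 100
Mbit/s (512 bit-time slots, a window that stops doubling after 10
collisions, give up after 16 attempts). The TCP part is only the
"double the retransmission timeout on each loss" rule; the 60 second
cap and the function names are assumptions of mine, and real stacks
differ in the details.

```python
import random

SLOT_TIME_BITS = 512   # one slot time in bit times (10/100 Mbit/s Ethernet)
BACKOFF_LIMIT = 10     # the window stops doubling after 10 collisions
ATTEMPT_LIMIT = 16     # the frame is discarded after 16 failed attempts


def backoff_window(collisions):
    """Number of slot choices after the given count of collisions."""
    return 2 ** min(collisions, BACKOFF_LIMIT)


def ethernet_backoff_slots(collisions, rng=random):
    """Pick how many slot times to wait before retransmitting."""
    if collisions >= ATTEMPT_LIMIT:
        raise RuntimeError("excessive collisions: frame is discarded")
    return rng.randrange(backoff_window(collisions))


def slots_to_seconds(slots, bits_per_second):
    """Convert a count of slot times into seconds at a given link speed."""
    return slots * SLOT_TIME_BITS / bits_per_second


def tcp_rto_after_losses(initial_rto, losses, max_rto=60.0):
    """Retransmission timeout after repeated losses of the same segment.

    max_rto=60.0 is an assumed cap; stacks choose their own ceiling.
    """
    return min(initial_rto * 2 ** losses, max_rto)
```

The point of the comparison: Ethernet's waits are counted in
microseconds, while each TCP timeout is counted in seconds, so it's the
frames that never make it through that really hurt.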

IIRC, in the case of duplex confusion, either the outgoing packets or
the incoming ACKs will end up being mangled on the link.  The end in
FDX mode sends whenever it likes, because it isn't expecting to have to
listen first.  The end in HDX mode treats anything that arrives while
it is transmitting as a collision, so it aborts its own frame and backs
off.  Since the ACKs get squished somewhere, the respective TCP stack
will back off to reduce the (nonexistent) congestion that it assumes
caused the dropped packets.

Watch the switch or hub: if the collision light is blinking in time
with the TX/RX light, you've got duplex confusion happening (I'm sure
there's a "proper" technical term for it, but I've never picked it up).

If you've got coax... my prayers are with you.

If you're not suffering electrical or physical layer problems, then 
protocol analysis (tcpdump, a nice chair and a comfy quote from Comer) 
is the way to go.  But duplex confusion and corroded/broken contacts on 
your RJ-45 are so much easier to solve than non-sliding TCP windows...

Alex