[linux-cifs-client] kernel setsocketoptions API - can not set options to avoid very slow TCP ACK

Steve French smfltc at us.ibm.com
Fri Jun 24 17:49:24 GMT 2005


I was looking at some network traces today that indicated the need to
set TCP_NODELAY, and possibly turn on/off TCP_QUICKACK in a few places -
but I don't see a reasonable way to call set socketoptions for TCP
sockets from kernel - they seem to require a user space pointer.

The details, sequential network file copy from server:
	mounted over loopback interface using most current cifs 
	to most current samba 3

	dd if=/mnt/bigfile of=/dev/null BS=1M

	The file read requests are for 16K (the negotiated buffer size)
	over the network to Samba

	The read responses are being sent quickly enough by
	samba (less than 1ms).  The responses are a bit over 16K due to 
	protocol headers - perhaps 16K+100.

	TCP/IP fragments the response into two pieces.  One almost
	16K and and the remaining 100 bytes or so in a second frame.

	The second small frame is held up for a few hundred 
	milliseconds (!) waiting on the client TCP stack to 
	do a TCP ACK.

	This results in terrible network utilization as the second 
	read request will not be issued until the first response is 
	completely received.  This particular case is probably about
	50 times slower than it should be due to waiting on the slow
	ACK.

There are various other things that would help of course - running other
processes on the client accessing the network filesystem would cause the
ack to sometimes happen faster, and async dispatching of more read
requests at one time, or of even larger read requests, but this case
does indicate that turning on quickack on the client for the case of
waiting for large read responses, and probably readdir (which responses
are also similar in size) is worth trying.   In addition, smbfs (in user
space) sets TCP_NODELAY for its sockets when it mounts and that is
probably appropriate here as well for cifs client.  Is there a
recommended way to set this in kernel - NFS/SunRPC seems to set its
socket options directly in sock->sk and that is fine with me but it
would look a little strange to do
	tp->nonagle |= TCP_NAGLE_OFF | TCP_NAGLE_PUSH 
and
	ip_push_pending _frames
in filesystem code (since calling net/ipv4/tcp.c setsocketoption is not
possible).



More information about the linux-cifs-client mailing list