The SND/RCV LO/HI WAT options

David Collier-Brown davecb at canada.sun.com
Thu Jun 10 13:47:00 GMT 1999


Majid Tajamolian wrote:

> > > I'm working on SAMBA performance enhancement. Can anyone guide me for a
> > > reference about the 4 WATer mark socket options and their effects on
> > > TCP/IP performance?
> > > Have anyone any experiments about using them in the SAMBA code?

I asked:
> >       A colleague has been experimenting with them in a non-samba
> >       context, and we've been looking at their behavior quantitatively:
> >       Alas, I'm not doing the experiments myself...
> >
> >       What would you like to know?
> >
> It seems that anything about them can be useful (i.e. their meaning, how
> they change the TCP behavior, what the relation is between them and other
> configuration parameters such as SNDBUF, RCVBUF, ...).
> If you don't know about them, can you help me find some information on
> the internet?


	The best reference is Stevens' TCP/IP Illustrated,
	and a colleague and I have been playing with the
	high- and low-water marks in a group of Solaris
	servers and clients on 100baseT ethernet.

	Let's look at it in Net/3 terms (BSD & Linux):
	On the send side, an application writes a big chunk
	of data, say 10 KB.  The sosend() processing hands
	this off in 2048-byte mbuf clusters to the lower level
	of the protocol until there is enough sent-but-
	unacknowledged data to exceed the high-water mark,
	which is at 8 KB.  At that point sending stops until
	some acks come in, and the sending process is suspended.

	The process is not awakened until the sent-but-unacked
	data falls below the low-water mark, 2 KB.  At that point
	it sends the next 2 KB and is done for the moment.
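
	In C, with a connected TCP socket fd, those two knobs look
	roughly like the sketch below: on Net/3-style stacks SO_SNDBUF
	sets the send high-water mark and SO_SNDLOWAT the low-water
	mark, though some systems (Linux among them) won't let you
	change SO_SNDLOWAT and the call will simply fail.

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <stdio.h>

	static int set_send_marks(int fd, int hiwat, int lowat)
	{
	        /* high-water mark: how much sent-but-unacked data may
	         * queue before the writing process is put to sleep */
	        if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
	                       &hiwat, sizeof(hiwat)) < 0) {
	                perror("setsockopt SO_SNDBUF");
	                return -1;
	        }
	        /* low-water mark: the writer isn't woken until at
	         * least this much space has drained free again */
	        if (setsockopt(fd, SOL_SOCKET, SO_SNDLOWAT,
	                       &lowat, sizeof(lowat)) < 0) {
	                perror("setsockopt SO_SNDLOWAT");
	                return -1;
	        }
	        return 0;
	}

	Calling set_send_marks(fd, 8192, 2048) matches the 8 KB / 2 KB
	behaviour described above.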

	In a large-data scenario (which is what I have at work),
	you want the high-water mark to be set high enough that
	     1) the expected write size is below the
		high-water mark (minimum), and
	     2) the mark is high enough that, on back-to-back
		writes, enough data gets transferred and acked
		to leave at least a write's worth of space free
		below the high-water mark (optimum)

	Let's assume Samba is being used to access big database relations,
	and max xmit is at its default setting of 65,535.
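
	For reference, the corresponding smb.conf knobs are max xmit
	and socket options; a sketch might look like the following
	(the SO_SNDBUF/SO_RCVBUF values here are only the numbers used
	for illustration below, not a recommendation, and the OS may
	cap them):

	[global]
	   max xmit = 65535
	   socket options = SO_SNDBUF=262144 SO_RCVBUF=262144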

	Logically, you'd
	     1) want the high-water mark set above 64 KB
	That's easy: the Net/3 maximum is 262,144.  (The Solaris
	maximum is different, and interacts with the TCP window
	scaling option.)

	     2) want there to be at least 64KB free during back-to-back
	     	transfers.
	This is harder: you'll have to measure how fast Samba
	can issue transfers back-to-back (by adding some timers),
	compute how much data your ethernet transfers in the same
	period, and then set the high-water mark to suit.  Stevens
	talks about the bandwidth-delay product: this is another
	case where it matters.
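
	As a back-of-the-envelope sketch in C (the link speed, the
	gap between writes and the write size are all assumptions
	you'd replace with your own timer and sniffer numbers):

	#include <stdio.h>

	int main(void)
	{
	        double bits_per_sec = 100e6;   /* 100baseT, assumed       */
	        double gap_sec      = 0.005;   /* measured time between
	                                          back-to-back writes     */
	        double write_bytes  = 65536;   /* one max-xmit-sized write */

	        /* how much the wire can drain while Samba prepares the
	         * next write: a bandwidth-delay style product */
	        double drained = bits_per_sec / 8.0 * gap_sec;

	        /* one-step estimate of the leftover from the previous
	         * write; if the wire can't drain a full write per gap,
	         * residue builds up and no finite mark avoids blocking,
	         * the mark only absorbs bursts */
	        double residue = write_bytes > drained ? write_bytes - drained : 0;

	        printf("wire drains ~%.0f bytes between writes\n", drained);
	        printf("suggested high-water mark >= %.0f bytes\n",
	               write_bytes + residue);
	        return 0;
	}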

	If you don't want to go the analytic route, you can just do
	experiments and plot high-water mark -vs- throughput.
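
	A crude harness for that experiment in C might look like the
	following (the target address, the discard port and the 64 MB
	transfer size are placeholders; point it at a discard service
	or a small reader on the far end, and note the timing is only
	second-granularity):

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <arpa/inet.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <time.h>
	#include <unistd.h>

	int main(void)
	{
	        int sizes[] = { 8192, 16384, 32768, 65536, 131072, 262144 };
	        static char buf[65536];                  /* zero-filled payload */
	        size_t total = 64UL * 1024 * 1024;       /* bytes per run, assumed */
	        size_t i;

	        for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
	                struct sockaddr_in sin;
	                int fd = socket(AF_INET, SOCK_STREAM, 0);
	                size_t sent = 0;
	                time_t t0, t1;

	                memset(&sin, 0, sizeof(sin));
	                sin.sin_family = AF_INET;
	                sin.sin_port = htons(9);                        /* discard */
	                sin.sin_addr.s_addr = inet_addr("192.168.1.2"); /* placeholder */

	                /* set the send high-water mark before connecting */
	                setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
	                           &sizes[i], sizeof(sizes[i]));

	                if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
	                        perror("connect");
	                        exit(1);
	                }
	                t0 = time(NULL);
	                while (sent < total) {
	                        ssize_t n = write(fd, buf, sizeof(buf));
	                        if (n < 0) {
	                                perror("write");
	                                break;
	                        }
	                        sent += (size_t)n;
	                }
	                t1 = time(NULL);
	                close(fd);
	                printf("SO_SNDBUF %7d: %.1f KB/s\n", sizes[i],
	                       sent / 1024.0 / (t1 > t0 ? (double)(t1 - t0) : 1.0));
	        }
	        return 0;
	}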

	A test with no load on the ethernet will give you a curve that
	**underestimates** the buffering needed. Do your benchmarking in
	a production environment, in a busy period!

	If you can't, I'd use a packet sniffer (snoop, in my case) to see
	what your "normal" load is, then run a bunch of ftp jobs on another
	pair of machines to simulate it on your test net.

	<simulated guru hat on>
	I'd expect the curve to look like this
           _____
       ___/      ---__
     _/ 
   /
  /
  |
  |
+------+-------+----
0      default lots
	
	In other words, when the mark is small, performance would be
	badly throttled.  When it's "enough", performance will
	jump up quickly, rise slowly to a peak, and then drop off
	when big buffers starve other parts of the system
	for memory.
	<guru hat off>

	The gentle curve at the top of the graph is caused by
	probabilistic effects: every once in a while there's a burst
	of traffic, Samba doesn't get enough transferred in time, and
	the process hits the high-water mark and is suspended until
	the data drains.  Add a bit more buffer and that probability
	is reduced.

	In my opinion, the important thing to know is where the performance
	jumps upward, measured on 10 mbit/sec ethernet for low, "average"
	and near-saturation loads.  THAT curve is interesting to someone
	doing performance tuning, as it will tell them
		what's the minimum they must have
		what they need in the worst case, and
		the shape of the curve between those points.
	A concave curve means you can be low without much risk: a convex
	one means you'd better get as close to the maximum as you can.

	One known good point is the default: it's known to be sane for
	10 mbit/sec ethernets and ftp sending 24-odd-KB worth of buffers.
	For 100baseT, we don't need quite as much, as the net acks data faster.
	For T1 and below, I have to ask a non-simulated guru!
 
		
--dave
--
David Collier-Brown,  | Always do right. This will gratify some people
185 Ellerslie Ave.,   | and astonish the rest.        -- Mark Twain
Willowdale, Ontario   | http://java.science.yorku.ca/~davecb
Work: (905) 477-0437 Home: (416) 223-8968 Email: davecb at canada.sun.com

