frustration Re: FW: [PATCH] Stop packet_recv greediness, stop using event_add_timed

Tue May 19 11:31:40 GMT 2009

On Tue, May 19, 2009 at 11:51:14AM +0100, Sam Liddicott wrote:
> > Sorry for that, but I think my query is still valid. You're
> > seeing a problem with the policy to read everything in a
> > greedy way. I *think* this has been added due to performance
> > reasons. 
> Maybe, but as you mention below, you suspect that the gains will be lost
> in the noise; and the quest for such minimal gains results in the sort
> of bad behaviour I'm trying to fix, as well as the potential DOS you
> mention.
> 
> It is worth addressing the performance gains of different solutions, but
> not in comparison to the current behaviour which is broken, and
> unnecessarily corks server responses in order to save a few syscalls -
> i.e. it wastes network capacity in favour of cpu capacity, to a
> noticeable degree.

I'm trying to get that torture test because *I* myself would
have to go through that if I wanted to apply such a change
to the core of Samba 4.

> For real numbers obtained during debugging here, a sequence of 5
> responses which would normally have been sent out around 10ms apart were
> being collected and sent out at once 10us apart. That means the first
> response was being delayed by 40ms, which more than doubles the
> effective RTT for many WAN, or wastes around 500Kbytes of data
> opportunity at LAN speeds.
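
The figures quoted above can be checked with a little arithmetic. The
sketch below assumes a 100 Mbit/s LAN, which is not stated in the thread
but is the link speed at which the 500 Kbyte figure works out; the
function names are illustrative only.

```python
def first_response_delay_ms(n_responses, spacing_ms):
    """Delay of the first response if all n are held until the last is ready."""
    return (n_responses - 1) * spacing_ms


def idle_capacity_bytes(delay_ms, link_bits_per_s):
    """Bytes the link could have carried while the first response was held back."""
    return link_bits_per_s / 8 * delay_ms / 1000


delay = first_response_delay_ms(5, 10)      # 5 responses, 10 ms apart
wasted = idle_capacity_bytes(delay, 100e6)  # assumed 100 Mbit/s LAN
print(delay, wasted)
```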

Wait a second -- this sounds like a problem with your TCP
socket options. This definitely sounds like you want to turn
off Nagle's algorithm (TCP_NODELAY).
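
For reference, Nagle's algorithm is disabled per socket with the
TCP_NODELAY option. A minimal sketch in Python; the helper name
disable_nagle is made up for illustration and is not Samba code.

```python
import socket


def disable_nagle(sock):
    """Ask the kernel to send small segments immediately instead of coalescing."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
disable_nagle(sock)
# A non-zero value means the option is set.
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
sock.close()
```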

> > If you would be happy with just reading a packet at
> > a time with 4 syscalls (epoll -> read 4 bytes -> epoll ->
> > read rest) this might be slower than the 3 syscalls (epoll
> > -> ioctl(FIONREAD) -> read everything). 
> I was never proposing anything as bad as that, there is no need to
> return to epoll after 4 bytes at all.

Seriously: I would definitely propose exactly that:
epoll->read4->epoll->readrest. This is conceptually the
simplest of all schemes, and would probably fully solve your
problem. But to actually get that through, I would want
those performance numbers.
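
A rough sketch of the epoll->read4->epoll->readrest idea, written in
Python rather than against the Samba 4 event code. It assumes a 4-byte
transport header whose low 24 bits carry the big-endian body length;
PacketReader and frame are hypothetical names, not existing functions.
Each call issues exactly one recv() and never asks for more than the
current packet, so anything further stays queued in the kernel.

```python
import socket

HEADER_LEN = 4


class PacketReader:
    """Reads one length-prefixed packet, one recv() per readiness event."""

    def __init__(self):
        self.header = b""
        self.body = b""
        self.body_len = None  # unknown until the header is complete

    def on_readable(self, sock):
        """Call once per readiness event; returns a packet body or None."""
        if self.body_len is None:
            want = HEADER_LEN - len(self.header)    # "read4"
        else:
            want = self.body_len - len(self.body)   # "readrest"
        chunk = sock.recv(want)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        if self.body_len is None:
            self.header += chunk
            if len(self.header) < HEADER_LEN:
                return None
            # Assumed framing: byte 0 is a type/flags byte, bytes 1-3 the length.
            self.body_len = int.from_bytes(self.header[1:], "big")
        else:
            self.body += chunk
        if len(self.body) < self.body_len:
            return None  # back to the event loop for the rest
        packet = self.body
        self.header, self.body, self.body_len = b"", b"", None
        return packet


def frame(payload):
    """Prefix payload with the assumed 4-byte header."""
    return b"\x00" + len(payload).to_bytes(3, "big") + payload


a, b = socket.socketpair()
a.sendall(frame(b"first") + frame(b"second"))
reader = PacketReader()
print(reader.on_readable(b))  # header consumed, no packet yet
print(reader.on_readable(b))  # body of the first packet
a.close()
b.close()
```

In a real event loop each on_readable call would follow a readiness
notification (epoll, or Python's selectors module); the second packet is
only touched when the loop comes back for it.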

At some point all these mails will probably get me to do your
work and write that trivial 100-liner myself. I should
probably have spent the time I spent reading those mails
writing it, and I would already have it by now :-(((

> > I think the discussion
> > will become much easier. If you get a packet rate loss below
> > noise, it's done I think. If you get a measurable packet
> > rate loss, we have to discuss efficiency vs code clarity.
> >   
> I don't understand what you mean by packet loss here, I don't expect it
> to introduce any kind of packet loss.

I wanted to talk about "loss in packet *RATE*", not packet
loss. Sorry if my English was not sufficient.

> I'm not proposing as many epolls as the "4 syscall" scheme though.

Again: Why not? If this turns out to be fast enough, why not
have much simpler code that is easier to understand?
Deferring all the buffering work to the kernel *might* not
be as bad performance-wise as we all suspect.

> Not really a randomiser although it may serve.
> 
> With a server socket the current implementation already prefers to
> write rather than to read (apart from this problem, which contradicts
> it) - this is so that the server gets to send the responses it worked
> so hard to generate.
> smbd/service_stream.c/stream_io_handler()
> 
> With client sockets we prefer to read than to write, this is because it
> is polite to the server to listen to the answers instead of just queuing
> more, and if we don't, we may deadlock if it decides to stop responding.

Even in the client you want to get rid of data you generated
as quickly as possible. It's the smbclient application that
actually triggered that change in Samba 3.  If you want to
do multiplexed read&x you are better off sending as many
requests to the server as possible and rely on the kernel
queues to sort it out. Otherwise you will end up with
oscillating traffic which sucks for bandwidth utilization.

> However, when we have a related server/client socket pair (e.g. when
> server requests sometimes generate new client requests of any kind),  we
> have this priority scenario, which epoll doesn't support as it keeps
> fd's separate:
> 
> 1. send server responses
> 2. receive client responses
> 3. send client requests
> 4. receive server requests
> 
> This effectively pushes any bottleneck back to the client, instead of
> buffering dangerously in the server.
> 
> There is no point in receiving server requests if resultant client
> requests are being blocked because the upstream server is slow (or
> blocking because we are not receiving our client responses) or because
> we are just storing them in ram - so server requests is the lowest
> priority - thus throttling our client to our available processing rate.
> 
> There is no point though in sending client requests if we are not
> listening to the client responses or we risk deadlock.
> 
> There is no point in receiving client responses if we just fill up
> our server send queue.
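
The ordering described above amounts to a fixed priority among the four
kinds of readiness event. A minimal sketch, assuming the event loop can
hand over all ready events at once; the role/direction labels and the
next_event helper are invented for illustration.

```python
# Lower number = served first, following the order listed above.
# "server" is the socket facing our clients, "client" the socket to the
# upstream server.
PRIORITY = {
    ("server", "write"): 1,  # send server responses
    ("client", "read"): 2,   # receive client responses
    ("client", "write"): 3,  # send client requests
    ("server", "read"): 4,   # receive server requests
}


def next_event(ready):
    """Return the highest-priority (role, direction) pair in ready, or None."""
    return min(ready, key=PRIORITY.__getitem__, default=None)


print(next_event([("server", "read"), ("client", "write"), ("client", "read")]))
```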

Ah, ok, I see what you mean. You're coming from your proxy
perspective. In your case I think you definitely want to
read all your proxy client's requests into user space,
regardless of what the server you're proxying does at this
moment: The magic word is SMBEcho requests. If a server is
stuck for extended periods of time for any reason, clients
start sending SMBEchos. You definitely want to fish them out
of the client request floods, and reply to them locally.
There is no other way than to just read everything and
filter on SMBEcho.
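
As an illustration of "read everything and filter on SMBEcho", the
sketch below separates echo requests from the rest of a batch of already
de-framed SMB1 packets. It assumes each packet starts with the SMB1
header (0xFF 'S' 'M' 'B' followed by the command byte, 0x2B for
SMB_COM_ECHO); is_echo and split_echoes are hypothetical helpers, and
building the local echo reply is left out.

```python
SMB_MAGIC = b"\xffSMB"
SMB_COM_ECHO = 0x2B


def is_echo(packet):
    """True if packet looks like an SMB1 echo request."""
    return len(packet) > 4 and packet[:4] == SMB_MAGIC and packet[4] == SMB_COM_ECHO


def split_echoes(packets):
    """Return (echoes, others), keeping the original order within each list."""
    echoes = [p for p in packets if is_echo(p)]
    others = [p for p in packets if not is_echo(p)]
    return echoes, others


queued = [SMB_MAGIC + b"\x2e" + b"read", SMB_MAGIC + b"\x2b" + b"ping"]
echoes, others = split_echoes(queued)
print(len(echoes), len(others))
```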

Volker