frustration Re: FW: [PATCH] Stop packet_recv greediness, stop using event_add_timed

Stefan (metze) Metzmacher metze at
Tue May 19 13:02:58 GMT 2009

Volker Lendecke wrote:
> On Tue, May 19, 2009 at 11:51:14AM +0100, Sam Liddicott wrote:
>>> Sorry for that, but I think my query is still valid. You're
>>> seeing a problem with the policy to read everything in a
>>> greedy way. I *think* this has been added due to performance
>>> reasons. 
>> Maybe, but as you mention below, you suspect that the gains will be lost
>> in the noise; and the quest for such minimal gains results in the sort
>> of bad behaviour I'm trying to fix, as well as the potential DOS you
>> mention.
>> It is worth addressing the performance gains of different solutions, but
>> not in comparison to the current behaviour which is broken, and
>> unnecessarily corks server responses in order to save a few syscalls -
>> i.e. it wastes network capacity in favour of cpu capacity, to a
>> noticeable degree.
> I'm trying to get that torture test because *I* myself would
> have to go through that if I wanted to apply such a change
> to the core of Samba 4.
>> For real numbers obtained during debugging here, a sequence of 5
>> responses which would normally have been sent out around 10ms apart were
>> being collected and sent out at once 10us apart. That means the first
>> response was being delayed by 40ms, which more than doubles the
>> effective RTT for many WAN, or wastes around 500Kbytes of data
>> opportunity at LAN speeds.
> Wait a second -- this sounds like a problem with your TCP
> socket options. This definitely sounds like you want to turn
> off NAGLE.
>>> If you would be happy with just reading a packet at
>>> a time with 4 syscalls (epoll -> read 4 bytes -> epoll ->
>>> read rest) this might be slower than the 3 syscalls (epoll
>>> -> ioctl(FIONREAD) -> read everything). 
>> I was never proposing anything as bad as that, there is no need to
>> return to epoll after 4 bytes at all.
> Seriously: I would definitely propose exactly that:
> epoll->read4->epoll->readrest. This is conceptually the
> simplest of all schemes, and would probably fully solve your
> problem. But to actually get that through, I would want
> those performance numbers.
> Probably all these mails will at some point get me to do your
> work and provide that trivial 100-liner myself. I should
> probably have spent the time I spent reading those mails
> writing it, and I would already have it by now :-(((

I think the long term goal should be to use the tstream api
and drop the packet_*() api. The packet_*() api has far too many knobs
now and it's too complex. With the tstream it should be easy to
implement the read 4 => read rest logic, while it optimizes out the
epoll calls.
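
As a rough illustration of the "read 4 => read rest" logic, here is a
minimal sketch. It is written in Python for brevity, not against the
tstream or packet_*() APIs, and it assumes a plain 4-byte big-endian
length prefix, which is a simplification of the real wire header. The
helper names read_exact(), read_packet() and frame() are hypothetical.

```python
import socket
import struct

HEADER_LEN = 4  # assumption: 4-byte big-endian length prefix


def read_exact(sock, n):
    """Read exactly n bytes from a blocking socket, or raise on EOF."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed mid-packet")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_packet(sock):
    """Read one packet: the 4 header bytes first, then the rest."""
    (length,) = struct.unpack(">I", read_exact(sock, HEADER_LEN))
    return read_exact(sock, length)


def frame(payload):
    """Hypothetical helper: prefix a payload with its length."""
    return struct.pack(">I", len(payload)) + payload


# Two packets arrive back to back; each call consumes exactly one and
# leaves the other queued in the kernel instead of reading greedily.
a, b = socket.socketpair()
a.sendall(frame(b"first") + frame(b"second"))
first = read_packet(b)
second = read_packet(b)
a.close()
b.close()
```

Because each call stops at the packet boundary, the caller can process
and answer one request before touching the next one.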

I hope I'll have time to update the documentation from the tsocket api
to the much simpler tstream api.

For now I'll just apply your patch.

>>> I think the discussion
>>> will become much easier. If you get a packet rate loss below
>>> noise, it's done I think. If you get a measurable packet
>>> rate loss, we have to discuss efficiency vs code clarity.
>> I don't understand what you mean by packet loss here, I don't expect it
>> to introduce any kind of packet loss.
> I wanted to talk about "loss in packet *RATE*", not packet
> loss. Sorry if my English was not sufficient.
>> I'm not proposing as many epolls as the "4 syscall" scheme though.
> Again: Why not? If this turns out to be fast enough, why not
> have much simpler code that is easier to understand?
> Deferring all the buffering work to the kernel *might* not
> be as bad performance-wise as we all suspect.
>> Not really a randomiser although it may serve.
>> With a server socket, the current implementation already prefers
>> writing to reading (apart from this problem that contradicts it) - this is so
>> that the server gets to send the responses it worked so hard to generate.
>> smbd/service_stream.c/stream_io_handler()
>> With client sockets we prefer to read than to write, this is because it
>> is polite to the server to listen to the answers instead of just queuing
>> more, and if we don't, we may deadlock if it decides to stop responding.
> Even in the client you want to get rid of data you generated
> as quickly as possible. It's the smbclient application that
> actually triggered that change in Samba 3.  If you want to
> do multiplexed read&x you are better off sending as many
> requests to the server as possible and rely on the kernel
> queues to sort it out. Otherwise you will end up with
> oscillating traffic which sucks for bandwidth utilization.

Yes, each socket needs to make sure it handles sending with a higher
priority than reading.
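
A minimal sketch of that rule, in Python rather than Samba's event
code: when a socket is both readable and writable and output is
pending, the handler sends first. The function name handle_once() and
the deque used as an output queue are assumptions made for the example.

```python
import select
import socket
from collections import deque


def handle_once(sock, out_queue, timeout=1.0):
    """Service one event on sock, preferring send over receive.

    Returns ("sent", n), ("received", data) or ("idle", None).
    """
    # Only ask for writability when there is something to send.
    want_write = [sock] if out_queue else []
    readable, writable, _ = select.select([sock], want_write, [], timeout)
    if writable:
        data = out_queue.popleft()
        sent = sock.send(data)
        if sent < len(data):
            # Partial write: keep the unsent tail at the front.
            out_queue.appendleft(data[sent:])
        return ("sent", sent)
    if readable:
        return ("received", sock.recv(4096))
    return ("idle", None)


server, peer = socket.socketpair()
peer.sendall(b"request")         # input is already waiting on server...
pending = deque([b"response"])   # ...but a response is queued as well
events = [handle_once(server, pending)[0] for _ in range(2)]
got = peer.recv(4096)
server.close()
peer.close()
```

The queued response goes out before the waiting request is read.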

>> However, when we have a related server/client socket pair (e.g. when
>> server requests sometimes generate new client requests of any kind),  we
>> have this priority scenario, which epoll doesn't support as it keeps
>> fd's separate:
>> 1. send server responses
>> 2. receive client responses
>> 3. send client requests
>> 4. receive server requests
>> This effectively pushes any bottleneck back to the client, instead of
>> buffering dangerously in the server.
>> There is no point in receiving server requests if resultant client
>> requests are being blocked because the upstream server is slow (or
>> blocking because we are not receiving our client responses) or because
>> we are just storing them in ram - so server requests is the lowest
>> priority - thus throttling our client to our available processing rate.
>> There is no point though in sending client requests if we are not
>> listening to the client responses or we risk deadlock.
>> There is no point in receiving client responses if we just fill up
>> our server send queue.
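
The four-level ordering quoted above could be sketched as a simple
priority pick over whichever actions are currently possible. This is
only an illustration in Python: the action names and pick_action() are
hypothetical, and nothing here reflects how epoll or Samba's event loop
actually dispatches.

```python
# Priority order from the quoted list, highest first.
PRIORITY = [
    "send_server_responses",
    "receive_client_responses",
    "send_client_requests",
    "receive_server_requests",
]


def pick_action(ready):
    """Return the highest-priority action present in `ready`, or None."""
    for action in PRIORITY:
        if action in ready:
            return action
    return None


# New server requests are only read once nothing else is possible.
busy = pick_action({"receive_server_requests", "send_client_requests"})
quiet = pick_action({"receive_server_requests"})
nothing = pick_action(set())
```
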
> Ah, ok, I see what you mean. You're coming from your proxy
> perspective. In your case I think you definitely want to
> read all your proxy client's requests into user space,
> regardless of what the server you're proxying does at this
> moment: The magic word is SMBEcho requests. If a server is
> stuck for extended periods of time for any reason, clients
> start sending SMBEchos. You definitely want to fish them out
> of the client request floods, and reply to them locally.
> There is no other way than to just read everything and
> filter on SMBEcho.
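
A minimal sketch of the "read everything and filter on SMBEcho" idea,
in Python. It assumes each packet starts at the SMB1 header (transport
length prefix already stripped) and only inspects the magic and the
command byte, so it does not distinguish requests from replies.
is_smb_echo(), split_echoes() and fake_request() are hypothetical names
for the example.

```python
SMB_MAGIC = b"\xffSMB"
SMB_COM_ECHO = 0x2B  # SMB1 echo command code


def is_smb_echo(packet):
    """True if the packet carries the SMB1 magic and the echo command."""
    return (
        len(packet) > 4
        and packet[:4] == SMB_MAGIC
        and packet[4] == SMB_COM_ECHO
    )


def split_echoes(packets):
    """Separate echoes (answer locally) from the rest (forward)."""
    echoes, others = [], []
    for p in packets:
        (echoes if is_smb_echo(p) else others).append(p)
    return echoes, others


def fake_request(command):
    """Hypothetical helper: 32-byte header with only magic and command set."""
    return SMB_MAGIC + bytes([command]) + bytes(27)


# 0x2E and 0x2F are the SMB1 read&x and write&x command codes.
queue = [fake_request(0x2E), fake_request(SMB_COM_ECHO), fake_request(0x2F)]
echoes, others = split_echoes(queue)
```
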


