frustration Re: FW: [PATCH] Stop packet_recv greediness,
stop using event_add_timed
sam at liddicott.com
Tue May 19 10:51:14 GMT 2009
I realise that my responses tend to get very long, but I'm trying to
address a misunderstanding that occurred from having written shorter
messages. Please forgive my wordiness.
Specifically, I think that
1. the performances reductions of the patch are over-estimated
2. the breakage of current behaviour is underestimated and don't warrant
the minor savings
3. the timings required are not warranted, and certainly not clear.
3. the pickiness is often over trivialities
I will address these points in the message below, but before that I need
to make these observations.
It takes me 5-10 times longer to argue for a patch than to produce it
(yes, I've spend much of the morning trying to write this message
nicely); and then it is often not accepted through reasons which are not
clear to me. And these are trivial patches!
I hope that we can consider if this time is well spent, if that much
work is required to prevent Samba quality from going to hell at my hand,
or if this is just over-cautiousness.
I leave to you the perogative to accept or reject patches based on mere
idle whims, but my point is that I already have the patches and the
benefits, and I'm spending my employers time trying to help the samba
project, and speaking from my employers view, it's not just poor value,
it's a total waste of my time if a patch is ultimately rejected.
Consider a recent case where a patch was rejected because it followed
the published samba coding style, and was also compliant with other well
read C style guides, as well as compliant with much recent samba code.
Instead I had to match some recently revised unwritten samba coding
style which goes against other well read C style guides advice on
I don't say any of this is malicious, but it is real and very frustrating.
Consider more distantly when I offered a patch: "Add --with-ext-lib...."
where a simple "within the current scheme of things" patch became a "fix
an autoconf problem" for a Samba team member - which I stepped up and
did to his satisfaction, only to have it rejected because it failed to
meet the contrary expectations of another Samba team member.
To be more particular with this patch today, a team member who I respect
doesn't like the patch because in an effort to retain the so called
performance benefits it uses a -1 byte trick that someone else gets
wrong. Maybe it's not very pleasant, but there's only so many ways to
solve the problem under such constraints.
And so, I provide a patch that avoids the -1 byte and addresses the DOS
problem. But note, we can only address the DOS problem if we don't get
the efficiency savings (which I don't believe in anyway).
Aseop sum's up my feelings well:
And I'm about to shoot the donkey.
Now on with discussion of this patch... and while the message below may
seem to express great frustration, it is not directed at any individual.
* Volker Lendecke wrote, On 19/05/09 09:41:
> On Tue, May 19, 2009 at 07:43:35AM +0100, Sam Liddicott wrote:
>> /me bangs head on floor.
> Sorry for that, but I think my query is still valid. You're
> seeing a problem with the policy to read everything in a
> greedy way. I *think* this has been added due to performance
Maybe, but as you mention below, you suspect that the gains will be lost
in the noise; and the quest for such minimal gains results in the sort
of bad behaviour I'm trying to fix, as well as the potential DOS you
It is worth addressing the performance gains of different solutions, but
not in comparison to the current behaviour which is broken, and
unnecessarily corks server responses in order to save a few syscalls -
i.e. it wastes network capacity in favour of cpu capacity, to a
For real numbers obtained during debugging here, a sequence of 5
responses which would normally have been sent out around 10ms apart were
being collected and set out at once 10us apart. That means the first
response was being delayed by 40ms, which more than doubles the
effective RTT for many WAN, or wastes around 500Kbytes of data
opportunity at LAN speeds.
Can the extra syscalls in CPU time really cost this much in RTT or lost
> If you would be happy with just reading a packet at
> a time with 4 syscalls (epoll -> read 4 bytes -> epoll ->
> read rest) this might be slower than the 3 syscalls (epoll
> -> ioctl(FIONREAD) -> read everything).
I was never proposing anything as bad as that, there is no need to
return to epoll after 4 bytes at all.
The patch supplied did:
epoll -> read everything -1
which in the worst case of a single packet being queued came to:
epoll -> read everything -1 -> read 1
and in the case of multiple packets being queued came to:
epoll -> read everything -1
epoll -> (no need to read)
epoll -> (no need to read)
epoll -> read 1
I was never returning back to epoll more often than once per packet;
previously this was:
epoll -> read everything
timer -> (no need to read)
timer -> (no need to read)
timer -> read 1
and while the timer wasn't a syscall, it is the behaviour that was
breaking things and introducing DOS potential, by giving reads higher
priority than writes through abusing 0 timeout timers.
Possibly we could avoid the extra epoll if we determine that the send
packet queue is empty, but how likely is it that we should have a lot of
requests and never any responses? We would most likely have a response
for each request and so it hardly seems worth the complexity this would add.
Out of respect for A Bartlett's feedback (to avoid the -1 byte which
troubled him) I attach a new patch that specifically take the approach
/similar/ to that you suggest above, in that I read 4 bytes, but I still
do not return back to epoll unless I also cannot read the entire packet.
It is less efficient than the previous patch in that we do a read
syscall more often (at least once per packet) and FIONREAD ioctl more
often (but we can cache the value of some of those if it troubles you).
So this addresses two problems
1. to let sent packets hit the wire ASAP
2. throttle the tcp connection based on how fast we can process the packets
This new patch is less efficient than the previous one, but it avoids
the DOS potential - and there is no other way to push rate limiting back
onto TCP than to use epoll or select to determine if we can send more data.
> So if you can prove
> how much the latter scheme gains,
do you mean how much the 4 byte scheme gains over the previous -1
scheme, or how much it gains over the current scheme?
> I think the discussion
> will become much easier. If you get a packet rate loss below
> noise, it's done I think. If you get a measurable packet
> rate loss, we have to discuss efficiency vs code clarity.
I don't understand what you mean by packet loss here, I don't expect it
to introduce any kind of packet loss.
>> As for the speed comparison, do you want a comparison of a regular
>> server process doing certain tasks or do you want the timings for a pure
>> read loop reading a few megabytes of samba packets and doing no other
> I would prefer a micro-benchmark just doing the packet
> feeding into one direction with a VERY simple tbench-style
> program. One direction because only this should in theory be
> the best case for the greedy read. That benchmark should do
> packet parsing just up to a level where you actually dissect
> the different netbios session packets, because that is what
> the 4 syscall scheme also gives you.
> I would suspect if you put the full server behind it, the
> loss will be way below the noise.
I'm not proposing as many epolls as the "4 syscall" scheme though.
And the fact that it will get lost in the noise is the main point. Any
"performance gain" (i.e. slurping) which gets lost in the noise and
which introduces DOS potential and avoids sending responses in
preference to reading more requests should be taken out regardless, surely?
>> Ultimately we suspect that libevent needs reworking so that it can
>> consider the state of all sockets and timers and then deliver them in a
>> global priority order, instead of taking each fd in turn.
> Are you sure you want a real priority scheme? I would guess
> that assuming epoll does not do randomizing or proper
> queueing, what you want is a randomizer over all readable
Not really a randomiser although it may serve.
With a server socket current implementation already prefers to write
than to read (apart from this problem that contradicts it) - this is so
that the server gets to send the responses it worked so hard to generated.
With client sockets we prefer to read than to write, this is because it
is polite to the server to listen to the answers instead of just queuing
more, and if we don't, we may deadlock if it decides to stop responding.
However, when we have a related server/client socket pair (e.g. when
server requests sometimes generate new client requests of any kind), we
have this priority scenario, which epoll doesn't support as it keeps
1. send server responses
2. receive client responses
3. send client requests
4. receive server requests
This effectively pushes any bottleneck back to the client, instead of
buffering dangerously in the server.
There is no point in receiving server requests if resultant client
requests are being blocked because the upstream server is slow (or
blocking because we are not receiving our client responses) or because
we are just storing them in ram - so server requests is the lowest
priority - thus throttling our client to our available processing rate.
There is no point though in sending client requests if we are not
listening to the client responses or we risk deadlock.
There is no point in receiving client responses if we just fill up the
our server send queue.
-------------- next part --------------
An embedded message was scrubbed...
From: Sam Liddicott <sam at liddicott.com>
Subject: [PATCH] Stop DOS attack via packet_recv greediness
Date: Tue, 19 May 2009 10:09:21 +0100
More information about the samba-technical