fix to util_sock.c

Kenichi Okuyama okuyamak at
Tue Nov 14 04:37:36 GMT 2000

Dear Andrew,

>>>>> "AT" == Andrew Tridgell <tridge at> writes:
AT> There is even reason to believe that recv() with MSG_WAITALL will
AT> actually be slower than read() in some circumstances - think about
AT> what might happen to the TCP receive window with a MSG_WAITALL on 64k
AT> of data. An increased likelihood of TCP stalls is very plausible,
AT> although I have not done a test to see if this happens.

That's simply because the TCP layer is not implemented that way.
Even if you wait with MSG_WAITALL, the TCP/IP layer is still
invoked for every packet. For every packet, the data is copied to
the user-space buffer, and then (since not enough data has been
received) the process puts itself back on the semaphore again.

If you use read() like we do, what happens is:

1) you dive into recv().
2) you reach the TCP/IP layer.
3) you look around for the next packet, and fail.
4) you put yourself on that socket's semaphore, and go for a reschedule.
5) you are re-invoked, and start looking for a packet,
6) you find a packet,
7) you go back to user space (smbd),
8) you find that you didn't receive enough; go to 1 again.

With the MSG_WAITALL option, you can cut most of steps 1, 2, and 7.
Also, because step 8 runs on the kernel side, you reduce context
switches, which improves the cache hit ratio as well.

There is no reason this should be slower.
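The MSG_WAITALL variant collapses the whole loop into a single call; the kernel itself sleeps until the requested count is available. A sketch (the wrapper name `recv_all` is mine; real code would still need to handle a short return caused by a signal or connection close):

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* One-call equivalent of the read() loop: MSG_WAITALL asks the
 * kernel to block until the full count has arrived, so the waiting
 * for further packets happens entirely inside the kernel. */
static ssize_t recv_all(int fd, char *buf, size_t count)
{
    return recv(fd, buf, count, MSG_WAITALL);
}
```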

What does make this implementation slower is that now samba
STUPIDLY waits for the entire SMB request data stream to arrive,
while we could do many things with only the first four bytes.

While we are doing something else (any stupid thing is okay), we
increase the chance that the next packet has already arrived. Since
it is clear that the longest part of searching for the next packet
is the case where there is no packet for you yet, being able to
wait for a packet without even calling recv() makes the wait
lightest. If we can do something else out of order, the speed of
the reply will rise, and throughput will gain.

But to get this performance, we need the MSG_WAITALL
functionality, so that even when we have not yet received enough
packets, we can still make our waiting cost the smallest "FOR THE
KERNEL".
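Putting the pieces together, the strategy argued for here might look like the sketch below: pull in only the 4-byte header, do whatever work needs just the header, then let the kernel wait for the body with MSG_WAITALL. This is a hypothetical sketch, not Samba's code; the function name and the 17-bit NetBIOS session-header length decoding (flags byte's low bit extending a 16-bit big-endian length) are my assumptions:

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical sketch: read only the 4-byte NetBIOS session header,
 * decode the body length from it, then hand the wait for the body
 * to the kernel via MSG_WAITALL. */
static ssize_t recv_smb_request(int fd, unsigned char *buf, size_t bufsize)
{
    ssize_t n = recv(fd, buf, 4, MSG_WAITALL);     /* header only */
    if (n != 4)
        return -1;

    /* Assumed layout: type byte, flags byte (low bit extends the
     * length), then a 16-bit big-endian length. */
    size_t body = ((size_t)(buf[1] & 0x01) << 16)
                | ((size_t)buf[2] << 8)
                |  (size_t)buf[3];
    if (4 + body > bufsize)
        return -1;

    /* ... here smbd could start work that needs only the header ... */

    n = recv(fd, buf + 4, body, MSG_WAITALL);      /* kernel waits */
    if (n != (ssize_t)body)
        return -1;
    return (ssize_t)(4 + body);
}
```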

AT> If someone really wants to investigate this then I suggest they modify
AT> tbench to try the proposed strategy and post results. By using the
AT> tbench harness we will be able to eliminate many of the complicated
AT> factors that affect overall performance of a SMB server and
AT> concentrate on the TCP usage only. See the dbench CVS module on
AT> for a copy of tbench.

I disagree. tbench will not give you the information from INSIDE
the kernel, and that is what we should focus on when using recv().

Kenichi Okuyama at Tokyo Research Lab, IBM-Japan, Co.

More information about the samba-technical mailing list