PATCHES: Release Suggestion

Andrew Tridgell tridge at
Sun Nov 12 12:07:00 GMT 2000


While I agree that there are many aspects to the global design of
Samba that are poor, your specific example is a bit flawed I think.

> # Ex.: I know that current Samba is replying slowly to
> #      client requests.

it is? 

> #      This is because current Samba does
> #      1) receive the ENTIRE request,
> #      2) do whatever is required for the reply,
> #      3) then actually send the reply.
> #      But if you focus on step 2, you'll find there are many
> #      things you can do with only the first 4 bytes ( if you
> #      do not run construct_reply_common() before switch_message() ).

no, you can do very little with the first 4 bytes. It only gives you
the request length, and that isn't enough to make a decision.

> #      This change requires a global change, for each reply_*() function
> #      is required to have parameters of
> #      i) the file descriptor for the client, for receiving the rest of
> #         the request later.
> #      ii) the input buffer, its size, the already-received size, and
> #          the size it should receive.
> #      iii) the output buffer and its size.
> #      And also, each reply_*() should be changed.

arrgh, no way. I doubt very much this will help at all and it will
make our internals much more complex. You can convince me otherwise,
but only with a practical demonstration. I'm highly skeptical that it
will help.

> #      This will make samba work a lot better, for two reasons:
> #      1) while you're running inside an individual reply function,
> #         there's a high probability of the next packet arriving.

This only applies to SMBwrite* requests. All other time-critical
requests are smaller than 1.5k in size, and MS clients tend to wait
for the current request to be satisfied before sending a 2nd request.

In the case of SMBwrite*, the only one which really matters is
SMBwritebraw as MS clients negotiate small buffer sizes (around
4k). So if you want to test your theory then special case SMBwritebraw
and see if it helps. I suspect that when numclients >> numcpus you
won't get much of an advantage at all (and may in fact lose

> #         If the client sends the request over 100Mbps ethernet,
> #         the 2nd IP packet for the request will arrive within 1/7300 sec,
> #         which is about 0.14msec. If it's 1Gbps, then it's 14usec.
> #         This is a small enough amount of time to wait while doing
> #         something else, instead of simply calling recv()/read()
> #         and blocking inside ( once you block, a unix system
> #         usually will not schedule you again for 10msec, you
> #         know ).

a unix system will only not schedule you for 10msec if it has
something else to do, which is fine for us. If it has something else
to do then let it do it. We do not care about the latency of
individual requests, we care about overall system throughput.

> #      2) If the request was something like SMBwrite, we can now call
> #         transfer_file() from the first packet. And since transfer_file()
> #         can be implemented using the sendfile() system call, we can get
> #         better scheduling from the kernel ( well, only if you have
> #         sendfile() as a system call, though )

On Linux at least, sendfile() is useless for writes to the server. The
API just isn't designed for it. It perhaps should be renamed to
http_sendfile() because it is so oriented towards that particular
protocol. It has the following problems:

- error handling is no good. For output, how do you cope with EOF on
  the file? On input, how do you cope with out of disk space?

- the man page for Linux at least says the input file pointer is not
  changed. That's the socket for the SMBwrite case! That would be
  fatal. I believe sendfile only works with a real file as the in_fd
  on Linux - does it work in both directions on some other OS? Does it
  handle errors sanely?

sendfile() is OK for http because http has the peculiar property that
an acceptable action on detecting an error is to drop the socket. That
is most definitely not acceptable for SMB.

It probably would be possible to come up with a new API that is more
efficient for SMB than read() and write(), but first we need some
reason to think that socket IO is actually the bottleneck and that the
new API would fix it. In the benchmarking I've done I've yet to come
across a system where the socket API is the bottleneck, which is why
I'm not really interested in doing this sort of kernel API work.

> #      But this entire change is against the framework; it affects all
> #      rpc reply functions, and since that is so, we can't shoot that
> #      large, still-increasing set of reply functions. So we can't make
> #      changes against that part.

You can shoot that large if the aim is good :)

We are certainly happy to consider very large changes, but large
changes have to be very well thought out, then prototyped and shown to
actually be worthwhile. Right now Jeremy is trying to get 2.2 ready
and so he is aiming for small changes. I'm interested in hearing about
large proposed changes for 3.0 and beyond, but they have to be well
thought out.

Your comments on us having a bad structure that we are standing on
are to a large degree correct. The problem we face is that we want to
make these big structural changes smoothly, so that we can keep doing
stable releases from a code base that is fairly close to our
development code base. That doesn't preclude large changes (witness
the wholesale change to an internal database api in Samba 2.2) but it
does mean we have to think carefully about big changes. Big changes
imply a long time between stable releases, and that is something we
want to minimise if we can.

As for the rpc framework, what we really want to move to is to rebuild
the rpc subsystem on top of a code generator using IDL files. Several
people are looking at various aspects of that, but right now we are
not certain which approach we will take. 

Cheers, Tridge
