management of Samba4

Andrew Tridgell tridge at osdl.org
Fri Jun 3 03:10:43 GMT 2005


After our discussions in the web server thread, I've been thinking a
bit more about the low level details of management and monitoring
interfaces in Samba4. 

To try to crystallise things a bit, the specific bits of information I
have been trying to work out how to report are:

  - lists of connected users
  - lists of open files in the ntvfs backends
  - counts of packet types in the nbt server

This is only a tiny subset of what we want to be able to report, but
it acts as a good basis for thinking about the types of solutions we
could use. Each of the 3 bits of information above illustrates a type
of problem we need to solve.

The goals I want to meet are:

  - should be easy to add new sources of information in the server
    code
  - should be accessible from js scripts
  - should not reduce performance when not being used
  - should work on all our supported platforms

The 3rd point (about performance) is particularly tricky. For example,
if we decided to put information like counts of packet types in a tdb
or ldb then when should we update the db? Most of the time the
management interfaces are not being used, so writing information out
to a db on a periodic basis is no good, as it costs us performance
even when we aren't using the information. 

If instead we put it in a shared memory area then we have a potential
bottleneck with the locking on that shared memory, plus we can't
guarantee that all our supported platforms have shared memory.

The 'lists of open files' item presents its own difficulties. In
Samba3 we got that information by probing the locking.tdb file. That
was fine when we had only a single filesystem backend, and that
backend was always local, but it doesn't really suit the ntvfs
architecture in Samba4. An ntvfs backend might not want to use a tdb at
all for its open files table, as the share mode information might be
handled directly in the kernel filesystem, or might even be stored on
some remote server. Requiring that the backend also write the
information to a local db just so that management tools can see it
breaks the 'should not reduce performance when not being used' rule.

It makes much more sense for an ntvfs backend in Samba4 to provide a C
function in the ntvfs methods table which implements a call like 'give
me a list of open files'. That call might (for example) walk a linked
list, or even make a remote rpc call to a meta data server in a
clustered filesystem. The point is that our management tools don't
know (and should not care about) how the backend gets the information.
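
To make that concrete, here is a rough sketch of what such a method
might look like. The names (ntvfs_ops, list_files, ntvfs_open_file)
are invented for illustration, not the real Samba4 structures;
NTSTATUS and TALLOC_CTX are the usual Samba types:

  #include <stdint.h>

  struct ntvfs_context;   /* the backend's per-connection state */

  struct ntvfs_open_file {
          uint16_t fnum;
          const char *name;
  };

  struct ntvfs_ops {
          /* ... the usual connect/open/read/write/close methods ... */

          /* management call: fill in the list of open files. How the
             backend gets the list (walking a linked list, asking a
             cluster metadata server, ...) is entirely its own
             business */
          NTSTATUS (*list_files)(struct ntvfs_context *ntvfs,
                                 TALLOC_CTX *mem_ctx,
                                 struct ntvfs_open_file **files,
                                 uint32_t *count);
  };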

The 'counts of packet types in the nbt server' is an example of
information that is stored in a C data structure in the server, and
doesn't really naturally fit in any db format. It would be silly for
the nbt server to have to update a db on each packet, and using the
usual tricks of only doing it every N packets, or only updating every
N seconds just degrades the accuracy of the information we provide,
while also still leaving us with some degree of overhead when the
management interfaces are not being used. 
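
What I have in mind is more like the following sketch (the struct,
function and opcode names are invented placeholders): the counters
are plain variables in the nbt task, incremented inline, and nothing
is written anywhere until a management call actually asks for them:

  #include <stdint.h>

  /* placeholder opcode values, just for illustration */
  enum nbt_opcode { NBT_OPCODE_QUERY, NBT_OPCODE_REGISTER,
                    NBT_OPCODE_RELEASE };

  struct nbt_statistics {
          uint64_t total_received;
          uint64_t query_count;
          uint64_t register_count;
          uint64_t release_count;
  };

  static struct nbt_statistics nbt_stats;

  /* called from the packet dispatch path - the fast path pays a
     single increment, and no db or shared memory is touched */
  static void nbt_count_packet(enum nbt_opcode opcode)
  {
          nbt_stats.total_received++;
          switch (opcode) {
          case NBT_OPCODE_QUERY:    nbt_stats.query_count++;    break;
          case NBT_OPCODE_REGISTER: nbt_stats.register_count++; break;
          case NBT_OPCODE_RELEASE:  nbt_stats.release_count++;  break;
          }
  }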

In some ways the nbt packet counts problem is easier than the smbd
'list of open files' one, as at least the information is isolated to a
single
process. In the case of the open files list the information may be
spread out over 1000 separate smbd processes. 

The 'lists of connected users' problem is also spread out over all of
our smbd processes. In Samba3 we used a tdb to hold this information,
and that might also make sense in Samba4 as there is no abstraction
layer for user connections to worry about, and a new user connecting
is much less common than a new file being opened, so the overhead of a
db write on user connection/disconnection is less of a concern.
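
For example, the connect path could do a single tdb_store()
(tdb_store() and TDB_REPLACE are the real tdb API; the record layout
and struct are invented for illustration):

  #include <stdint.h>
  #include <time.h>
  #include "tdb.h"

  struct connected_user {
          uint32_t vuid;
          char username[64];
          time_t connect_time;
  };

  /* called once per user connect - rare enough that a db write
     doesn't matter */
  static int record_user_connect(TDB_CONTEXT *tdb,
                                 struct connected_user *u)
  {
          TDB_DATA key, data;

          key.dptr   = (char *)&u->vuid;
          key.dsize  = sizeof(u->vuid);
          data.dptr  = (char *)u;
          data.dsize = sizeof(*u);

          return tdb_store(tdb, key, data, TDB_REPLACE);
  }

  /* with a matching tdb_delete() on disconnect */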

So, how can we put together a framework for gathering this type of
information while meeting the above goals? I'd like to propose the
following:

 - using our smbd messaging layer (the one based on unix domain
   datagram messages) as the basic transport. 

 - add a NDR encoded layer on top of this transport

 - use IDL to write the structures that will be accessible, allowing
   pidl to generate the marshalling/unmarshalling code for these
   structures (a guess at what that might look like follows this list).

 - use the pidl ejs backend that jelmer has started on to provide
   interfaces to these functions from ejs

 - add the ability to broadcast a message to a class of smbd tasks
   (for example, to all smb_server tasks).
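
To give an idea of what the generated code for one of these calls
might look like, here is a guess following the .in/.out layout pidl
already generates for dcerpc calls (the call name and fields are
invented):

  struct smsg_ListOpenFiles {
          struct {
                  uint32_t max_count;   /* cap on entries returned */
          } in;
          struct {
                  uint32_t count;
                  struct smsg_OpenFileInfo *files;
          } out;
  };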

I know this sounds a bit like reinventing the ncalrpc transport, but
it isn't quite. For a start, ncalrpc is over connected stream sockets,
whereas I want to use unix domain datagrams as otherwise we will run
out of file descriptors when getting results from 1000 smbd
processes. Plus, I don't want to have to set up a full rpc server in
each smbd task, instead I want code to be able to do something like:

  smsg_register_handler(SMSG_LIST_OPEN_FILES, my_function);

in source/ntvfs/posix/vfs_posix.c and have my_function() be called
with the unmarshalled structures all setup, and be able to send reply
data by filling in the .out sections of the passed structure.
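
Continuing that sketch, my_function() could then look something like
this (again, all names are invented; pvfs_state stands for whatever
private state the posix backend keeps):

  static NTSTATUS my_function(void *private_data,
                              TALLOC_CTX *mem_ctx,
                              struct smsg_ListOpenFiles *r)
  {
          struct pvfs_state *pvfs = private_data;

          r->out.count = 0;
          r->out.files = NULL;

          /* walk the backend's internal open file list, allocating
             r->out.files on mem_ctx and filling in r->out.count */

          return NT_STATUS_OK;
  }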

It will really be more like an ncadg_ interface, but without any of the
bind stuff and with awareness of smbd sub-tasks and how to talk to
them.

As an initial goal, I'd like to write the equivalent of the Samba3
smbstatus command as an ejs script. That should allow the framework to
be tested quite well, and will naturally slot into an element of the
web management interface once done (as the same ejs functions will be
used).

The low level messaging code in lib/messaging/ will need a bit of
work to support all this stuff. In particular it currently uses a
connected unix datagram socket, which means it creates a full file
descriptor for each message. That is no good when you want to send off
thousands of these messages, so I'm going to see if I can make it work
with socket_sendto() (which means adding that to
lib/socket/socket_unix.c). That should also make it faster.
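
The underlying OS primitive is straightforward: one unconnected
AF_UNIX datagram socket can address any number of peers with
sendto(), so the sender needs a single descriptor no matter how many
smbd tasks it queries. A socket_sendto() wrapper would boil down to
something like this sketch (send_to_task and the socket-path
addressing are illustrative):

  #include <sys/socket.h>
  #include <sys/un.h>
  #include <string.h>

  static ssize_t send_to_task(int fd, const char *path,
                              const void *buf, size_t len)
  {
          struct sockaddr_un addr;

          memset(&addr, 0, sizeof(addr));
          addr.sun_family = AF_UNIX;
          strncpy(addr.sun_path, path, sizeof(addr.sun_path)-1);

          return sendto(fd, buf, len, 0,
                        (struct sockaddr *)&addr, sizeof(addr));
  }

  /* fd comes from socket(AF_UNIX, SOCK_DGRAM, 0) and is reused for
     every target task's socket path */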

Platforms without unix domain datagram sockets would need an
alternative messaging system with the same api as lib/messaging/,
with the code abstracted so that the transport can be
selected (at either compile time or runtime).
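
One simple shape for that abstraction (all names invented) is a small
table of function pointers chosen when the messaging context is
initialised:

  struct msg_transport_ops {
          const char *name;
          NTSTATUS (*init)(struct messaging_context *msg);
          NTSTATUS (*send)(struct messaging_context *msg,
                           uint32_t dest_task, DATA_BLOB *packet);
  };

  /* unix datagrams where available, some fallback (for example a
     loopback socket scheme) elsewhere */
  static const struct msg_transport_ops *msg_transport;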

At the end of all this we should have a really good system for
allowing any internal data structure in any part of smbd to be easily
exposed to our management interfaces, while still scaling well, and
not incurring any overhead when management calls are not being
used. It should then be an easy task to write the individual pages of
the web management interface, and (if desired) write command line
versions in ejs.

Cheers, Tridge

