SMB3 and RDMA support

Sun May 18 12:39:58 MDT 2014

Hi Richard,

On 2014-05-18 at 09:32 -0700, Richard Sharpe wrote:
> On Sat, May 17, 2014 at 11:55 AM, Stefan (metze) Metzmacher
> <metze at samba.org> wrote:
> > Am 17.05.2014 11:22, schrieb Michael Adam:
> >> On 2014-05-16 at 03:15 -0700, Richard Sharpe wrote:
> >>> After discussing things with Michael Adam at the bar camp, I think we
> >>> now have an approach to SMB Direct (SMB3 over RDMA) that is workable,
> >>> so I anticipate working with the stuff that Metze and Michael are
> >>> doing and helping to get it moving.
> >>
> >> Right, I will follow up with details soon (txt, wiki, git).
> >> I'm looking forward to collaborating and getting this up to speed!
> >
> > I just found that there's a ibv_fork_init() function.
> > Maybe there's some hope that we don't need an external daemon.
> 
> I am not sure which external daemon you are referring to.
> 
> The problem is that every client gets a separate daemon.

> Anyway, attached is a diagram of what I think Michael was talking
> about at SambaXP.
> 
> The steps that he described to me seem to be:
> 
[copied below with comments]
> 
> So, one question in my mind is:
> 
> Why do we need a separate smb-dd? Surely, the master can also handle
> RDMA SEND/RECV and RDMA READ/WRITE and running the small SMB Direct
> protocol?
> 
> Is there a strong reason for that extra daemon?

This is the rdma-proxy-daemon (I called it smbd-d, i.e.
smb-direct-daemon, for want of a better name).

With the design we discussed, this smbd-d is the only process
that listens for rdma connections and answers rdma requests.

We would like to omit that and treat rdma connections like
other client connections, i.e. have main-smbd listen on rdma,
fork a child process when a new rdma connection is accepted,
and have that child fd-pass the connection to the already
existing smbd process for the client.
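
To make the term concrete, here is a minimal sketch of fd-passing
between two processes over a unix domain socket. It is Python, not
Samba code; handoff_demo and the tag value are hypothetical names, a
socketpair stands in for the client connection, and socket.send_fds /
socket.recv_fds need Python 3.9 or later. It shows the mechanism for
ordinary sockets only and says nothing about whether an rdma
connection can be handed over this way.

```python
import os
import socket


def handoff_demo(tag=b"guid-1234"):
    # Control channel between the two processes (a hypothetical stand-in
    # for the channel between a new child and the existing smbd).
    sender_chan, receiver_chan = socket.socketpair(
        socket.AF_UNIX, socket.SOCK_STREAM)

    pid = os.fork()
    if pid == 0:
        # Receiver: plays the already existing smbd for the client.
        try:
            sender_chan.close()
            data, fds, _flags, _addr = socket.recv_fds(receiver_chan, 1024, 1)
            conn = socket.socket(fileno=fds[0])  # adopt the passed descriptor
            conn.sendall(b"served after handoff: " + data)
            conn.close()
        finally:
            os._exit(0)

    # Sender: the connection is created only after the fork, so the
    # receiver can only have obtained it through fd-passing.
    receiver_chan.close()
    client_side, server_side = socket.socketpair()
    socket.send_fds(sender_chan, [tag], [server_side.fileno()])
    server_side.close()  # the receiver now holds the only server-side copy

    # Read until the receiver closes its end of the connection.
    reply = b""
    while True:
        chunk = client_side.recv(1024)
        if not chunk:
            break
        reply += chunk
    client_side.close()
    sender_chan.close()
    os.waitpid(pid, 0)
    return reply


if __name__ == "__main__":
    print(handoff_demo())
```

The sender closes its own copy of the descriptor right after sending
it, which mirrors the idea that the handing-over process does not keep
serving the connection.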

But we assumed up to now that the rdma client libs don't
support forking with established connections.  Hence the need
for a single daemon (not forking children) to listen for RDMA
connections and do the actual rdma traffic with all clients:
the rdma-proxy-daemon.

In theory, I guess the main smbd could also take this task,
but we wanted to make it a separate daemon so that the main
smbd would not be prevented from doing its accept
connection/fork business. And also, this separate rdma-proxy
could be a fully multi-threaded daemon, thereby partly
compensating for the lack of forking.

Now instead of having this bottleneck single proxy daemon, we
would like to have a daemon that forks children for new
connections, and the ibv_fork_init() function that Metze
found might help here.
(Btw, a prefork model might also just do the trick!!!...)
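
For reference, a minimal sketch of what a prefork model looks like
with ordinary sockets (Python, not Samba code; prefork_demo, the
worker count and the socket path are made up for illustration, and a
unix domain socket stands in for the real listener). The workers are
forked before any connection exists, so no established connection
ever has to survive a fork. Whether the rdma libraries permit sharing
a listener across fork in this way is an open assumption here.

```python
import os
import signal
import socket
import tempfile


def prefork_demo(num_workers=2, num_clients=4):
    # Listening socket created once, before any worker is forked.
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "listener.sock")
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(16)

    worker_pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Worker: accepts connections itself, so each connection is
            # established in the process that will serve it.
            try:
                while True:
                    conn, _ = listener.accept()
                    conn.sendall(str(os.getpid()).encode())
                    conn.close()
            finally:
                os._exit(0)
        worker_pids.append(pid)

    # Clients connect; each reply names the worker that served it.
    served_by = []
    for _ in range(num_clients):
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)
        served_by.append(int(client.recv(64).decode()))
        client.close()

    # Tear down the workers and the listener.
    for pid in worker_pids:
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
    listener.close()
    os.unlink(path)
    os.rmdir(tmpdir)
    return worker_pids, served_by


if __name__ == "__main__":
    print(prefork_demo())
```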

But since we can make the single rdma-proxy-daemon
multi-threaded, I am not certain how much going multi-process
gains us if we can't fd-pass the rdma connection to
another smbd for further treatment. Forking without fd-passing
gives us one rdma-proxy-daemon per rdma-connection. The ideal
solution would (or might) be the solution where there is no
proxy daemon at all any more and the rdma traffic is done by
the initial smbd process for the client.

So let me now comment on the steps you listed:

0. main smbd is started.
> 1. Not long after starting, the master smbd forks an SMB Direct
> daemon, smb-dd.

Well the smbd-d might also be an independent daemon process,
maybe started by main smbd, not certain yet. The smbd-d should
definitely be a small special-purpose process, not a full
smbd-child.

Now the setup is this:

- smbd-d listens for rdma connections.
- smbd listens for tcp connections on smb ports.

> 2. A TCP connection comes in to the master
>
> 3. The master forks

child smbd process "c1"

> after it accepts and protocol processing happens
> in the child. The TCP connection is closed in the master but kept open
> in the child, so in some sense it is transferred to the child.
>
> 4. An RDMA connection comes into the smb-dd. It is accepted and the
> SMB Direct protocol starts.
>
> 5. The smbd-d forks an smbd (the smbd-d is really just an smbd
> anyhow), and communicates with this child via a UNIX-domain socket
> (transfers the SMB requests and responses via this UNIX-domain
> socket.)

No! The smbd-d does not fork a child. (At least not in the
approach we discussed since we assumed that libibverbs can't
cope with forking established rdma connections.)

Instead, it sends a message with the information about the new
rdma client connection to main-smbd. From here on it is a
little different:

6. main smbd forks a new child "c2" when receiving the
   message from the smbd-d.
   c2 establishes a unix domain socket connection to smbd-d;
   this is a communication and proxy channel.
   Over this communication channel, it gets the smb
   information from the connected rdma client, including the
   client GUID.

   c2 finds that c1 already serves the client GUID and
   then transfers (even before an rdma session bind)
   the unix socket (connecting to smbd-d) to c1 and
   then c2 dies.

7. c1 completes protocol processing (session bind, ...)
   and also establishes a mmap area with smbd-d.

8. Thereafter smb-direct requests are proxied through
   the proxy unix domain connection, i.e. rdma
   send/recv calls are sent over the socket from
   smbd-d to c1, which sends the answer back.
   And rdma write/read requests are treated as
   proper rdma requests which use the mmap area
   to access memory from c1.
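
The split in steps 7 and 8 between small messages on the socket and
bulk data in a shared area can be sketched like this (Python, not
Samba code; proxy_demo, AREA_SIZE and the "READ" message are invented
for the example). As a simplification the two processes are parent
and child sharing an anonymous mapping across fork; smbd-d and c1 are
not related that way, so the real thing would need some other means
of sharing the mapping.

```python
import mmap
import os
import socket
import struct

AREA_SIZE = 4096  # hypothetical size of the shared data area


def proxy_demo(payload=b"file data read by c1"):
    # Anonymous mapping; on Unix it is MAP_SHARED by default and thus
    # stays shared between parent and child after fork.
    area = mmap.mmap(-1, AREA_SIZE)
    proxy_end, c1_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    pid = os.fork()
    if pid == 0:
        # Plays c1: gets a small request over the socket, puts the bulk
        # data into the shared area, answers with just the length.
        try:
            proxy_end.close()
            c1_end.recv(64)
            area[:len(payload)] = payload
            c1_end.sendall(struct.pack("!I", len(payload)))
        finally:
            os._exit(0)

    # Plays smbd-d: sends the request, reads the small reply from the
    # socket, then takes the data directly from the shared area.
    c1_end.close()
    proxy_end.sendall(b"READ")
    (length,) = struct.unpack("!I", proxy_end.recv(4, socket.MSG_WAITALL))
    data = bytes(area[:length])

    os.waitpid(pid, 0)
    proxy_end.close()
    area.close()
    return data


if __name__ == "__main__":
    print(proxy_demo())
```

Only the length crosses the socket; the payload itself is never
copied through it, which is the point of having the mmap area.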

Makes sense?

Cheers - Michael

> 6. When we get the SessionSetup in the second child smbd we use the
> client GUID to find the associated smbd (perhaps via a TDB, or
> whatever) and we then transfer the UNIX-domain socket to the first
> child smbd.
>
> 7. Thereafter, the first child uses that UNIX-domain socket to
> communicate with the smb-dd. It receives simple SMB requests and sends
> simple responses via the socket while read and write data is
> communicated via a shared-memory area.
