socket-wrapper fd-passing

Mon Jun 29 14:20:02 UTC 2020

Hi Anoop,

>> I rebased your fd-passing patches on top of socket_wrapper master.
>>
>> See https://gitlab.com/metze/socket_wrapper/-/commits/fd-passing/
> 
> Thanks and sorry for my late reply.
> 
> See my comments below:
> (WIP branch: 
> https://git.samba.org/?p=anoopcs/socket_wrapper.git;a=shortlog;h=refs/heads/fd-passing-final
> )
> 
>> Please also notice my commit on top where I added some more hints
>> on a better design:
>>
>>   - We need to maintain a small file using mmap and protected
>>     by pthread robust mutexes. E.g. one file per local ip address.
> 
> My current set of patches operate out of a single file(as db) shared
> using mmap() protected by pthread robust mutexes. Do you foresee any
> complications with just one file as common db?

I'm not sure if a single file would be good for all processes of
samba's autobuild.

>>   - The path specified in SOCKET_WRAPPER_FD_PASSING_DB will
>>     be used as the file name, if this is not specified we'll
>>     use malloc'ed and fd-passing is not enabled.
> 
> As of now I have a hard coded db name SOCKET_INFO_DB defined as
> /tmp/socket-info.db to hold socket_info structures, free list pointers
> and mutex. But the fall back mechanism is not yet present.

At least we should not have a hard coded path under /tmp,
but first we need to make something that actually works.

>>   - The file contains a header (with magic, unique id, size and
>> free-list pointer)
>>     followed by an array of socket_info structures.
> 
> More or less similar format has been used so far. But remember,
> version/magic, size etc are still marked as TODO.
> 
>>   - The socket_info_fd structures will only maintain the index
>>     into the mmap'ed array.
> 
> socket_info_fd is now an array of integer pointers containing indexes
> to mmaped area of socket_info structures.

Yes, maybe it's easier to keep it that way...

>>   - fd-passing is limited to fixed number (~ 127), this should be
>>     more than enough for typical caller (Samba would just use 1).
> 
> Ok.
> 
>>   - In order to do fd-passing of tcp/udp sockets, we'll create
>>     a pipe (or similar) where we write an array of with indexes
>>     into the mmap'ed array into the write end of the pipe.
>>     We would also pass the device/inode and a unique identifier
>>     for the file.
>>     The read end of the pipe is then passed as the last fd to
>>     the destination process. The destination process can rebuild
>>     the socket_info_fd structures by reading the array indexes.
>>     out of the read end of the pipe.
>>   - A tricky part will be the reference counting in the database
>>     entries. The sender needs to write the data into the pipe
>>     and increment the refcounts in the db file before calling
>>     sendmsg(). The sender may hold a mutex for each socket
>>     during sendmsg().
> 
> I hope this is based on your suggestion to have one file per ip
> address. My WIP branch contains changes based on a similar design from 
> https://git.samba.org/?p=anoopcs/socket_wrapper.git;a=blob;f=TODO.fd-passing;h=95f5107c0601720d9a6df81159976363adf7b52b;hb=refs/heads/fd-passing-final
> 
> final implementation of real fd-passing is still at an infant stage :-/

I had a look at your branch, but it seems you tried to catch SCM_RIGHTS
at the wrong place (in the code that handles inet sockets).

We need to catch the if (si == NULL) return libc_sendmsg() case that's
used for unix domain sockets.

>>   - In order to allow multiple threads (or processes) to share a
>> single
>>     socket we need to add mutex protection in quite a few places.
> 
> yes..yes, we have decent amount of pthread mutexes now :-)
> 
>>     In the most common cases there won't be any contention on the
>> mutexes,
>>     but it will garantee correctness for the corner cases which
>> happens
>>     for fd-passing.
> 
> Sure.
> 
> In short, I could successfully run `make test` with changes to work out
> of a common shared file. Still there is one
> test(test_thread_echo_tcp_sendmsg_recvmsg, which I added) failing
> almost consistently on my local system and not on GitLab CI. I have
> been trying to root cause this failure for quite sometime now(as and
> when time allows me) without any luck in fixing it. I would love to see
> another pair of eyes reviewing the changes.

I guess the problem I described above and the code that tries to
construct the updated SCM_RIGHTS fd array, seems wrong, as far as I can
see it only tries to pass the tmp pipe read end, but not the actual fd
array passed from the caller.

I tried to get the SCM_RIGHTS passing working here:
https://gitlab.com/metze/socket_wrapper/-/tree/fd-passing-unix

Once I got this working I'll try to integrate this with the rest of your
patches.

>> An additional idea would be using temporary anonymous files (maybe
>> memfd_create() or an similation for it) for shared structures for
>> passed
>> sockets. In sendmsg() the sender would move the
>> socket_info[_container]
>> structures from malloc'ed memory to an temporary memory file.
>> This will replace the pipe fd of the original design.
>> If multiple fd's are passed, the memory file contains an array of
>> socket_info[_container] structures.
>> int *socket_fds_idx would be changed to an array of structures
>> or we have an additional array to store possible fd for the temporary
>> files and have the destructing code lock at it and select between
>> free() and munmap()/close().
>> This design would not require a named file, like the one specified
>> by SOCKET_WRAPPER_FD_PASSING_DB.
> 
> Hm.. this is new to me. I will think about it.

I guess it means we would need to undo some of the changes we made
to one array of socket_info structures.

I'll let you know if I get the basic passing of information via the tmp
pipe working...

Thanks!
metze

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20200629/c8646da7/signature.sig>