[PATCH] tdb: runtime check for robust mutexes may hang

Stefan Metzmacher metze at samba.org
Sun Mar 12 14:18:55 UTC 2017


Hi Ralph,

> Attached is a fix for bug:
> 
> <https://bugzilla.samba.org/show_bug.cgi?id=12593>
> 
> Calling tdb_runtime_check_for_robust_mutexes() in multithreaded programs can
> hang in sigsuspend() due to lost SIGCHLD.
> 
> Example stack back-trace from the bugreport:
> 
>   Thread 5 (Thread 0x7f2ea5037700 (LWP 1054)):
>   #0  0x00007f2eb6325506 in sigsuspend () at /usr/lib/libc.so.6
>   #1  0x00007f2ead463802 in tdb_runtime_check_for_robust_mutexes () at /usr/lib/libtdb.so.1
>   #2  0x00007f2eade9b05d in tdb_wrap_open () at /usr/lib/samba/libtdb-wrap-samba4.so
>   #3  0x00007f2eb4e49b21 in  () at /usr/lib/libsmbconf.so.0
>   #4  0x00007f2eb4e4a305 in gencache_parse () at /usr/lib/libsmbconf.so.0
>   #5  0x00007f2eb4e4a862 in gencache_get_data_blob () at /usr/lib/libsmbconf.so.0
>   #6  0x00007f2eb4e4a90b in gencache_get () at /usr/lib/libsmbconf.so.0
>   #7  0x00007f2eb422f74a in sitename_fetch () at /usr/lib/samba/libgse-samba4.so
>   #8  0x00007f2eb422d7da in resolve_name () at /usr/lib/samba/libgse-samba4.so
>   #9  0x00007f2eb762a5e2 in  () at /usr/lib/libsmbclient.so.0
>   #10 0x0000000000406ebd in do_mount (...)
>   #11 0x00007f2eb7407f4a in g_vfs_job_run (job=0x12b72b0 [GVfsJobMount]) ...
>   #12 0x00007f2eb6920c9e in g_thread_pool_thread_proxy (data=<optimized out>) ...
>   #13 0x00007f2eb69202a5 in g_thread_proxy (data=0x7f2e98004720) at gthread.c:784
>   #14 0x00007f2eb6697444 in start_thread () at /usr/lib/libpthread.so.0
>   #15 0x00007f2eb63d9cff in clone () at /usr/lib/libc.so.6
> 
> As there's no POSIX function to change the signal mask of all threads in a
> process, this problem can't be solved by a change to
> tdb_runtime_check_for_robust_mutexes().
> 
> Attached patch *moves* the runtime check to a library initializer which
> guarantees that only one thread exists in the process when the function is run.
> 
> tdb_runtime_check_for_robust_mutexes() is marked as deprecated but not removed
> from the API in order to prevent a incompatible ABI change that requires a major
> version number bump (which could be done of course).
> 
> I'm adding a new function to the API that merely fetches the stored result of
> the runtime check done at library initialisation. All existing callers of
> tdb_runtime_check_for_robust_mutexes() are changed to call the new function.
> 
> This means we're essentially tying the support for robust mutexes to compiler
> support for library initializers via a contructor attribute.
> 
> Again:
> 
> this means we're essentially tying the support for robust mutexes to compiler
> support for library initializers via a contructor attribute.
> 
> sleep(60);
> 
> To the best of my knowledge, only Linux, FreeBSD and Solaris have support for
> robust mutexes and the following OS/compiler combinations have been successfully
> tested with this change:
> 
> o Linux with gcc and clang
> 
> o FreeBSD with gcc and clang
> 
> o Solaris with gcc
> 
> Having code still directly or indirectly calling
> tdb_runtime_check_for_robust_mutexes() just doesn't feel right as we know it's
> broken by design for multi-threaded programs and we're already using threads
> under the hoods in some places.
> 
> Thoughts? Please review and comment, but don't push yet, I believe this needs
> some time to sink into the minds of everyone interested.

I'd prefer to keep the patchset much simpler, I think we could
just call tdb_runtime_check_for_robust_mutexes() from
a library initializer. That's really all we need
tdb_runtime_check_for_robust_mutexes() is already a
run once function, which caches the result of it's first
run. The library initializers would most likely runs before
any thread is started.

If the platform supports robust mutexes it would most likely also
support library initializers. If library initializers are not supported,
we most likely also miss support for either threads or robust mutexes.

So just the simple fix to call tdb_runtime_check_for_robust_mutexes
from tdb_lib_init() should effectively catch all use cases, without
the need for any changes in existing callers.

Is there any real reason why we would need something more complex?

metze



More information about the samba-technical mailing list