tfork() test hanging in socket_wrapper lock
Ralph Böhme
slow at samba.org
Tue Nov 20 15:26:52 UTC 2018
On Mon, Nov 19, 2018 at 07:12:17PM +0100, Andreas Schneider via samba-technical wrote:
>On Monday, 19 November 2018 19:01:18 CET Andrew Bartlett wrote:
>> On Mon, 2018-11-19 at 11:54 +0100, Andreas Schneider wrote:
>> > On Monday, 19 November 2018 03:59:35 CET Andrew Bartlett wrote:
>> > > Today on sn-devel my autobuild hung running the tfork test:
>> > >
>> > > It appears to be waiting forever for a mutex lock.
>> >
>> > Anoop and myself tried to reproduce it, but we couldn't. However helgrind
>> > reports several errors in tfork itself.
>>
>> I should have mentioned, I've left them stuck in case they can be
>> debugged.
>
>I think Ralph already nailed it. It is an issue with SIGCHLD handling in
>tfork.
in fact it's not. :) Sorry for the red herring, but it took some time to swap
the tfork details back into main memory.
Taking a closer look at the data races reported by helgrind, I came to the
conclusion that they are all false positives.
The reported data races are in code that is globally serialized by
tfork_acquire_sighandling() and tfork_release_sighandling(), ensuring there's
always only one writer to the reported objects and all read accesses are
sequenced after the write because there are a bunch of syscalls in between which
serve as implicit memory barrier.
I've added valgrind annotations to tfork.c to suppress the more subtle false
positives. A few other false positives can't really be marked in the code, I've
therefor added suppressions files for helgrind and dbd.
metze, can you please take a close look at the attached (WIPish) patchset?
Thanks!
-slow
--
Ralph Boehme, Samba Team https://samba.org/
Samba Developer, SerNet GmbH https://sernet.de/en/samba/
GPG Key Fingerprint: FAE2 C608 8A24 2520 51C5
59E4 AA1E 9B71 2639 9E46
-------------- next part --------------
From 617f55649e441e1863f08a0bd0219e2ac4c18598 Mon Sep 17 00:00:00 2001
From: Ralph Boehme <slow at samba.org>
Date: Mon, 19 Nov 2018 16:47:33 +0100
Subject: [PATCH 1/5] tfork/test: ensure all threads start with SIGCHLD
unblocked
---
lib/util/tests/tfork.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/lib/util/tests/tfork.c b/lib/util/tests/tfork.c
index 3c73355b3f0..a74f7e8d7e2 100644
--- a/lib/util/tests/tfork.c
+++ b/lib/util/tests/tfork.c
@@ -470,12 +470,29 @@ static bool test_tfork_threads(struct torture_context *tctx)
bool ok = true;
const int num_threads = 64;
pthread_t threads[num_threads];
+ sigset_t set;
int i;
#ifndef HAVE_PTHREAD
torture_skip(tctx, "no pthread support\n");
#endif
+ /*
+ * Be nasty and taste for the worst case: ensure all threads start with
+ * SIGCHLD unblocked so we have the most fun with SIGCHLD being
+ * delivered to a random thread. :)
+ */
+ sigemptyset(&set);
+ sigaddset(&set, SIGCHLD);
+#ifdef HAVE_PTHREAD
+ ret = pthread_sigmask(SIG_UNBLOCK, &set, NULL);
+#else
+ ret = sigprocmask(SIG_UNBLOCK, &set, NULL);
+#endif
+ if (ret != 0) {
+ return -1;
+ }
+
for (i = 0; i < num_threads; i++) {
ret = pthread_create(&threads[i], NULL, tfork_thread, NULL);
torture_assert_goto(tctx, ret == 0, ok, done,
--
2.17.2
From 713bd933de69d28714bc2cc31c7002b9126524b9 Mon Sep 17 00:00:00 2001
From: Ralph Boehme <slow at samba.org>
Date: Mon, 19 Nov 2018 23:07:55 +0100
Subject: [PATCH 2/5] tfork: TFORK_ANNOTATE_BENIGN_RACE
---
lib/util/tfork.c | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/lib/util/tfork.c b/lib/util/tfork.c
index 8ed5811c536..4a5c08f7d79 100644
--- a/lib/util/tfork.c
+++ b/lib/util/tfork.c
@@ -58,6 +58,18 @@
* +----------+
*/
+#ifdef HAVE_VALGRIND_HELGRIND_H
+#include <valgrind/helgrind.h>
+#endif
+#ifndef ANNOTATE_BENIGN_RACE_SIZED
+#define ANNOTATE_BENIGN_RACE_SIZED(obj, size, description)
+#endif
+
+#define TFORK_ANNOTATE_BENIGN_RACE(obj) \
+ ANNOTATE_BENIGN_RACE_SIZED( \
+ (obj), sizeof(*(obj)), \
+ "no race, serialized by tfork_[un]install_sigchld_handler");
+
/*
* The resulting (private) state per tfork_create() call, returned as a opaque
* handle to the caller.
@@ -221,6 +233,14 @@ static void tfork_atfork_child(void)
ret = pthread_key_create(&tfork_global_key, tfork_global_destructor);
assert(ret == 0);
+ /*
+ * There's no data race on the cond variable from the signal state, we
+ * are writing here, but there are no readers yet. Some data race
+ * detection tools report a race, but the readers are in the parent
+ * process.
+ */
+ TFORK_ANNOTATE_BENIGN_RACE(&signal_state.cond);
+
/*
* There's no way to destroy a condition variable if there are waiters,
* pthread_cond_destroy() will return EBUSY. Just zero out memory and
@@ -266,6 +286,13 @@ static void tfork_global_initialize(void)
ret = pthread_cond_init(&signal_state.cond, NULL);
assert(ret == 0);
+
+ /*
+ * In a threaded process there's no data race on t->waiter_pid as
+ * we're serializing globally via tfork_acquire_sighandling() and
+ * tfork_release_sighandling().
+ */
+ TFORK_ANNOTATE_BENIGN_RACE(&signal_state.pid);
#endif
signal_state.available = true;
@@ -496,6 +523,13 @@ static pid_t tfork_start_waiter_and_worker(struct tfork_state *state,
if (pid != 0) {
/* The caller */
+ /*
+ * In a threaded process there's no data race on
+ * state->waiter_pid as we're serializing globally via
+ * tfork_acquire_sighandling() and tfork_release_sighandling().
+ */
+ TFORK_ANNOTATE_BENIGN_RACE(&state->waiter_pid);
+
state->waiter_pid = pid;
close(status_sp_waiter_fd);
@@ -757,6 +791,13 @@ struct tfork *tfork_create(void)
return t;
}
+ /*
+ * In a threaded process there's no data race on t->waiter_pid as
+ * we're serializing globally via tfork_acquire_sighandling() and
+ * tfork_release_sighandling().
+ */
+ TFORK_ANNOTATE_BENIGN_RACE(&t->waiter_pid);
+
t->waiter_pid = pid;
t->worker_pid = state->worker_pid;
--
2.17.2
From 080bee69c0b875a7d1bd77ee7b8ecd70b0f580cf Mon Sep 17 00:00:00 2001
From: Ralph Boehme <slow at samba.org>
Date: Mon, 19 Nov 2018 15:18:34 +0100
Subject: [PATCH 3/5] tfork: add a suppresssions file for helgrind
tfork_atexit_unknown[1|2]:
No idea what triggers this, definitely not tfork itself.
tfork_pthread_get_specific:
Helgrind reports:
Possible data race during read of size 4 at 0x5141304 by thread #3
Locks held: none
at 0x50E602E: tfork_global_get (tfork.c:301)
by 0x50E69B1: tfork_create (tfork.c:737)
by 0x2F7419: tfork_thread (tfork.c:431)
by 0x4C35AC5: mythread_wrapper (hg_intercepts.c:389)
by 0x8D38593: start_thread (pthread_create.c:463)
by 0x9A70E6E: clone (clone.S:95)
This conflicts with a previous write of size 4 by thread #2
Locks held: none
at 0x8D3F7B7: pthread_key_create (pthread_key_create.c:41)
by 0x50E5F79: tfork_global_initialize (tfork.c:280)
by 0x8D3FEA6: __pthread_once_slow (pthread_once.c:116)
by 0x50E6999: tfork_create (tfork.c:728)
by 0x2F7419: tfork_thread (tfork.c:431)
by 0x4C35AC5: mythread_wrapper (hg_intercepts.c:389)
by 0x8D38593: start_thread (pthread_create.c:463)
by 0x9A70E6E: clone (clone.S:95)
Location 0x5141304 is 0 bytes inside global var "tfork_global_key"
declared at tfork.c:122
This is nonsense, tfork_global_get() calls pthread_getspecific, so
we're looking at the pthread_key_create()/pthread_[g|s]etspecific()
API here which works with threads by design.
---
lib/util/tests/tfork-helgrind.supp | 32 ++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
create mode 100644 lib/util/tests/tfork-helgrind.supp
diff --git a/lib/util/tests/tfork-helgrind.supp b/lib/util/tests/tfork-helgrind.supp
new file mode 100644
index 00000000000..4b62b2a1a93
--- /dev/null
+++ b/lib/util/tests/tfork-helgrind.supp
@@ -0,0 +1,32 @@
+{
+ tfork_atexit_unknown1
+ Helgrind:Misc
+ fun:mutex_destroy_WRK
+ fun:pthread_mutex_destroy
+ obj:/usr/lib64/libp11-kit.so.0.3.0
+ fun:_dl_fini
+ fun:__run_exit_handlers
+ fun:exit
+ fun:(below main)
+}
+
+{
+ tfork_atexit_unknown2
+ Helgrind:Misc
+ fun:mutex_destroy_WRK
+ fun:pthread_mutex_destroy
+ fun:_dl_fini
+ fun:__run_exit_handlers
+ fun:exit
+ fun:(below main)
+}
+{
+ tfork_pthread_get_specific
+ Helgrind:Race
+ fun:tfork_global_get
+ fun:tfork_create
+ fun:tfork_thread
+ fun:mythread_wrapper
+ fun:start_thread
+ fun:clone
+}
--
2.17.2
From 8a2abc9cd4c240a0d7757e7ad89fe7ca27f60030 Mon Sep 17 00:00:00 2001
From: Ralph Boehme <slow at samba.org>
Date: Tue, 20 Nov 2018 16:03:03 +0100
Subject: [PATCH 4/5] tfork: add a suppresssions file for drd
drd reports:
initialized twice: cond 0x514f188
at 0x4C3A399: pthread_cond_init_intercept (drd_pthread_intercepts.c:1022)
by 0x4C3A399: pthread_cond_init@* (drd_pthread_intercepts.c:1030)
by 0x50F3FF3: tfork_atfork_child (tfork.c:250)
by 0x9A4B95D: fork (fork.c:204)
by 0x50F4834: tfork_start_waiter_and_worker (tfork.c:581)
by 0x50F4CDB: tfork_create (tfork.c:780)
by 0x2F7469: tfork_thread (tfork.c:431)
by 0x4C358F8: vgDrd_thread_wrapper (drd_pthread_intercepts.c:444)
by 0x8D46593: start_thread (pthread_create.c:463)
by 0x9A7EE6E: clone (clone.S:95)
cond 0x514f188 was first observed at:
at 0x4C3A399: pthread_cond_init_intercept (drd_pthread_intercepts.c:1022)
by 0x4C3A399: pthread_cond_init@* (drd_pthread_intercepts.c:1030)
by 0x50F413A: tfork_global_initialize (tfork.c:287)
by 0x8D4DEA6: __pthread_once_slow (pthread_once.c:116)
by 0x4C377FD: pthread_once_intercept (drd_pthread_intercepts.c:800)
by 0x4C377FD: pthread_once (drd_pthread_intercepts.c:806)
by 0x50F4C0E: tfork_create (tfork.c:743)
by 0x2F7469: tfork_thread (tfork.c:431)
by 0x4C358F8: vgDrd_thread_wrapper (drd_pthread_intercepts.c:444)
by 0x8D46593: start_thread (pthread_create.c:463)
by 0x9A7EE6E: clone (clone.S:95)
This is intentional, the reinit is in a child process. Cf the comment in
tfork.c.
---
lib/util/tests/tfork-drd.supp | 14 ++++++++++++++
1 file changed, 14 insertions(+)
create mode 100644 lib/util/tests/tfork-drd.supp
diff --git a/lib/util/tests/tfork-drd.supp b/lib/util/tests/tfork-drd.supp
new file mode 100644
index 00000000000..7d0544b6b43
--- /dev/null
+++ b/lib/util/tests/tfork-drd.supp
@@ -0,0 +1,14 @@
+{
+ tfork_pthread_cond_init
+ drd:CondErr
+ fun:pthread_cond_init_intercept
+ fun:pthread_cond_init@*
+ fun:tfork_atfork_child
+ fun:fork
+ fun:tfork_start_waiter_and_worker
+ fun:tfork_create
+ fun:tfork_thread
+ fun:vgDrd_thread_wrapper
+ fun:start_thread
+ fun:clone
+}
--
2.17.2
From a3c2eac7637a5275069952e2750a058df045e93e Mon Sep 17 00:00:00 2001
From: Ralph Boehme <slow at samba.org>
Date: Tue, 20 Nov 2018 15:50:52 +0100
Subject: [PATCH 5/5] tfork: add a README how to run test torture test under
valgrind
---
lib/util/tests/README | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
create mode 100644 lib/util/tests/README
diff --git a/lib/util/tests/README b/lib/util/tests/README
new file mode 100644
index 00000000000..c1337d5cf58
--- /dev/null
+++ b/lib/util/tests/README
@@ -0,0 +1,22 @@
+tfork tests
+===========
+
+To run the tfork torture testsuite under valgrind with the helgrind or drd
+thread checkers, run valgrind with the --suppress option passing a suppressions
+file.
+
+For helgrind:
+
+$ valgrind \
+ --trace-children=yes \
+ --tool=helgrind \
+ --suppressions=lib/util/tests/tfork-helgrind.supp \
+ ./bin/smbtorture ncalrpc:localhost local.tfork.tfork_threads
+
+For drd:
+
+$ valgrind \
+ --trace-children=yes \
+ --tool=drd \
+ --suppressions=lib/util/tests/tfork-drd.supp \
+ ./bin/smbtorture ncalrpc:localhost local.tfork.tfork_threads
--
2.17.2
More information about the samba-technical
mailing list