[PATCHES][BUG 13170] Fix crash in pthreadpool thread after failure from pthread_create

Christof Schmitt cs at samba.org
Thu Nov 30 19:14:35 UTC 2017


On Thu, Nov 30, 2017 at 01:40:00PM +0100, Volker Lendecke wrote:
> On Wed, Nov 29, 2017 at 12:23:34PM -0700, Christof Schmitt via samba-technical wrote:
> > As discussed in a side-thread, my initial patch lets the error surface
> > back to the client. Here is an updated patch that avoids this issue. If
> > no thread can be created, but another still exists, let that handle the
> > request. If no thread exists, then fallback to processing the job
> > synchronously, as the system might be hitting a resource limit.
> 
> From a review perspective, this looks good. For the refactoring one of
> course RB+. However, pthreadpool being such a fragile piece of code I
> would love to see unit tests for this.  This will require explicit
> error injects with some DEVELOPER only code, but is it possible to get
> this? If you don't have time, I might try to code something up.

Yes, see attached patch. I am not sure if that is the best solution, but
it is easy enough. A more complicate approach would be LD_PRELOAD to
overwrite the libc functions.

Note that i also fixed the num_threads += 1 in the first patch to only
run when pthread_create was successful.

> On top of your refactoring patch, attached find another pthreadpool
> race condition fix. For me it does fix something, but it also
> introduces another piece of technical debt. I have to put some more
> thought into handling pthread_create failure in the parent atfork
> handler. This might require interaction with the users of this code,
> don't know yet.
> 
> It's running a private autobuild now, running pthreadpooltest manually
> gives the expected result: Without the fix it fails, with the fix it
> succeeds.
> 
> Comments?


> From 163df5a06d37eaab665f94a29f22b11bbc17bab6 Mon Sep 17 00:00:00 2001
> From: Volker Lendecke <vl at samba.org>
> Date: Wed, 29 Nov 2017 16:45:40 +0100
> Subject: [PATCH 1/2] pthreadpool: Fix a race condition
> 
> The comment says it all
> 
> Signed-off-by: Volker Lendecke <vl at samba.org>
> ---
>  lib/pthreadpool/pthreadpool.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/lib/pthreadpool/pthreadpool.c b/lib/pthreadpool/pthreadpool.c
> index 874ddd23c1d..abf0e8f6f61 100644
> --- a/lib/pthreadpool/pthreadpool.c
> +++ b/lib/pthreadpool/pthreadpool.c
> @@ -103,6 +103,7 @@ static struct pthreadpool *pthreadpools = NULL;
>  static pthread_once_t pthreadpool_atfork_initialized = PTHREAD_ONCE_INIT;
>  
>  static void pthreadpool_prep_atfork(void);
> +static int pthreadpool_create_thread(struct pthreadpool *pool);
>  
>  /*
>   * Initialize a thread pool
> @@ -244,6 +245,27 @@ static void pthreadpool_parent(void)
>  	     pool = DLIST_PREV(pool)) {
>  		ret = pthread_cond_init(&pool->condvar, NULL);
>  		assert(ret == 0);
> +
> +		if ((pool->num_jobs != 0) && (pool->num_threads == 0)) {
> +			/*
> +			 * We added a job with idle threads and
> +			 * immediately did a fork before any of the
> +			 * idle threads got scheduled.
> +			 *
> +			 * pthreadpool_prepare killed all helper
> +			 * threads to get them off pool->condvar, so
> +			 * we need to create at least one to get
> +			 * things going again.
> +			 */
> +			ret = pthreadpool_create_thread(pool);
> +
> +			/*
> +			 * TODO: Handle temporary thread creation
> +			 * failure gracefully.
> +			 */
> +			assert(ret == 0);

Would it make sense to also have a fallback to calling the jobs
synchronously here? I am not sure what else could be done; the system
might be hitting a limit with the number of threads when pthread_create
fails and we should still try to finish the jobs.

Christof
-------------- next part --------------
From 3addce5db0504bc9850fc29b8fa77a327ea20a70 Mon Sep 17 00:00:00 2001
From: Christof Schmitt <cs at samba.org>
Date: Tue, 28 Nov 2017 10:49:36 -0700
Subject: [PATCH 1/3] pthreadpool: Move creating of thread to new function

No functional change, but this simplifies error handling.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13170

Signed-off-by: Christof Schmitt <cs at samba.org>
---
 lib/pthreadpool/pthreadpool.c | 79 ++++++++++++++++++++++++-------------------
 1 file changed, 45 insertions(+), 34 deletions(-)

diff --git a/lib/pthreadpool/pthreadpool.c b/lib/pthreadpool/pthreadpool.c
index 23885aa..4e1e5d4 100644
--- a/lib/pthreadpool/pthreadpool.c
+++ b/lib/pthreadpool/pthreadpool.c
@@ -521,14 +521,56 @@ static void *pthreadpool_server(void *arg)
 	}
 }
 
-int pthreadpool_add_job(struct pthreadpool *pool, int job_id,
-			void (*fn)(void *private_data), void *private_data)
+static int pthreadpool_create_thread(struct pthreadpool *pool)
 {
 	pthread_attr_t thread_attr;
 	pthread_t thread_id;
 	int res;
 	sigset_t mask, omask;
 
+	/*
+	 * Create a new worker thread. It should not receive any signals.
+	 */
+
+	sigfillset(&mask);
+
+	res = pthread_attr_init(&thread_attr);
+	if (res != 0) {
+		return res;
+	}
+
+	res = pthread_attr_setdetachstate(
+		&thread_attr, PTHREAD_CREATE_DETACHED);
+	if (res != 0) {
+		pthread_attr_destroy(&thread_attr);
+		return res;
+	}
+
+	res = pthread_sigmask(SIG_BLOCK, &mask, &omask);
+	if (res != 0) {
+		pthread_attr_destroy(&thread_attr);
+		return res;
+	}
+
+	res = pthread_create(&thread_id, &thread_attr, pthreadpool_server,
+			     (void *)pool);
+
+	assert(pthread_sigmask(SIG_SETMASK, &omask, NULL) == 0);
+
+	pthread_attr_destroy(&thread_attr);
+
+	if (res == 0) {
+		pool->num_threads += 1;
+	}
+
+	return res;
+}
+
+int pthreadpool_add_job(struct pthreadpool *pool, int job_id,
+			void (*fn)(void *private_data), void *private_data)
+{
+	int res;
+
 	res = pthread_mutex_lock(&pool->mutex);
 	if (res != 0) {
 		return res;
@@ -570,43 +612,12 @@ int pthreadpool_add_job(struct pthreadpool *pool, int job_id,
 		return 0;
 	}
 
-	/*
-	 * Create a new worker thread. It should not receive any signals.
-	 */
-
-	sigfillset(&mask);
-
-	res = pthread_attr_init(&thread_attr);
-	if (res != 0) {
-		pthread_mutex_unlock(&pool->mutex);
-		return res;
-	}
-
-	res = pthread_attr_setdetachstate(
-		&thread_attr, PTHREAD_CREATE_DETACHED);
+	res = pthreadpool_create_thread(pool);
 	if (res != 0) {
-		pthread_attr_destroy(&thread_attr);
-		pthread_mutex_unlock(&pool->mutex);
-		return res;
-	}
-
-        res = pthread_sigmask(SIG_BLOCK, &mask, &omask);
-	if (res != 0) {
-		pthread_attr_destroy(&thread_attr);
 		pthread_mutex_unlock(&pool->mutex);
 		return res;
 	}
 
-	res = pthread_create(&thread_id, &thread_attr, pthreadpool_server,
-			     (void *)pool);
-	if (res == 0) {
-		pool->num_threads += 1;
-	}
-
-        assert(pthread_sigmask(SIG_SETMASK, &omask, NULL) == 0);
-
-	pthread_attr_destroy(&thread_attr);
-
 	pthread_mutex_unlock(&pool->mutex);
 	return res;
 }
-- 
1.8.3.1


From 927d6f69a1177d8ade815b0256370a140b79d695 Mon Sep 17 00:00:00 2001
From: Christof Schmitt <cs at samba.org>
Date: Tue, 28 Nov 2017 10:59:06 -0700
Subject: [PATCH 2/3] pthreadpool: Undo put_job when returning error

When an error is returned to the caller of pthreadpool_add_job, the job
should not be kept in the internal job array. Otherwise the caller might
free the data structure and a later worker thread would still reference
it.

When it is not possible to create a single worker thread, the system
might be out of resources or hitting a configured limit. In this case
fall back to calling the job function synchronously instead of raising
the error to the caller and possibly back to the SMB client.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13170

Signed-off-by: Christof Schmitt <cs at samba.org>
---
 lib/pthreadpool/pthreadpool.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/lib/pthreadpool/pthreadpool.c b/lib/pthreadpool/pthreadpool.c
index 4e1e5d4..309aba9 100644
--- a/lib/pthreadpool/pthreadpool.c
+++ b/lib/pthreadpool/pthreadpool.c
@@ -429,6 +429,11 @@ static bool pthreadpool_put_job(struct pthreadpool *p,
 	return true;
 }
 
+static void pthreadpool_undo_put_job(struct pthreadpool *p)
+{
+	p->num_jobs -= 1;
+}
+
 static void *pthreadpool_server(void *arg)
 {
 	struct pthreadpool *pool = (struct pthreadpool *)arg;
@@ -599,6 +604,9 @@ int pthreadpool_add_job(struct pthreadpool *pool, int job_id,
 		 * We have idle threads, wake one.
 		 */
 		res = pthread_cond_signal(&pool->condvar);
+		if (res != 0) {
+			pthreadpool_undo_put_job(pool);
+		}
 		pthread_mutex_unlock(&pool->mutex);
 		return res;
 	}
@@ -614,8 +622,24 @@ int pthreadpool_add_job(struct pthreadpool *pool, int job_id,
 
 	res = pthreadpool_create_thread(pool);
 	if (res != 0) {
-		pthread_mutex_unlock(&pool->mutex);
-		return res;
+		if (pool->num_threads == 0) {
+			/*
+			 * No thread could be created to run job,
+			 * fallback to sync call.
+			 */
+			pthreadpool_undo_put_job(pool);
+			pthread_mutex_unlock(&pool->mutex);
+
+			fn(private_data);
+			return pool->signal_fn(job_id, fn, private_data,
+					       pool->signal_fn_private_data);
+		}
+
+		/*
+		 * At least one thread is still available, let
+		 * that one run the queued job.
+		 */
+		res = 0;
 	}
 
 	pthread_mutex_unlock(&pool->mutex);
-- 
1.8.3.1


From e01ece3724ed8794c742da7ee5817d3c090d6c13 Mon Sep 17 00:00:00 2001
From: Christof Schmitt <cs at samba.org>
Date: Thu, 30 Nov 2017 11:42:03 -0700
Subject: [PATCH 3/3] pthreadpool: Add test for handling pthread_create failure

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13170

Signed-off-by: Christof Schmitt <cs at samba.org>
---
 lib/pthreadpool/pthreadpool.c |  10 +++
 lib/pthreadpool/tests.c       | 157 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 167 insertions(+)

diff --git a/lib/pthreadpool/pthreadpool.c b/lib/pthreadpool/pthreadpool.c
index 309aba9..d8b20be 100644
--- a/lib/pthreadpool/pthreadpool.c
+++ b/lib/pthreadpool/pthreadpool.c
@@ -526,6 +526,10 @@ static void *pthreadpool_server(void *arg)
 	}
 }
 
+#ifdef DEVELOPER
+bool inject_pthread_create_failure;
+#endif
+
 static int pthreadpool_create_thread(struct pthreadpool *pool)
 {
 	pthread_attr_t thread_attr;
@@ -557,6 +561,12 @@ static int pthreadpool_create_thread(struct pthreadpool *pool)
 		return res;
 	}
 
+#ifdef DEVELOPER
+	if (inject_pthread_create_failure) {
+		pthread_attr_destroy(&thread_attr);
+		return EAGAIN;
+	}
+#endif
 	res = pthread_create(&thread_id, &thread_attr, pthreadpool_server,
 			     (void *)pool);
 
diff --git a/lib/pthreadpool/tests.c b/lib/pthreadpool/tests.c
index 1aab80c..28a6ff6 100644
--- a/lib/pthreadpool/tests.c
+++ b/lib/pthreadpool/tests.c
@@ -10,6 +10,7 @@
 #include <signal.h>
 #include "pthreadpool_pipe.h"
 #include "pthreadpool_tevent.h"
+#include "config.h"
 
 static int test_init(void)
 {
@@ -382,6 +383,156 @@ static int test_tevent_1(void)
 	return 0;
 }
 
+/*
+ * Test requires actual pthreadpool, not pthreadpool_sync. It also
+ * needs the error inject that is only available in developer mode.
+ */
+#if defined WITH_PTHREADPOOL && defined DEVELOPER
+
+static void test_job_threadid(void *ptr)
+{
+	pthread_t *threadid = ptr;
+
+	*threadid = pthread_self();
+}
+
+static int test_thread(struct tevent_context *ev,
+		       struct pthreadpool_tevent *pool,
+		       bool *in_main_thread)
+{
+	struct tevent_req *req;
+	pthread_t main_thread, worker_thread;
+	bool ok;
+	int ret;
+
+	main_thread = pthread_self();
+
+	req = pthreadpool_tevent_job_send(
+		ev, ev, pool, test_job_threadid, &worker_thread);
+	if (req == NULL) {
+		fprintf(stderr, "pthreadpool_tevent_job_send failed\n");
+		TALLOC_FREE(ev);
+		return ENOMEM;
+	}
+	ok = tevent_req_poll(req, ev);
+	if (!ok) {
+		ret = errno;
+		fprintf(stderr, "tevent_req_poll failed: %s\n",
+			strerror(ret));
+		TALLOC_FREE(ev);
+		return ret;
+	}
+	ret = pthreadpool_tevent_job_recv(req);
+	TALLOC_FREE(req);
+	if (ret != 0) {
+		fprintf(stderr, "tevent_req_recv failed: %s\n",
+			strerror(ret));
+		TALLOC_FREE(ev);
+		return ret;
+	}
+
+	*in_main_thread = pthread_equal(worker_thread, main_thread);
+
+	return 0;
+}
+
+extern bool inject_pthread_create_failure;
+
+static int test_create(void)
+{
+	struct tevent_context *ev;
+	struct pthreadpool_tevent *pool;
+	int ret;
+	bool in_main_thread;
+
+	ev = tevent_context_init(NULL);
+	if (ev == NULL) {
+		ret = errno;
+		fprintf(stderr, "tevent_context_init failed: %s\n",
+			strerror(ret));
+		return ret;
+	}
+	ret = pthreadpool_tevent_init(ev, 0, &pool);
+	if (ret != 0) {
+		fprintf(stderr, "pthreadpool_tevent_init failed: %s\n",
+			strerror(ret));
+		TALLOC_FREE(ev);
+		return ret;
+	}
+
+	/*
+	 * pthreadpool cannot create first worker thread, so this job
+	 * will run in the sync fallback in the main thread.
+	 */
+	inject_pthread_create_failure = true;
+	ret = test_thread(ev, pool, &in_main_thread);
+
+	if (ret != 0) {
+		fprintf(stderr, "First job failed.\n");
+		goto out;
+	}
+
+	if (!in_main_thread) {
+		fprintf(stderr, "First job did not run in main thread.\n");
+		ret = -1;
+		goto out;
+	}
+
+	/*
+	 * Now a Worker thread will be created.
+	 */
+	inject_pthread_create_failure = false;
+	ret = test_thread(ev, pool, &in_main_thread);
+
+	if (ret != 0) {
+		fprintf(stderr, "Second job failed.\n");
+		goto out;
+	}
+
+	if (in_main_thread) {
+		fprintf(stderr, "Second job did not run in worker thread.\n");
+		ret = -1;
+		goto out;
+	}
+
+	/*
+	 * Workerthread will still be active for a second, immediately
+	 * running another job will also use the worker thread.
+	 */
+	inject_pthread_create_failure = true;
+	ret = test_thread(ev, pool, &in_main_thread);
+
+	if (ret != 0) {
+		fprintf(stderr, "Third job failed.\n");
+		goto out;
+	}
+
+	if (in_main_thread) {
+		fprintf(stderr, "Third job did not run in worker thread.\n");
+		ret = -1;
+		goto out;
+	}
+
+	ret = tevent_loop_wait(ev);
+	if (ret != 0) {
+		fprintf(stderr, "tevent_loop_wait failed\n");
+		return ret;
+	}
+
+out:
+	TALLOC_FREE(pool);
+	TALLOC_FREE(ev);
+	return 0;
+}
+
+#else
+
+static int test_create(void)
+{
+	return 0;
+}
+#endif
+
 int main(void)
 {
 	int ret;
@@ -423,6 +574,12 @@ int main(void)
 		return 1;
 	}
 
+	ret = test_create();
+	if (ret != 0) {
+		fprintf(stderr, "test_create failed\n");
+		return 1;
+	}
+
 	printf("success\n");
 	return 0;
 }
-- 
1.8.3.1



More information about the samba-technical mailing list