Segfault in smbd: messaging_dgm_fde_active

Jeremy Allison jra at samba.org
Tue Jul 10 18:26:59 UTC 2018


On Tue, Jul 10, 2018 at 05:01:36PM +0200, Stefan Metzmacher wrote:
> Hi,
> 
> here's an updated patchset including a test that also demonstrates the
> problem.
> 
> I've also included two fixes to avoid two other flakey tests.
> 
> See also https://gitlab.com/samba-team/devel/samba/pipelines/25487638
> 
> Please review and push:-)


LGTM. RB+ and pushed !

Jeremy.

> Am 10.07.2018 um 14:39 schrieb Stefan Metzmacher via samba-technical:
> > Hi Jeremy,
> > 
> >>> OK, I'm reviewing this now, and I think I understand
> >>> it, but once I'm done I might ask you to add some
> >>> clarification to the commit message (or not, depending
> >>> on how easy this is to understand :-).
> >>
> >> OK, I do think I understand it, and I think the patch hunk
> >> in the destructor isn't quite right.
> >>
> >> Shoudn't it look like:
> >>
> >> @@ -150,6 +150,11 @@ static int msg_dgm_ref_destructor(struct msg_dgm_ref *r)
> >>         if (refs == NULL) {
> >>                 abort();
> >>         }
> >> +
> >> +       if (r == next_ref) {
> >> +               next_ref = r->next;
> >> +       }
> >> +
> >>         DLIST_REMOVE(refs, r);
> >>  
> >>         TALLOC_FREE(r->fde);
> >>
> >> instead of:
> >>
> >> @ -152,6 +153,10 @@ static int msg_dgm_ref_destructor(struct msg_dgm_ref *r)
> >>         }
> >>         DLIST_REMOVE(refs, r);
> >>  
> >> +       if (r == next_ref) {
> >> +               next_ref = NULL;
> >> +       }
> >> +
> >>         TALLOC_FREE(r->fde);
> >>  
> >>         DBG_DEBUG("refs=%p\n", refs);
> >>
> >> i.e. Rather than setting arbitrarily to NULL, make next_ref
> >> point to the 'r->next' pointer of the element currently being
> >> removed *before* we remove it. That way we don't prematurely
> >> terminate the list walk in msg_dgm_ref_recv().
> >>
> >> If I'm wrong, can you explain why (because then I
> >> didn't understand it :-) ?
> > 
> > Your are right! Thanks for finding that!
> > 
> > metze
> > 
> > 
> 

> From 45db46607b63248310eb2ad3dd4a0ea783042202 Mon Sep 17 00:00:00 2001
> From: Ralph Boehme <slow at samba.org>
> Date: Tue, 19 Jun 2018 10:35:04 +0200
> Subject: [PATCH 1/4] ctdb: close the correct pipe fd in a test
> 
> This was discovered in an autobuild with a patched tevent that used the
> "poll" backend by default. Test failure:
> 
> $ bin/sock_daemon_test /dev/shm/sock_daemon_test.pid /dev/shm/sock_daemon_test.sock 5
> test5[28011]: daemon started, pid=28011
> test5[28011]: listening on /dev/shm/sock_daemon_test.sock
> sock_daemon_test: ../ctdb/tests/src/sock_daemon_test.c:980: test5: Assertion `ret == i+1' failed.
> Abgebrochen (Speicherabzug geschrieben)
> metze at SERNOX14:~/devel/samba/4.0/master4-test$ test5[28011]: PID 28010 gone away, exiting
> test5[28011]: Shutting down
> sock_daemon_test: ../ctdb/tests/src/sock_daemon_test.c:964: test5:
> Assertion `ret == EINTR' failed.
> 
> After an epic debugging session we spotted the problem.
> 
> Signed-off-by: Ralph Boehme <slow at samba.org>
> Reviewed-by: Stefan Metzmacher <metze at samba.org>
> ---
>  ctdb/tests/src/sock_daemon_test.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/ctdb/tests/src/sock_daemon_test.c b/ctdb/tests/src/sock_daemon_test.c
> index 916ba2236f03..80004b41c917 100644
> --- a/ctdb/tests/src/sock_daemon_test.c
> +++ b/ctdb/tests/src/sock_daemon_test.c
> @@ -759,7 +759,7 @@ static int test5_client(const char *sockpath, int id, pid_t pid_server,
>  			tevent_loop_once(ev);
>  		}
>  
> -		close(fd[0]);
> +		close(fd[1]);
>  		state.fd = -1;
>  
>  		while (kill(pid_server, 0) == 0 || errno != ESRCH) {
> -- 
> 2.17.1
> 
> 
> From c3da01eb6b212e5c891c2d11a41333df5ab0b997 Mon Sep 17 00:00:00 2001
> From: Stefan Metzmacher <metze at samba.org>
> Date: Thu, 21 Jun 2018 06:31:03 +0200
> Subject: [PATCH 2/4] python/tests: make the test_assoc_group_fail2() test more
>  resilient against timing
> 
> On a busy system [e]poll() on the server will mark both the
> old connection fd and also the listening fd as readable.
> 
> epoll() returns the events in order, so the server processes the
> disconnect first.
> 
> With poll() we don't have an order of the events and the
> server is likely to process the connect before the disconnect.
> 
> Signed-off-by: Stefan Metzmacher <metze at samba.org>
> Reviewed-by: Ralph Boehme <slow at samba.org>
> ---
>  python/samba/tests/dcerpc/raw_protocol.py | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/python/samba/tests/dcerpc/raw_protocol.py b/python/samba/tests/dcerpc/raw_protocol.py
> index ff815e97ca63..7cc3a4a29b52 100755
> --- a/python/samba/tests/dcerpc/raw_protocol.py
> +++ b/python/samba/tests/dcerpc/raw_protocol.py
> @@ -18,6 +18,7 @@
>  
>  import sys
>  import os
> +import time
>  
>  sys.path.insert(0, "bin/python")
>  os.environ["PYTHONUNBUFFERED"] = "1"
> @@ -4930,6 +4931,8 @@ class TestDCERPC_BIND(RawDCERPCTest):
>          ack = self.do_generic_bind(ctx=ctx)
>  
>          self._disconnect("test_assoc_group_fail2")
> +        self.assertNotConnected()
> +        time.sleep(0.5)
>          self.connect()
>  
>          ack2 = self.do_generic_bind(ctx=ctx,assoc_group_id=ack.u.assoc_group_id,
> -- 
> 2.17.1
> 
> 
> From 6310e77342d7bab6856aef1c343a060973a73c7b Mon Sep 17 00:00:00 2001
> From: Stefan Metzmacher <metze at samba.org>
> Date: Tue, 10 Jul 2018 16:21:55 +0200
> Subject: [PATCH 3/4] s4:messaging: add local.messaging.multi_ctx.multi_ctx
>  test
> 
> This tests the usage of multiple imessaging_contexts in one process
> and also freeing two of them during a message handler.
> 
> Bug: https://bugzilla.samba.org/show_bug.cgi?id=13514
> 
> Signed-off-by: Stefan Metzmacher <metze at samba.org>
> ---
>  selftest/knownfail.d/imessaging         |   1 +
>  source4/lib/messaging/tests/messaging.c | 221 ++++++++++++++++++++++++
>  2 files changed, 222 insertions(+)
>  create mode 100644 selftest/knownfail.d/imessaging
> 
> diff --git a/selftest/knownfail.d/imessaging b/selftest/knownfail.d/imessaging
> new file mode 100644
> index 000000000000..af8fa8d615a6
> --- /dev/null
> +++ b/selftest/knownfail.d/imessaging
> @@ -0,0 +1 @@
> +^samba4.local.messaging.multi_ctx
> diff --git a/source4/lib/messaging/tests/messaging.c b/source4/lib/messaging/tests/messaging.c
> index ba5897882880..80c8583c21eb 100644
> --- a/source4/lib/messaging/tests/messaging.c
> +++ b/source4/lib/messaging/tests/messaging.c
> @@ -393,6 +393,226 @@ static bool test_messaging_overflow_check(struct torture_context *tctx)
>  	return true;
>  }
>  
> +struct test_multi_ctx {
> +	struct torture_context *tctx;
> +	struct imessaging_context *server_ctx;
> +	struct imessaging_context *client_ctx[4];
> +	size_t num_missing;
> +	bool got_server;
> +	bool got_client_0_1;
> +	bool got_client_2_3;
> +	bool ok;
> +};
> +
> +static void multi_ctx_server_handler(struct imessaging_context *msg,
> +				     void *private_data,
> +				     uint32_t msg_type,
> +				     struct server_id server_id,
> +				     DATA_BLOB *data)
> +{
> +	struct test_multi_ctx *state = private_data;
> +	char *str = NULL;
> +
> +	torture_assert_goto(state->tctx, state->num_missing >= 1,
> +			    state->ok, fail,
> +			    "num_missing should be at least 1.");
> +	state->num_missing -= 1;
> +
> +	torture_assert_goto(state->tctx, !state->got_server,
> +			    state->ok, fail,
> +			    "already got server.");
> +	state->got_server = true;
> +
> +	/*
> +	 * We free the context itself and most likely reuse
> +	 * the memory immediately.
> +	 */
> +	TALLOC_FREE(state->server_ctx);
> +	str = generate_random_str(state->tctx, 128);
> +	torture_assert_goto(state->tctx, str != NULL,
> +			    state->ok, fail,
> +			    "generate_random_str()");
> +
> +fail:
> +	return;
> +}
> +
> +static void multi_ctx_client_0_1_handler(struct imessaging_context *msg,
> +					 void *private_data,
> +					 uint32_t msg_type,
> +					 struct server_id server_id,
> +					 DATA_BLOB *data)
> +{
> +	struct test_multi_ctx *state = private_data;
> +	char *str = NULL;
> +
> +	torture_assert_goto(state->tctx, state->num_missing >= 2,
> +			    state->ok, fail,
> +			    "num_missing should be at least 2.");
> +	state->num_missing -= 2;
> +
> +	torture_assert_goto(state->tctx, !state->got_client_0_1,
> +			    state->ok, fail,
> +			    "already got client_0_1.");
> +	state->got_client_0_1 = true;
> +
> +	/*
> +	 * We free two contexts and most likely reuse
> +	 * the memory immediately.
> +	 */
> +	TALLOC_FREE(state->client_ctx[0]);
> +	str = generate_random_str(state->tctx, 128);
> +	torture_assert_goto(state->tctx, str != NULL,
> +			    state->ok, fail,
> +			    "generate_random_str()");
> +	TALLOC_FREE(state->client_ctx[1]);
> +	str = generate_random_str(state->tctx, 128);
> +	torture_assert_goto(state->tctx, str != NULL,
> +			    state->ok, fail,
> +			    "generate_random_str()");
> +
> +fail:
> +	return;
> +}
> +
> +static void multi_ctx_client_2_3_handler(struct imessaging_context *msg,
> +					 void *private_data,
> +					 uint32_t msg_type,
> +					 struct server_id server_id,
> +					 DATA_BLOB *data)
> +{
> +	struct test_multi_ctx *state = private_data;
> +	char *str = NULL;
> +
> +	torture_assert_goto(state->tctx, state->num_missing >= 2,
> +			    state->ok, fail,
> +			    "num_missing should be at least 2.");
> +	state->num_missing -= 2;
> +
> +	torture_assert_goto(state->tctx, !state->got_client_2_3,
> +			    state->ok, fail,
> +			    "already got client_2_3.");
> +	state->got_client_2_3 = true;
> +
> +	/*
> +	 * We free two contexts and most likely reuse
> +	 * the memory immediately.
> +	 */
> +	TALLOC_FREE(state->client_ctx[2]);
> +	str = generate_random_str(state->tctx, 128);
> +	torture_assert_goto(state->tctx, str != NULL,
> +			    state->ok, fail,
> +			    "generate_random_str()");
> +	TALLOC_FREE(state->client_ctx[3]);
> +	str = generate_random_str(state->tctx, 128);
> +	torture_assert_goto(state->tctx, str != NULL,
> +			    state->ok, fail,
> +			    "generate_random_str()");
> +
> +fail:
> +	return;
> +}
> +
> +static bool test_multi_ctx(struct torture_context *tctx)
> +{
> +	struct test_multi_ctx state = {
> +		.tctx = tctx,
> +		.ok = true,
> +	};
> +	struct timeval tv;
> +	NTSTATUS status;
> +
> +	lpcfg_set_cmdline(tctx->lp_ctx, "pid directory", "piddir.tmp");
> +
> +	/*
> +	 * We use cluster_id(0, 0) as that gets for
> +	 * all task ids.
> +	 */
> +	state.server_ctx = imessaging_init(tctx,
> +					   tctx->lp_ctx,
> +					   cluster_id(0, 0),
> +					   tctx->ev);
> +	torture_assert(tctx, state.server_ctx != NULL,
> +		       "Failed to init messaging context");
> +
> +	status = imessaging_register(state.server_ctx, &state,
> +				     MSG_TMP_BASE-1,
> +				     multi_ctx_server_handler);
> +	torture_assert(tctx, NT_STATUS_IS_OK(status), "imessaging_register failed");
> +
> +	state.client_ctx[0] = imessaging_init(tctx,
> +					      tctx->lp_ctx,
> +					      cluster_id(0, 0),
> +					      tctx->ev);
> +	torture_assert(tctx, state.client_ctx[0] != NULL,
> +		       "msg_client_ctx imessaging_init() failed");
> +	status = imessaging_register(state.client_ctx[0], &state,
> +				     MSG_TMP_BASE-1,
> +				     multi_ctx_client_0_1_handler);
> +	torture_assert(tctx, NT_STATUS_IS_OK(status), "imessaging_register failed");
> +	state.client_ctx[1] = imessaging_init(tctx,
> +					      tctx->lp_ctx,
> +					      cluster_id(0, 0),
> +					      tctx->ev);
> +	torture_assert(tctx, state.client_ctx[1] != NULL,
> +		       "msg_client_ctx imessaging_init() failed");
> +	status = imessaging_register(state.client_ctx[1], &state,
> +				     MSG_TMP_BASE-1,
> +				     multi_ctx_client_0_1_handler);
> +	torture_assert(tctx, NT_STATUS_IS_OK(status), "imessaging_register failed");
> +	state.client_ctx[2] = imessaging_init(tctx,
> +					      tctx->lp_ctx,
> +					      cluster_id(0, 0),
> +					      tctx->ev);
> +	torture_assert(tctx, state.client_ctx[2] != NULL,
> +		       "msg_client_ctx imessaging_init() failed");
> +	status = imessaging_register(state.client_ctx[2], &state,
> +				     MSG_TMP_BASE-1,
> +				     multi_ctx_client_2_3_handler);
> +	torture_assert(tctx, NT_STATUS_IS_OK(status), "imessaging_register failed");
> +	state.client_ctx[3] = imessaging_init(tctx,
> +					      tctx->lp_ctx,
> +					      cluster_id(0, 0),
> +					      tctx->ev);
> +	torture_assert(tctx, state.client_ctx[3] != NULL,
> +		       "msg_client_ctx imessaging_init() failed");
> +	status = imessaging_register(state.client_ctx[3], &state,
> +				     MSG_TMP_BASE-1,
> +				     multi_ctx_client_2_3_handler);
> +	torture_assert(tctx, NT_STATUS_IS_OK(status), "imessaging_register failed");
> +
> +	/*
> +	 * Send one message that need to arrive on 3 ( 5 - 2 ) handlers.
> +	 */
> +	state.num_missing = 5;
> +
> +	status = imessaging_send(state.server_ctx,
> +				 cluster_id(0, 0),
> +				 MSG_TMP_BASE-1, NULL);
> +	torture_assert_ntstatus_ok(tctx, status, "msg failed");
> +
> +	tv = timeval_current();
> +	while (timeval_elapsed(&tv) < 30 && state.num_missing > 0 && state.ok) {
> +		int ret;
> +
> +		ret = tevent_loop_once(tctx->ev);
> +		torture_assert_int_equal(tctx, ret, 0, "tevent_loop_once()");
> +	}
> +
> +	if (!state.ok) {
> +		return false;
> +	}
> +
> +	torture_assert_int_equal(tctx, state.num_missing, 0,
> +				 "wrong message count");
> +
> +	torture_assert(tctx, state.got_client_0_1, "got_client_0_1");
> +	torture_assert(tctx, state.got_client_2_3, "got_client_2_3");
> +	torture_assert(tctx, state.got_server, "got_server");
> +
> +	return true;
> +}
> +
>  struct torture_suite *torture_local_messaging(TALLOC_CTX *mem_ctx)
>  {
>  	struct torture_suite *s = torture_suite_create(mem_ctx, "messaging");
> @@ -400,5 +620,6 @@ struct torture_suite *torture_local_messaging(TALLOC_CTX *mem_ctx)
>  	torture_suite_add_simple_test(s, "overflow_check",
>  				      test_messaging_overflow_check);
>  	torture_suite_add_simple_test(s, "ping_speed", test_ping_speed);
> +	torture_suite_add_simple_test(s, "multi_ctx", test_multi_ctx);
>  	return s;
>  }
> -- 
> 2.17.1
> 
> 
> From 50e2ae92c0b99f57a39a89a63d2dedccf8493ccb Mon Sep 17 00:00:00 2001
> From: Stefan Metzmacher <metze at samba.org>
> Date: Mon, 9 Jul 2018 12:33:34 +0200
> Subject: [PATCH 4/4] s3:messages: make the loop in msg_dgm_ref_recv() more
>  robust against stale pointers
> 
> The interaction between msg_dgm_ref_recv() and msg_dgm_ref_destructor()
> doesn't allow two references from messaging_dgm_ref() to be free'd
> during the loop in msg_dgm_ref_recv().
> 
> In addition to the global 'refs' list, we also need to
> have a global 'next_ref' pointer, which can be adjusted in
> msg_dgm_ref_destructor().
> 
> As AD DC we hit this when using irpc in auth_winbind,
> which uses imessaging_client_init().
> In addition to the main messaging_dgm_ref() in smbd,
> source3/auth/auth_samba4.c: prepare_gensec() and
> make_auth4_context_s4() also generate a temporary
> imessaging_context for auth_context->msg_ctx from within
> auth_generic_prepare().
> 
> Bug: https://bugzilla.samba.org/show_bug.cgi?id=13514
> 
> Signed-off-by: Stefan Metzmacher <metze at samba.org>
> ---
>  selftest/knownfail.d/imessaging |  1 -
>  source3/lib/messages_dgm_ref.c  | 12 +++++++++---
>  2 files changed, 9 insertions(+), 4 deletions(-)
>  delete mode 100644 selftest/knownfail.d/imessaging
> 
> diff --git a/selftest/knownfail.d/imessaging b/selftest/knownfail.d/imessaging
> deleted file mode 100644
> index af8fa8d615a6..000000000000
> --- a/selftest/knownfail.d/imessaging
> +++ /dev/null
> @@ -1 +0,0 @@
> -^samba4.local.messaging.multi_ctx
> diff --git a/source3/lib/messages_dgm_ref.c b/source3/lib/messages_dgm_ref.c
> index 39d22700740c..470dfbeabc74 100644
> --- a/source3/lib/messages_dgm_ref.c
> +++ b/source3/lib/messages_dgm_ref.c
> @@ -35,6 +35,7 @@ struct msg_dgm_ref {
>  
>  static pid_t dgm_pid = 0;
>  static struct msg_dgm_ref *refs = NULL;
> +static struct msg_dgm_ref *next_ref = NULL;
>  
>  static int msg_dgm_ref_destructor(struct msg_dgm_ref *r);
>  static void msg_dgm_ref_recv(struct tevent_context *ev,
> @@ -121,16 +122,16 @@ static void msg_dgm_ref_recv(struct tevent_context *ev,
>  			     const uint8_t *msg, size_t msg_len,
>  			     int *fds, size_t num_fds, void *private_data)
>  {
> -	struct msg_dgm_ref *r, *next;
> +	struct msg_dgm_ref *r;
>  
>  	/*
>  	 * We have to broadcast incoming messages to all refs. The first ref
>  	 * that grabs the fd's will get them.
>  	 */
> -	for (r = refs; r != NULL; r = next) {
> +	for (r = refs; r != NULL; r = next_ref) {
>  		bool active;
>  
> -		next = r->next;
> +		next_ref = r->next;
>  
>  		active = messaging_dgm_fde_active(r->fde);
>  		if (!active) {
> @@ -150,6 +151,11 @@ static int msg_dgm_ref_destructor(struct msg_dgm_ref *r)
>  	if (refs == NULL) {
>  		abort();
>  	}
> +
> +	if (r == next_ref) {
> +		next_ref = r->next;
> +	}
> +
>  	DLIST_REMOVE(refs, r);
>  
>  	TALLOC_FREE(r->fde);
> -- 
> 2.17.1
> 







More information about the samba-technical mailing list