Segfault in smbd: messaging_dgm_fde_active
Jeremy Allison
jra at samba.org
Tue Jul 10 18:26:59 UTC 2018
On Tue, Jul 10, 2018 at 05:01:36PM +0200, Stefan Metzmacher wrote:
> Hi,
>
> here's an updated patchset including a test that also demonstrates the
> problem.
>
> I've also included two fixes to avoid two other flaky tests.
>
> See also https://gitlab.com/samba-team/devel/samba/pipelines/25487638
>
> Please review and push :-)
LGTM. RB+ and pushed!
Jeremy.
> Am 10.07.2018 um 14:39 schrieb Stefan Metzmacher via samba-technical:
> > Hi Jeremy,
> >
> >>> OK, I'm reviewing this now, and I think I understand
> >>> it, but once I'm done I might ask you to add some
> >>> clarification to the commit message (or not, depending
> >>> on how easy this is to understand :-).
> >>
> >> OK, I do think I understand it, and I think the patch hunk
> >> in the destructor isn't quite right.
> >>
> >> Shoudn't it look like:
> >>
> >> @@ -150,6 +150,11 @@ static int msg_dgm_ref_destructor(struct msg_dgm_ref *r)
> >> if (refs == NULL) {
> >> abort();
> >> }
> >> +
> >> + if (r == next_ref) {
> >> + next_ref = r->next;
> >> + }
> >> +
> >> DLIST_REMOVE(refs, r);
> >>
> >> TALLOC_FREE(r->fde);
> >>
> >> instead of:
> >>
> >> @ -152,6 +153,10 @@ static int msg_dgm_ref_destructor(struct msg_dgm_ref *r)
> >> }
> >> DLIST_REMOVE(refs, r);
> >>
> >> + if (r == next_ref) {
> >> + next_ref = NULL;
> >> + }
> >> +
> >> TALLOC_FREE(r->fde);
> >>
> >> DBG_DEBUG("refs=%p\n", refs);
> >>
> >> i.e. rather than arbitrarily setting next_ref to NULL, make
> >> next_ref point to the 'r->next' of the element currently being
> >> removed *before* we remove it. That way we don't prematurely
> >> terminate the list walk in msg_dgm_ref_recv().
> >>
> >> If I'm wrong, can you explain why (because then I
> >> didn't understand it :-) ?
> >
> > You are right! Thanks for finding that!
> >
> > metze
> >
> >
>
> From 45db46607b63248310eb2ad3dd4a0ea783042202 Mon Sep 17 00:00:00 2001
> From: Ralph Boehme <slow at samba.org>
> Date: Tue, 19 Jun 2018 10:35:04 +0200
> Subject: [PATCH 1/4] ctdb: close the correct pipe fd in a test
>
> This was discovered in an autobuild with a patched tevent that used the
> "poll" backend by default. Test failure:
>
> $ bin/sock_daemon_test /dev/shm/sock_daemon_test.pid /dev/shm/sock_daemon_test.sock 5
> test5[28011]: daemon started, pid=28011
> test5[28011]: listening on /dev/shm/sock_daemon_test.sock
> sock_daemon_test: ../ctdb/tests/src/sock_daemon_test.c:980: test5: Assertion `ret == i+1' failed.
> Aborted (core dumped)
> metze at SERNOX14:~/devel/samba/4.0/master4-test$ test5[28011]: PID 28010 gone away, exiting
> test5[28011]: Shutting down
> sock_daemon_test: ../ctdb/tests/src/sock_daemon_test.c:964: test5:
> Assertion `ret == EINTR' failed.
>
> After an epic debugging session we spotted the problem.
>
> Signed-off-by: Ralph Boehme <slow at samba.org>
> Reviewed-by: Stefan Metzmacher <metze at samba.org>
> ---
> ctdb/tests/src/sock_daemon_test.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ctdb/tests/src/sock_daemon_test.c b/ctdb/tests/src/sock_daemon_test.c
> index 916ba2236f03..80004b41c917 100644
> --- a/ctdb/tests/src/sock_daemon_test.c
> +++ b/ctdb/tests/src/sock_daemon_test.c
> @@ -759,7 +759,7 @@ static int test5_client(const char *sockpath, int id, pid_t pid_server,
> tevent_loop_once(ev);
> }
>
> - close(fd[0]);
> + close(fd[1]);
> state.fd = -1;
>
> while (kill(pid_server, 0) == 0 || errno != ESRCH) {
> --
> 2.17.1
>
>
> From c3da01eb6b212e5c891c2d11a41333df5ab0b997 Mon Sep 17 00:00:00 2001
> From: Stefan Metzmacher <metze at samba.org>
> Date: Thu, 21 Jun 2018 06:31:03 +0200
> Subject: [PATCH 2/4] python/tests: make the test_assoc_group_fail2() test more
> resilient against timing
>
> On a busy system [e]poll() on the server will mark both the
> old connection fd and also the listening fd as readable.
>
> epoll() returns the events in order, so the server processes the
> disconnect first.
>
> With poll() the events carry no ordering, so the
> server is likely to process the connect before the disconnect.
>
> Signed-off-by: Stefan Metzmacher <metze at samba.org>
> Reviewed-by: Ralph Boehme <slow at samba.org>
> ---
> python/samba/tests/dcerpc/raw_protocol.py | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/python/samba/tests/dcerpc/raw_protocol.py b/python/samba/tests/dcerpc/raw_protocol.py
> index ff815e97ca63..7cc3a4a29b52 100755
> --- a/python/samba/tests/dcerpc/raw_protocol.py
> +++ b/python/samba/tests/dcerpc/raw_protocol.py
> @@ -18,6 +18,7 @@
>
> import sys
> import os
> +import time
>
> sys.path.insert(0, "bin/python")
> os.environ["PYTHONUNBUFFERED"] = "1"
> @@ -4930,6 +4931,8 @@ class TestDCERPC_BIND(RawDCERPCTest):
> ack = self.do_generic_bind(ctx=ctx)
>
> self._disconnect("test_assoc_group_fail2")
> + self.assertNotConnected()
> + time.sleep(0.5)
> self.connect()
>
> ack2 = self.do_generic_bind(ctx=ctx,assoc_group_id=ack.u.assoc_group_id,
> --
> 2.17.1
>
>
> From 6310e77342d7bab6856aef1c343a060973a73c7b Mon Sep 17 00:00:00 2001
> From: Stefan Metzmacher <metze at samba.org>
> Date: Tue, 10 Jul 2018 16:21:55 +0200
> Subject: [PATCH 3/4] s4:messaging: add local.messaging.multi_ctx.multi_ctx
> test
>
> This tests the usage of multiple imessaging_contexts in one process
> and also freeing two of them during a message handler.
>
> Bug: https://bugzilla.samba.org/show_bug.cgi?id=13514
>
> Signed-off-by: Stefan Metzmacher <metze at samba.org>
> ---
> selftest/knownfail.d/imessaging | 1 +
> source4/lib/messaging/tests/messaging.c | 221 ++++++++++++++++++++++++
> 2 files changed, 222 insertions(+)
> create mode 100644 selftest/knownfail.d/imessaging
>
> diff --git a/selftest/knownfail.d/imessaging b/selftest/knownfail.d/imessaging
> new file mode 100644
> index 000000000000..af8fa8d615a6
> --- /dev/null
> +++ b/selftest/knownfail.d/imessaging
> @@ -0,0 +1 @@
> +^samba4.local.messaging.multi_ctx
> diff --git a/source4/lib/messaging/tests/messaging.c b/source4/lib/messaging/tests/messaging.c
> index ba5897882880..80c8583c21eb 100644
> --- a/source4/lib/messaging/tests/messaging.c
> +++ b/source4/lib/messaging/tests/messaging.c
> @@ -393,6 +393,226 @@ static bool test_messaging_overflow_check(struct torture_context *tctx)
> return true;
> }
>
> +struct test_multi_ctx {
> + struct torture_context *tctx;
> + struct imessaging_context *server_ctx;
> + struct imessaging_context *client_ctx[4];
> + size_t num_missing;
> + bool got_server;
> + bool got_client_0_1;
> + bool got_client_2_3;
> + bool ok;
> +};
> +
> +static void multi_ctx_server_handler(struct imessaging_context *msg,
> + void *private_data,
> + uint32_t msg_type,
> + struct server_id server_id,
> + DATA_BLOB *data)
> +{
> + struct test_multi_ctx *state = private_data;
> + char *str = NULL;
> +
> + torture_assert_goto(state->tctx, state->num_missing >= 1,
> + state->ok, fail,
> + "num_missing should be at least 1.");
> + state->num_missing -= 1;
> +
> + torture_assert_goto(state->tctx, !state->got_server,
> + state->ok, fail,
> + "already got server.");
> + state->got_server = true;
> +
> + /*
> + * We free the context itself and most likely reuse
> + * the memory immediately.
> + */
> + TALLOC_FREE(state->server_ctx);
> + str = generate_random_str(state->tctx, 128);
> + torture_assert_goto(state->tctx, str != NULL,
> + state->ok, fail,
> + "generate_random_str()");
> +
> +fail:
> + return;
> +}
> +
> +static void multi_ctx_client_0_1_handler(struct imessaging_context *msg,
> + void *private_data,
> + uint32_t msg_type,
> + struct server_id server_id,
> + DATA_BLOB *data)
> +{
> + struct test_multi_ctx *state = private_data;
> + char *str = NULL;
> +
> + torture_assert_goto(state->tctx, state->num_missing >= 2,
> + state->ok, fail,
> + "num_missing should be at least 2.");
> + state->num_missing -= 2;
> +
> + torture_assert_goto(state->tctx, !state->got_client_0_1,
> + state->ok, fail,
> + "already got client_0_1.");
> + state->got_client_0_1 = true;
> +
> + /*
> + * We free two contexts and most likely reuse
> + * the memory immediately.
> + */
> + TALLOC_FREE(state->client_ctx[0]);
> + str = generate_random_str(state->tctx, 128);
> + torture_assert_goto(state->tctx, str != NULL,
> + state->ok, fail,
> + "generate_random_str()");
> + TALLOC_FREE(state->client_ctx[1]);
> + str = generate_random_str(state->tctx, 128);
> + torture_assert_goto(state->tctx, str != NULL,
> + state->ok, fail,
> + "generate_random_str()");
> +
> +fail:
> + return;
> +}
> +
> +static void multi_ctx_client_2_3_handler(struct imessaging_context *msg,
> + void *private_data,
> + uint32_t msg_type,
> + struct server_id server_id,
> + DATA_BLOB *data)
> +{
> + struct test_multi_ctx *state = private_data;
> + char *str = NULL;
> +
> + torture_assert_goto(state->tctx, state->num_missing >= 2,
> + state->ok, fail,
> + "num_missing should be at least 2.");
> + state->num_missing -= 2;
> +
> + torture_assert_goto(state->tctx, !state->got_client_2_3,
> + state->ok, fail,
> + "already got client_2_3.");
> + state->got_client_2_3 = true;
> +
> + /*
> + * We free two contexts and most likely reuse
> + * the memory immediately.
> + */
> + TALLOC_FREE(state->client_ctx[2]);
> + str = generate_random_str(state->tctx, 128);
> + torture_assert_goto(state->tctx, str != NULL,
> + state->ok, fail,
> + "generate_random_str()");
> + TALLOC_FREE(state->client_ctx[3]);
> + str = generate_random_str(state->tctx, 128);
> + torture_assert_goto(state->tctx, str != NULL,
> + state->ok, fail,
> + "generate_random_str()");
> +
> +fail:
> + return;
> +}
> +
> +static bool test_multi_ctx(struct torture_context *tctx)
> +{
> + struct test_multi_ctx state = {
> + .tctx = tctx,
> + .ok = true,
> + };
> + struct timeval tv;
> + NTSTATUS status;
> +
> + lpcfg_set_cmdline(tctx->lp_ctx, "pid directory", "piddir.tmp");
> +
> + /*
> > + * We use cluster_id(0, 0) as that receives messages for
> > + * all task ids.
> + */
> + state.server_ctx = imessaging_init(tctx,
> + tctx->lp_ctx,
> + cluster_id(0, 0),
> + tctx->ev);
> + torture_assert(tctx, state.server_ctx != NULL,
> + "Failed to init messaging context");
> +
> + status = imessaging_register(state.server_ctx, &state,
> + MSG_TMP_BASE-1,
> + multi_ctx_server_handler);
> + torture_assert(tctx, NT_STATUS_IS_OK(status), "imessaging_register failed");
> +
> + state.client_ctx[0] = imessaging_init(tctx,
> + tctx->lp_ctx,
> + cluster_id(0, 0),
> + tctx->ev);
> + torture_assert(tctx, state.client_ctx[0] != NULL,
> + "msg_client_ctx imessaging_init() failed");
> + status = imessaging_register(state.client_ctx[0], &state,
> + MSG_TMP_BASE-1,
> + multi_ctx_client_0_1_handler);
> + torture_assert(tctx, NT_STATUS_IS_OK(status), "imessaging_register failed");
> + state.client_ctx[1] = imessaging_init(tctx,
> + tctx->lp_ctx,
> + cluster_id(0, 0),
> + tctx->ev);
> + torture_assert(tctx, state.client_ctx[1] != NULL,
> + "msg_client_ctx imessaging_init() failed");
> + status = imessaging_register(state.client_ctx[1], &state,
> + MSG_TMP_BASE-1,
> + multi_ctx_client_0_1_handler);
> + torture_assert(tctx, NT_STATUS_IS_OK(status), "imessaging_register failed");
> + state.client_ctx[2] = imessaging_init(tctx,
> + tctx->lp_ctx,
> + cluster_id(0, 0),
> + tctx->ev);
> + torture_assert(tctx, state.client_ctx[2] != NULL,
> + "msg_client_ctx imessaging_init() failed");
> + status = imessaging_register(state.client_ctx[2], &state,
> + MSG_TMP_BASE-1,
> + multi_ctx_client_2_3_handler);
> + torture_assert(tctx, NT_STATUS_IS_OK(status), "imessaging_register failed");
> + state.client_ctx[3] = imessaging_init(tctx,
> + tctx->lp_ctx,
> + cluster_id(0, 0),
> + tctx->ev);
> + torture_assert(tctx, state.client_ctx[3] != NULL,
> + "msg_client_ctx imessaging_init() failed");
> + status = imessaging_register(state.client_ctx[3], &state,
> + MSG_TMP_BASE-1,
> + multi_ctx_client_2_3_handler);
> + torture_assert(tctx, NT_STATUS_IS_OK(status), "imessaging_register failed");
> +
> + /*
> > + * Send one message that needs to arrive at 3 handlers, decrementing num_missing by 5 (1 + 2 + 2).
> + */
> + state.num_missing = 5;
> +
> + status = imessaging_send(state.server_ctx,
> + cluster_id(0, 0),
> + MSG_TMP_BASE-1, NULL);
> + torture_assert_ntstatus_ok(tctx, status, "msg failed");
> +
> + tv = timeval_current();
> + while (timeval_elapsed(&tv) < 30 && state.num_missing > 0 && state.ok) {
> + int ret;
> +
> + ret = tevent_loop_once(tctx->ev);
> + torture_assert_int_equal(tctx, ret, 0, "tevent_loop_once()");
> + }
> +
> + if (!state.ok) {
> + return false;
> + }
> +
> + torture_assert_int_equal(tctx, state.num_missing, 0,
> + "wrong message count");
> +
> + torture_assert(tctx, state.got_client_0_1, "got_client_0_1");
> + torture_assert(tctx, state.got_client_2_3, "got_client_2_3");
> + torture_assert(tctx, state.got_server, "got_server");
> +
> + return true;
> +}
> +
> struct torture_suite *torture_local_messaging(TALLOC_CTX *mem_ctx)
> {
> struct torture_suite *s = torture_suite_create(mem_ctx, "messaging");
> @@ -400,5 +620,6 @@ struct torture_suite *torture_local_messaging(TALLOC_CTX *mem_ctx)
> torture_suite_add_simple_test(s, "overflow_check",
> test_messaging_overflow_check);
> torture_suite_add_simple_test(s, "ping_speed", test_ping_speed);
> + torture_suite_add_simple_test(s, "multi_ctx", test_multi_ctx);
> return s;
> }
> --
> 2.17.1
>
>
> From 50e2ae92c0b99f57a39a89a63d2dedccf8493ccb Mon Sep 17 00:00:00 2001
> From: Stefan Metzmacher <metze at samba.org>
> Date: Mon, 9 Jul 2018 12:33:34 +0200
> Subject: [PATCH 4/4] s3:messages: make the loop in msg_dgm_ref_recv() more
> robust against stale pointers
>
> The interaction between msg_dgm_ref_recv() and msg_dgm_ref_destructor()
> doesn't allow two references from messaging_dgm_ref() to be free'd
> during the loop in msg_dgm_ref_recv().
>
> In addition to the global 'refs' list, we also need to
> have a global 'next_ref' pointer, which can be adjusted in
> msg_dgm_ref_destructor().
>
> As an AD DC we hit this when using irpc in auth_winbind,
> which uses imessaging_client_init().
> In addition to the main messaging_dgm_ref() in smbd,
> source3/auth/auth_samba4.c: prepare_gensec() and
> make_auth4_context_s4() also generate a temporary
> imessaging_context for auth_context->msg_ctx from within
> auth_generic_prepare().
>
> Bug: https://bugzilla.samba.org/show_bug.cgi?id=13514
>
> Signed-off-by: Stefan Metzmacher <metze at samba.org>
> ---
> selftest/knownfail.d/imessaging | 1 -
> source3/lib/messages_dgm_ref.c | 12 +++++++++---
> 2 files changed, 9 insertions(+), 4 deletions(-)
> delete mode 100644 selftest/knownfail.d/imessaging
>
> diff --git a/selftest/knownfail.d/imessaging b/selftest/knownfail.d/imessaging
> deleted file mode 100644
> index af8fa8d615a6..000000000000
> --- a/selftest/knownfail.d/imessaging
> +++ /dev/null
> @@ -1 +0,0 @@
> -^samba4.local.messaging.multi_ctx
> diff --git a/source3/lib/messages_dgm_ref.c b/source3/lib/messages_dgm_ref.c
> index 39d22700740c..470dfbeabc74 100644
> --- a/source3/lib/messages_dgm_ref.c
> +++ b/source3/lib/messages_dgm_ref.c
> @@ -35,6 +35,7 @@ struct msg_dgm_ref {
>
> static pid_t dgm_pid = 0;
> static struct msg_dgm_ref *refs = NULL;
> +static struct msg_dgm_ref *next_ref = NULL;
>
> static int msg_dgm_ref_destructor(struct msg_dgm_ref *r);
> static void msg_dgm_ref_recv(struct tevent_context *ev,
> @@ -121,16 +122,16 @@ static void msg_dgm_ref_recv(struct tevent_context *ev,
> const uint8_t *msg, size_t msg_len,
> int *fds, size_t num_fds, void *private_data)
> {
> - struct msg_dgm_ref *r, *next;
> + struct msg_dgm_ref *r;
>
> /*
> * We have to broadcast incoming messages to all refs. The first ref
> * that grabs the fd's will get them.
> */
> - for (r = refs; r != NULL; r = next) {
> + for (r = refs; r != NULL; r = next_ref) {
> bool active;
>
> - next = r->next;
> + next_ref = r->next;
>
> active = messaging_dgm_fde_active(r->fde);
> if (!active) {
> @@ -150,6 +151,11 @@ static int msg_dgm_ref_destructor(struct msg_dgm_ref *r)
> if (refs == NULL) {
> abort();
> }
> +
> + if (r == next_ref) {
> + next_ref = r->next;
> + }
> +
> DLIST_REMOVE(refs, r);
>
> TALLOC_FREE(r->fde);
> --
> 2.17.1
>
More information about the samba-technical
mailing list