abort() calls from messaging code

Jeremy Allison jra at samba.org
Thu Jun 15 18:31:32 UTC 2017


On Thu, Jun 15, 2017 at 11:53:45AM +0200, Volker Lendecke via samba-technical wrote:
> On Tue, Jun 13, 2017 at 05:03:36PM +1200, Andrew Bartlett via samba-technical wrote:
> > (resend from samba.org address subscribed to the list)
> > 
> > I'm seeing odd failures in the messaging code while attempting builds
> > on my new locking code for ldb.  It seems as I reduce the other
> > flapping tests, we perturb the state and start to see different parts
> > of samba misbehave. 
> 
> Fix one race, open another.
> 
> The attached patch should fix it. Review appreciated!

OK, I just want to write this down (and maybe append it to the commit
message) to prove I understand the fix (please confirm :-).

Inside _tevent_threaded_schedule_immediate(), with tctx->event_ctx_mutex
already held from earlier in the function, we have:

476         ret = pthread_mutex_unlock(&ev->scheduled_mutex);
477         if (ret != 0) {
478                 abort();
479         }

HERE!!!!

481         ret = pthread_mutex_unlock(&tctx->event_ctx_mutex);
482         if (ret != 0) {
483                 abort();
484         }

At the HERE!!!! point, tevent_common_threaded_activate_immediate(),
which is blocked acquiring ev->scheduled_mutex, gets released and does:

514         while (ev->scheduled_immediates != NULL) {
515                 struct tevent_immediate *im = ev->scheduled_immediates;
516                 DLIST_REMOVE(ev->scheduled_immediates, im);
517                 DLIST_ADD_END(ev->immediate_events, im);
518         }

- making an immediate event ready to be scheduled.
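
For context, that loop sits between a lock/unlock of ev->scheduled_mutex,
which is why the main thread is parked there until the unlock at line 476
lets it through. Roughly (my paraphrase of the surrounding function, not a
verbatim quote):

void tevent_common_threaded_activate_immediate(struct tevent_context *ev)
{
	int ret;

	ret = pthread_mutex_lock(&ev->scheduled_mutex);  /* blocked here until
							    the unlock at 476..479 */
	if (ret != 0) {
		abort();
	}

	/* ... the while loop quoted above ... */

	ret = pthread_mutex_unlock(&ev->scheduled_mutex);
	if (ret != 0) {
		abort();
	}
}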

This then returns into epoll_event_loop_once(), which then calls:

910         if (ev->immediate_events &&
911             tevent_common_loop_immediate(ev)) {
912                 return 0;
913         }

which causes the immediate event to fire. This immediate event is the
pthreadpool job completion event, which was previously set up in
pthreadpool_tevent_job_signal() by:

198         if (state->tctx != NULL) {
199                 /* with HAVE_PTHREAD */
200                 tevent_threaded_schedule_immediate(state->tctx, state->im,
201                                                    pthreadpool_tevent_job_done,
202                                                    state);

So we now call pthreadpool_tevent_job_done() - which does:

225         TALLOC_FREE(state->tctx);

calling tevent_threaded_context_destructor():

384         ret = pthread_mutex_destroy(&tctx->event_ctx_mutex); <---------------- BOOOM returns an error !
385         if (ret != 0) {
386                 abort();
387         }

because the thread inside _tevent_threaded_schedule_immediate() hasn't
yet reached line 481 above (the unlock just after HERE!!!!), so
tctx->event_ctx_mutex is still locked when the main thread tries to
destroy it, and pthread_mutex_destroy() fails with EBUSY (the ret = 16
in frame #9 of the backtrace below).
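
Just to convince myself this is exactly the mutex misuse involved, here
is a tiny standalone model of the interleaving (all the names below are
mine, none of this is tevent code):

/*
 * Toy model: the "worker" mirrors _tevent_threaded_schedule_immediate(),
 * the main thread mirrors the event loop running the immediate handler
 * that frees the threaded context.
 *
 * Build: cc -pthread mutex_race.c -o mutex_race
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static pthread_mutex_t ctx_mutex = PTHREAD_MUTEX_INITIALIZER;   /* ~ tctx->event_ctx_mutex */
static pthread_mutex_t sched_mutex = PTHREAD_MUTEX_INITIALIZER; /* ~ ev->scheduled_mutex   */

static void *worker(void *arg)
{
	(void)arg;

	pthread_mutex_lock(&ctx_mutex);     /* taken earlier in schedule_immediate */
	pthread_mutex_lock(&sched_mutex);   /* queue the immediate */
	pthread_mutex_unlock(&sched_mutex); /* lines 476..479: main thread wakes up */

	sleep(1);                           /* the HERE!!!! window */

	pthread_mutex_unlock(&ctx_mutex);   /* lines 481..484: too late */
	return NULL;
}

int main(void)
{
	pthread_t t;
	int ret;

	pthread_create(&t, NULL, worker, NULL);

	usleep(100 * 1000);                 /* let the worker get into the HERE window */

	pthread_mutex_lock(&sched_mutex);   /* ~ activate_immediate moving the list */
	pthread_mutex_unlock(&sched_mutex);

	/*
	 * ~ pthreadpool_tevent_job_done() -> TALLOC_FREE(state->tctx)
	 *   -> tevent_threaded_context_destructor()
	 */
	ret = pthread_mutex_destroy(&ctx_mutex);
	printf("pthread_mutex_destroy: %s\n", ret ? strerror(ret) : "OK");
	/*
	 * On glibc this prints "Device or resource busy" (EBUSY, 16);
	 * POSIX says destroying a locked mutex is undefined, which is
	 * why tevent abort()s on any nonzero return.
	 */

	pthread_join(t, NULL);
	return 0;
}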

So adding an additional:

	ret = pthread_mutex_lock(&tctx->event_ctx_mutex);
	ret = pthread_mutex_unlock(&tctx->event_ctx_mutex);

(error checking elided) in tevent_threaded_context_destructor(), before
the pthread_mutex_destroy(), forces the destructor to wait until the
scheduling thread has released tctx->event_ctx_mutex; once that
lock/unlock pair completes, the mutex is guaranteed to be free and is
safe to destroy.
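
I.e. with the error checking put back, I read the destructor as ending
up roughly like this (my sketch of the idea, not necessarily the exact
patch):

static int tevent_threaded_context_destructor(
	struct tevent_threaded_context *tctx)
{
	int ret;

	/* ... existing teardown ... */

	/*
	 * Wait for any _tevent_threaded_schedule_immediate() that is
	 * still between HERE!!!! and line 481 to drop the mutex.
	 */
	ret = pthread_mutex_lock(&tctx->event_ctx_mutex);
	if (ret != 0) {
		abort();
	}
	ret = pthread_mutex_unlock(&tctx->event_ctx_mutex);
	if (ret != 0) {
		abort();
	}

	ret = pthread_mutex_destroy(&tctx->event_ctx_mutex);
	if (ret != 0) {
		abort();
	}

	return 0;
}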

Phew :-). RB+.

I'm guessing that looking at:

https://git.samba.org/jra/samba-autobuild/samba.stdout

specifically:

#6  <signal handler called>
No locals.
#7  0x00002b0618812c37 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 587036
        selftid = 587036
#8  0x00002b0618816028 in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7ffc46137510, sa_sigaction = 0x7ffc46137510}, sa_mask = {__val = {47305181671241, 0, 47304978220113, 72, 8, 140721484166816, 249694476928, 0, 47305356153232, 140721484166720, 47304978247953, 140721484166464, 8, 140721484166816, 250283783536, 47305029654232}}, sa_flags = 586373760, sa_restorer = 0x0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#9  0x00002b060f7cba97 in tevent_threaded_context_destructor (tctx=0x2b0622f35ae0) at ../lib/tevent/tevent_threads.c:386
        ret = 16
#10 0x00002b061174daea in _tc_free_internal (tc=0x2b0622f35a80, location=0x2b0619813288 "../lib/pthreadpool/pthreadpool_tevent.c:225") at ../lib/talloc/talloc.c:1078
        d = 0x2b060f7cb958 <tevent_threaded_context_destructor>
        ptr_to_free = 0x0
        ptr = 0x2b0622f35ae0
#11 0x00002b061174df95 in _talloc_free_internal (ptr=0x2b0622f35ae0, location=0x2b0619813288 "../lib/pthreadpool/pthreadpool_tevent.c:225") at ../lib/talloc/talloc.c:1174
        tc = 0x2b0622f35a80
#12 0x00002b061174f250 in _talloc_free (ptr=0x2b0622f35ae0, location=0x2b0619813288 "../lib/pthreadpool/pthreadpool_tevent.c:225") at ../lib/talloc/talloc.c:1716
        tc = 0x2b0622f35a80
#13 0x00002b061980eb9d in pthreadpool_tevent_job_done (ctx=0x2b060c174f60, im=0x2b0622f312a0, private_data=0x2b0622f311f0) at ../lib/pthreadpool/pthreadpool_tevent.c:225
        state = 0x2b0622f311f0
#14 0x00002b060f7c76c2 in tevent_common_loop_immediate (ev=0x2b060c174f60) at ../lib/tevent/tevent_immediate.c:135
        im = 0x2b0622f312a0
        handler = 0x2b061980ea29 <pthreadpool_tevent_job_done>
        private_data = 0x2b0622f311f0
#15 0x00002b060f7d096d in epoll_event_loop_once (ev=0x2b060c174f60, location=0x2b060bb9b330 "../source4/lib/messaging/tests/messaging.c:116") at ../lib/tevent/tevent_epoll.c:911
        epoll_ev = 0x2b060c1751f0
        tval = {tv_sec = 59, tv_usec = 47305029655147}
        panic_triggered = false
#16 0x00002b060f7cd70e in std_event_loop_once (ev=0x2b060c174f60, location=0x2b060bb9b330 "../source4/lib/messaging/tests/messaging.c:116") at ../lib/tevent/tevent_standard.c:114
        glue_ptr = 0x2b060c1750a0
        glue = 0x2b060c1750a0
        ret = 0
#17 0x00002b060f7c6560 in _tevent_loop_once (ev=0x2b060c174f60, location=0x2b060bb9b330 "../source4/lib/messaging/tests/messaging.c:116") at ../lib/tevent/tevent.c:735
        ret = 0
        nesting_stack_ptr = 0x0
#18 0x00002b060b6e4578 in test_ping_speed (tctx=0x2b060c175290) at ../source4/lib/messaging/tests/messaging.c:116

gave you the clue you needed to fix this?

You know, I really don't feel bad about missing that in
my initial review now :-) :-).

Cheers,

Jeremy.


