tevent_abort_nesting crash in idmap_ad

Jeremy Allison jra at samba.org
Fri Jun 24 17:46:17 UTC 2016

On Fri, Jun 24, 2016 at 10:33:28AM -0700, Jeremy Allison wrote:
> On Fri, Jun 24, 2016 at 05:48:24PM +0200, Volker Lendecke wrote:
> > On Fri, Jun 24, 2016 at 05:41:41PM +0200, Ralph Boehme wrote:
> > > 
> > > I was able to briefly talk to metze and he suggested we wait til
> > > Monday to evaluate our options.
> > 
> > There are no options. gensec has had nested eventloops forever, and
> > it will take many months of metze's full time work to fix this. We
> > can't wait for that, this will never happen. I have tried to convince
> > the gensec masters for years that this is required, but we just have to
> > accept the fact that gensec requires nested event loops by its very core
> > design, and I am not willing to accept nested event loops in code that
> > I feel responsible for. There was one person in the world who was able
> > to debug nested event loop code, and this was Tridge. Tridge left Samba,
> > so we have nobody anymore to debug that code when bad things happen.
> Started looking into this mess....
> Looks like only the SAMBA4_USES_HEIMDAL code paths have
> this problem.
> This only happens on the tldap bind when we're trying
> to get krb5 creds from the KDC.
> The idmap_ad_get_tldap_ctx() code does a blocking
> tldap_gensec_bind() call, which creates a local event
> context, so this isn't being done on the main winbindd
> event context - it's contained inside the event context
> inside tldap_gensec_bind().
> If there were a way to force gensec to ensure the
> correct ticket had already been fetched inside
> idmap_ad_get_tldap_ctx() - *before* it called
> tldap_gensec_bind() - maybe by creating a local
> event context there and forcing the loop fetching
> the creds from the KDC and getting them stashed
> inside the krb5 ccache, then there'd be no
> possibility of getting the nesting to trigger
> inside the tevent_req_poll() inside tldap_gensec_bind().

Ah ! Talking here at the event to Alexander.

Inside tldap_gensec_bind_got_mechs() we could
create a new sub-event loop:

new_ev_ctx = samba_tevent_context_init(frame);

Then run the gensec_update_send() calls
under this new event context - rather than
running it under the state->ev context.

So we'd have 2 event contexts running - one
which we use to call gensec_update_send(),
and one where we progress through the
tldap_sasl_bind_send() -> tldap_gensec_bind_done()
-> tldap_sasl_bind_recv() calls.

We have to alternate the calls to tevent_req_poll()
to ensure we never run both event contexts at
the same time, but wait for each completion event

As this is a completely blocking operation
anyway, this isn't going to make the code
performace worse, it's going to make the
internals more complex, but even with this
added complexity it's simpler than having
a nested event loop :-) :-).

More information about the samba-technical mailing list