# [PATCH][WIP] Make the Samba AD DC multi-process

Andrew Bartlett abartlet at samba.org
Wed Dec 14 04:47:50 UTC 2016

```On Wed, 2016-12-14 at 16:11 +1300, Andrew Bartlett wrote:
> On Wed, 2016-12-14 at 07:22 +1300, Andrew Bartlett wrote:
> > On Tue, 2016-12-13 at 17:37 +0100, Stefan Metzmacher wrote:
> > > Hi Andrew,
> > >
> > > >
> > > > >
> > > > > I don't see my concerns of my previous mails addressed,
> > > > > so please don't push it as is.
> > > > >
> > > > > I can try to explain them again if required later today.
> > > >
> > > > I'm really sorry to have to trouble you for them, because I've
> > > > been
> > > > working hard to address them.
> > > >
> > > >  - I have added an smb.conf option for the lsa over netlogon
> > > > behaviour
> > > >  - I have added tests for that option
> > > >  - I've explained that we can't conditionally check for
> > > > incorrect
> > > > association groups until we know which endpoint we bound to
> > >
> > > This is wrong we know the endpoint from the time we do the
> > > listen on the socket. Or am I missing something?
> > >
> > > call->conn->endpoint should be available before we even enter
> > > dcesrv_bind() everything else is a bug that needs to be fixed.
> > > So please keep the assoc_group checks in one place.
> >
> > Thanks.  I missed where we could rely on that after my previous
> > attempt
> > to find iface, which wasn't available earlier.
>
> Done.
>
> > > We also don't need any check in dcesrv_handle_fetch()
> > > as there won't be any handles if they're not created.
> >
> > The purpose of that is to assert that the pipe's flags are correct.
> >  I
> > don't want to be debugging situations that happen only in some
> > contexts, just because the pipe miss-declared if it uses handles.
> >
> > We can drop that patch if it still makes you unhappy.
> >
> > > We also need to make sure we explicitly set
> > > ep->use_single_process = true; if the lsa interface
> > > tries to use the netlogon named pipe.
> >
> > +	/*
> > +	 * By default don't force into a single process, but if
> > any
> > +	 * interface on this endpoint on this service uses handles
> > +	 * (most do), then we must force into single process mode
> > +	 */
> > +	if (use_single_process == true) {
> > +		ep->use_single_process = true;
> >  	}
> >
> >
> > I'll expand the comment to explain that this more than covers lsa
> > over
> > netlogon, it also covers the ncalrpc case (all on one socket),
> > which
> > also remains a single process.
>
> Done.
>
> > > And we may need to use DLIST_ADD_END() instead of
> > > DLIST_ADD() in some places and also check
> > > ep->use_single_process if find_endpoint() found an endpoint.
> >
> > Why do we need to change this?  What specifically do you want
> > changed.
> >  I need, sadly, more detail than 'in some places'.
> >
> > If find_endpoint() finds an endpoint, then we use that, unless we
> > are
> > over TCP/IP (previously all bound to a single port) and only if we
> > are
> > not forced to single process mode.
> >
> > > Before calling dcesrv_add_ep() we may reset e->use_single_process
> > > if this_model_ops == model_ops.
> >
> > Correct, we reset e->use_single_process to true if any process on
> > the
> > endpoint needs single process mode.  This happens most of the time.
>
>
> > > Maybe it would be good to have (temporary) DEBUG() statements
> > > which print endpoint details and all available interfaces
> > > in dcesrv_add_ep(). If we would have such code I'd like to
> > > see the output of it in various situations, with all combinations
> > > of possible process models and lsaovernetlogon=yes/no.
> > > This would it much easier to judge if I can be happy with the
> > > logic you've implemented.
> >
> > You can see that the lsa over netlogon works because we have
> > explicit
> > tests added for that.  If adding lsa to \pipe\netlogon it wasn't
> > controlling the process model for netlogon, it wouldn't
> > succeed/fail
> > when changing this option.
> >
> > We can just drop that lsa over netlogon over NP part for now if it
> > really bothers you that much.  The important client (winbindd)
> > prefers
> > ncacn_ip_tcp anyway.
> >
> > > >
> > > >  - I've changed to specifying flags, not just the use of
> > > > handles
> > > >  - I've explained why the dcerpc interface should and does
> > > > declare
> > > > about the use of handles, not the ignoring of association
> > > > groups,
> > > > because per your request, only when all interfaces on an
> > > > endpoint
> > > > don't
> > > > use handles, do we ignore association groups.  (And the DCE/RPC
> > > > server
> > > > could track this some other way if it wanted).
> > > >  - I've explained that unused records are removed in the same
> > > > way
> > > > as
> > > > the old code, on startup.
> > >
> > > That's not true. You removed memcache_delete() without a
> > > replacement
> > > and challenge_cache_fetch() is used if the client actually
> > > switches the connection between netr_ServerReqChallenge()
> > > and netr_ServerAuthenticate(), otherwise we'll use
> > > dce_call->context->private_data to get the challenge.
> >
> > Thanks.  I'll add some tests to cover that, as it has some security
> > properties in the case where we don't enforce schannel.  We have to
> > remove it even if we didn't fetch it, so removing it on the
> > challenge_cace_fetch() isn't enough.  Otherwise someone could re-
> > use
> > an
> > observed challenge (they probably couldn't do very much with the
> > result, but it wouldn't be good).
>
> I've added important tests for this, which has shown up some very
> important bugs.
>
> >
> > > I also guess some of the new includes are not required
> > > anymore, at least "replace.h" comes already via
> > > "includes.h" and #include "system/filesys.h" should
> > >
> > > And I think netlogon_cache_entry should be moved to
> > > schannel.idl next to netlogon_creds_CredentialState.
> >
> > OK.
>
> Done.
>
> > > >
> > > >  - And, as you have seen from Garming, we have shown this makes
> > > > a
> > > > significant improvement to the use case:  high-load 802.1x
> > > > (NTLM)
> > > > authentication via winbind on domain members.
> > > >
> > > > We have a customer who suffers significant load on the DCs
> > > > after
> > > > a
> > > > wifi
> > > > outage, because the DC needs to suddenly re-authenticate all
> > > > the
> > > > users.
> > > >  That is why we are working on this.
> > > >
> > > > I've re-read your last mail, and I'm honestly at my wits end as
> > > > what
> > > > more you want, so in the desperate hope that we are simply in
> > > > miscommunication, can you please make clear again what specific
> > > > changes
> > > > you want?  I would really like to move on from this, and lock
> > > > it
> > > > in
> > > > for
> > > > 4.6.
> > >
> > > I also hope we'll have it in 4.6.
> > >
> > > Thanks for your patience.
> > > metze
> >
> > I'll see what I can do today, starting with using the endpoint for
> > the
> > association group check, and the DB delete.
>
> I hope that what I attach here meets with your approval.  In
> particular, I've added a test to cover bug 11291 (netapp challenge
> reuse).

I've fixed the extra #include lines and fixed another test to run
without "allow nt4 crypto".

See attached.

Thanks,

Andrew Bartlett
-------------- next part --------------
A non-text attachment was scrubbed...