A Modest Proposal for Preventing the Event loops of Samba DCs From Being a Burthen to Their Implementers or Users, and for Making Them Beneficial to the Publick

Thu Apr 17 08:06:24 MDT 2014

On Thu, 2014-04-17 at 15:53 +0200, Stefan (metze) Metzmacher wrote:
> Am 17.04.2014 15:25, schrieb Simo:
> > On Thu, 2014-04-17 at 14:49 +0200, Stefan (metze) Metzmacher wrote:
> >> Hi Simo,
> >>
> >>>> It does make ldb less 'async' as far as the caller is concerned, but we
> >>>> simply don't use ldb in an async way in Samba.  (It is very unfortunate
> >>>> we carry the great complexity and risk of an async ldb without any
> >>>> significant use). 
> >>>>
> >>>> I have this under a private autobuild, and I would appreciate your
> >>>> thoughts. 
> >>>
> >>> I do not really see the point of a separate event context honestly.
> >>> All you need is clearly some locking, so that a new toplevel ldb
> >>> operation can only be started from within the transaction, while any
> >>> other is received but not scheduled until a previous transaction is
> >>> finished.
> >>> Blocking for long periods on heavy I/O in an async server will provide
> >>> terrible outcomes.
> >>
> >> We're only doing local tdb operations during a transaction, so I think
> >> it's really good to use a separate event context and do everything isolated
> >> without any side effects.
> >>
> >> All locking hacks will result in deadlocks.
> > 
> > Note that a separate context is the same thing, it is just locking with
> > a bigger hammer, and called differently. You can still have deadlocks if
> > in the transaction you create an operation that decides to wait on the
> > global event loop (which is now stopped), or creates a new event loop and
> > blocks there.
> 
> The difference is that we know that we don't use the global event loop.
> 
> > I don't see much difference from the point of view of possible
> > deadlocks, but I see issues with the main event loop being blocked for
> > long periods of time, making the whole server completely non-responsive.
> 
> It isn't blocked any longer than needed, the single local transaction
> should be very fast, otherwise we have other problems.
> 
> There's a big difference.
> 
> This is a nested event context (with its isolated loop), we won't even start
> unrelated operations, but finish the transaction as fast as possible.
> Any theoretical deadlock in this situation is based on a bug in the code,
> where we somehow use the wrong event context.
> 
> With a nested event loop (on the global context), we might start
> processing an unrelated rpc request, which calls a sync ldb function,
> how do you want to avoid a deadlock in that situation?
> This can be triggered by special request order from a client.

The idea I had was that if you see there is already a transaction going
but this operation is not a child of that transaction, you simply defer
starting it. Ideally this would be done in LDB by adding a transaction
handle that must be passed down from your parent; otherwise you don't have one.

If you do not have one you need to create one, and while a transaction
handle is active all operations bearing another one are skipped in the
event loop and go back to sleep.
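
To make the deferral idea concrete, here is a minimal sketch in plain C.
Everything in it is hypothetical: txn_handle_t, struct op, struct scheduler
and the functions below do not exist in ldb or tevent, they only illustrate
the rule "while a handle is active, operations bearing another one are
skipped". For simplicity, operations without a handle just run when no
transaction is active, instead of creating a handle of their own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not part of ldb or tevent. */
typedef unsigned int txn_handle_t;      /* 0 means "no handle" */

struct op {
	txn_handle_t txn;   /* handle inherited from the parent, or 0 */
	int done;           /* set to 1 once the operation has run */
};

struct scheduler {
	txn_handle_t active;  /* handle of the running transaction, 0 if none */
	txn_handle_t next;    /* next handle value to hand out */
};

static void scheduler_init(struct scheduler *s)
{
	s->active = 0;
	s->next = 1;
}

/* Start a transaction. Returns a new handle, or 0 if one is already active. */
static txn_handle_t txn_begin(struct scheduler *s)
{
	if (s->active != 0) {
		return 0;
	}
	s->active = s->next++;
	return s->active;
}

/* End the transaction, but only if the caller holds the active handle. */
static void txn_end(struct scheduler *s, txn_handle_t h)
{
	if (h != 0 && s->active == h) {
		s->active = 0;
	}
}

/*
 * One pass of the event loop over the pending operations.
 * While a transaction is active, only operations bearing its handle run;
 * all others are skipped and stay pending ("go back to sleep").
 * Returns the number of operations that ran in this pass.
 */
static size_t run_pending(struct scheduler *s, struct op *ops, size_t n)
{
	size_t ran = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ops[i].done) {
			continue;
		}
		if (s->active != 0 && ops[i].txn != s->active) {
			continue; /* deferred until the transaction ends */
		}
		ops[i].done = 1;
		ran++;
	}
	return ran;
}
```

With one operation carrying the active handle and one unrelated operation
queued, the first pass runs only the child; the unrelated one runs on the
first pass after txn_end().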

> > I would rather strip away the ldb async layer if the aim is to avoid
> > looping in a context, so then you cannot at all create new calls and
> > wait on them because you have no event loop to pass at all.
> > It will also greatly simplify some code.
> 
> Removing the ldb async layer requires some work, but might be a good idea,
> then we can use a sane async ldap library if we need async remote ldap
> calls.

Well, we cannot simply remove it, it is public API, but we can deprecate
it inside samba with a couple of simple defines that initially will
cause compilation warnings and eventually will cause compilation errors,
like we do for libc functions like strcat().
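
A rough sketch of what such a two-stage deprecation could look like, in the
same spirit as the strcat() treatment. The names legacy_async_search(),
sync_search(), LEGACY_DEPRECATED and FORBID_LEGACY_ASYNC are made up for
illustration and are not real ldb or Samba identifiers; the deprecated
attribute is assumed to be available, as it is in GCC and clang.

```c
#include <assert.h>
#include <string.h>

/* Stage 1: mark the legacy call so every caller gets a compile warning. */
#if defined(__GNUC__)
#define LEGACY_DEPRECATED __attribute__((deprecated))
#else
#define LEGACY_DEPRECATED
#endif

/* Hypothetical stand-ins for an async entry point and its sync replacement. */
int legacy_async_search(const char *filter) LEGACY_DEPRECATED;
int sync_search(const char *filter);

/* Placeholder body: returns the filter length so the sketch is testable. */
int sync_search(const char *filter)
{
	return (int)strlen(filter);
}

/* The legacy symbol stays in the library, so the public API is unchanged. */
int legacy_async_search(const char *filter)
{
	return sync_search(filter);
}

/*
 * Stage 2: once internal callers are converted, the internal build defines
 * FORBID_LEGACY_ASYNC. Any remaining call then expands to an undeclared
 * identifier and fails to compile. External users of the public header
 * never define the macro and are not affected.
 */
#ifdef FORBID_LEGACY_ASYNC
#define legacy_async_search(filter) __ERROR__NEVER_USE_LEGACY_ASYNC_SEARCH__
#endif
```

The point of doing it in two stages is that the warnings let us find and
convert callers gradually, and the hard error only goes in once the tree is
clean.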

For remote LDAP calls we should just stop using LDB and switch to the
tldap code Volker contributed. The LDB API always felt quite a bit
awkward on the client side for me.

> > The main problem I see here is the LDAP server, if we can agree
> > officially to move to OpenLDAP + overlays (ie fully threaded LDAP
> > server) and throw away our own home grown thing, then we'll be in a much
> > better position, and a fully sync LDB will be just fine.
> 
> This has nothing to do with OpenLDAP; even if it did, it would mean
> that we need a full async interface to avoid blocking waiting for
> external processes.

Well, unless we want to convert all our code to make it thread safe then
really we ought to start thinking about LDAP performance. I do not think
forking multiple tasks is a workable solution and I do not think a fully
synchronous server is really going to cut it in production. So I was
proposing to just go and work on a better long term plan, use a proven
good and fast LDAP server engine and stop trying to do our own.

It is a bit far fetched, but I think it is inevitable in the long run.

Simo.