[PATCH] [ldb_modules ABI break] Re: [WIP][PATCH] Safe LDB locking for better replication

Andrew Bartlett abartlet at samba.org
Fri Jun 23 20:32:22 UTC 2017


On Fri, 2017-06-23 at 13:25 +0200, Stefan Metzmacher wrote:
> Hi Andrew,
> 
> > > > and also is not
> > > > backwards compatible with Samba < 4.7.
> > > 
> > > Why and what to you mean exactly?
> > > Building Samba 4.6 with the new ldb version?
> > 
> > Yes.
> > 
> > For the list archives: If you use an old Samba (or old anything,
> > without a hack forcing an override) and don't rebuild, they will fail
> > to operate as we have a check.  If you override, then you will supply a
> > too short modules structure.  But you know all this.
> > 
> > So the answer is yes, Samba 4.6 will build fine, but then fail as
> > below:
> 
> I've created https://bugzilla.samba.org/show_bug.cgi?id=12859
> to crack that problem.
> 
> We need to backport something that will allow us to let
> future 4.5 and 4.6 releases work with ldb >= 1.1.30.
> 
> Any ideas about what we can backport?
> Or how we can add some code that works around problems?

No, and I'm really, really hesitant about doing it.  I've been working
on this for months now, and while I'm happy to say that 4.7 will be
great, after 3 months of testing, just adding locking under a stack
that has never had locking before is fundamentally risky.

Nothing less than the full set of patches need to be backported, that
is why the full set of patches were developed:  we are actually not
getting a new feature here, all the patches are just to make us still
work with locking enforced!

> > > > It is likely to trigger deadlock detected "failed to upgrade hash
> > > > locks", LDB_ERR_BUSY on earlier Samba versions, as they came to not
> > > > expect locking (and the required lock ordering) to be enforced on
> > > > ldb_tdb.
> > > 
> > > Can you give an example commit that is missing in earlier Samba
> > > versions?
> > 
> > commit 10e7c749e7b7a4155669c6ecf97a9ac52b13110a
> > Author: Andrew Bartlett <abartlet at samba.org>
> > Date:   Wed May 31 12:22:28 2017 +1200
> > 
> >     dsdb: Correctly call ldb_module_done in dsdb_notification
> > 
> > For the 1.1.30 ldb version, the even context changes require this in
> > Samba (less important, only hits -M single):
> > 
> > commit 7259661467776a76c4fa3aabaf1ae8a3d531e506
> > Author: Andrew Bartlett <abartlet at samba.org>
> > Date:   Fri May 12 01:55:45 2017 +0200
> > 
> >     dsdb: Use ldb_handle_use_global_event_context for rootdse modifies
> >     
> >     The modify operations on the rootDSE turn into IRPC messages, and
> > these need
> >     to be handled on the global event context, not the per-operation
> > context
> 
> Would it help to introduce an
> ldb_set_allow_private_event_context() that's called by a new Samba
> release. And other callers get the old behavior?
> 
> It would still leave us with unusable 1.1.30 and 1.1.31,
> but 1.2.0 would be usable again.

No, I don't think that is wise.  That just invites other operations
(possibly writes) to happen during a read lock, and that invites
LDB_ERR_BUSY (unable to upgrade hash locks). 

> > And this probably only fails in make test and causes noise (tried to do
> > transaction writes during a search):
> > 
> > commit 5f0e53f1b90369c649688122c0a8742352f13877
> > Author: Andrew Bartlett <abartlet at samba.org>
> > Date:   Wed May 3 22:53:14 2017 +0200
> > 
> >     dsdb: Do not write the @INDEXLIST or @ATTRIBUTES records during
> > schema refresh
> >     
> >     Instead, write it once in the module init, if required, and after a
> >     modify to the schema partition is detected
> 
> We may need to do the read locking optional as well
> only if the callers requested it.

I realise that is what we have operated on for so long, but I'm not
happy with it.  To only do the read locking over all the DBs optionally
invites deadlocks due to unstructured lock ordering depending on what
direction links are followed in during extended_dn_out processing. 

That is, the reason we take out all the backend locks, as well as
performance, is to ensure we take them all out in the same order,
rather than let the module stack above take them in different orders as
they process things during the search. 

> > But this will fail in normal operation:
> > 
> > commit b8ba0103bf45670c31384c56d6cd63bbef760a0c
> > Author: Andrew Bartlett <abartlet at samba.org>
> > Date:   Wed May 3 22:51:30 2017 +0200
> > 
> >     dsdb: Take out the transaction and prepare_commit locks in the same
> > order
> >     
> >     We must, when starting the transaction and preparing to commit the
> > transaction, take
> >     out the locks in the same order across all the databases, otherwise
> > we may deadlock.
> 
> That's more a problem introduced by
> "tdb: Remove locking from tdb_traverse_read()" correct?

Perhaps.  As I say, every change I've made was made for a reason, that
being without it broke.

> Would it help if we would grab the write lock in
> ldb_transaction_start() instead of prepare_commit()
> unless the caller of ldb (a new Samba version) indicates
> support for this?

I don't know.  We are in a very sticky situation. 

For my money, we just declare the incomparability and go on, but I
don't have to support an enterprise distribution etc.  

Fundamentally, we got our locking wrong, and that is never an easy egg
to unscramble.  I don't think there are any simple solutions here.

Andrew Bartlett

-- 
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba




More information about the samba-technical mailing list