[PATCH] [BUG# 12858] Re: [WIP][PATCH] Safe LDB locking for better replication

Andrew Bartlett abartlet at samba.org
Fri Jun 23 02:44:39 UTC 2017


On Fri, 2017-06-23 at 09:25 +1200, Andrew Bartlett via samba-technical
wrote:
> On Thu, 2017-06-22 at 11:08 +0200, Stefan Metzmacher wrote:
> > Am 22.06.2017 um 10:36 schrieb Andrew Bartlett:
> > > 
> > > Without wishing to derail this, I would note that this does change the
> > > ldb module ABI (which we never promised anyway),
> > 
> > I already changed the commit to be ldb-1.2.0.
> > 
> > > and also is not
> > > backwards compatible with Samba < 4.7.
> > 
> > Why and what to you mean exactly?
> > Building Samba 4.6 with the new ldb version?
> 
> Yes.
> 
> For the list archives: If you use an old Samba (or old anything,
> without a hack forcing an override) and don't rebuild, they will fail
> to operate as we have a check.  If you override, then you will supply a
> too short modules structure.  But you know all this.
> 
> So the answer is yes, Samba 4.6 will build fine, but then fail as
> below:
> 
> > > It is likely to trigger deadlock detected "failed to upgrade hash
> > > locks", LDB_ERR_BUSY on earlier Samba versions, as they came to not
> > > expect locking (and the required lock ordering) to be enforced on
> > > ldb_tdb.
> > 
> > Can you give an example commit that is missing in earlier Samba
> > versions?
> 
> commit 10e7c749e7b7a4155669c6ecf97a9ac52b13110a
> Author: Andrew Bartlett <abartlet at samba.org>
> Date:   Wed May 31 12:22:28 2017 +1200
> 
>     dsdb: Correctly call ldb_module_done in dsdb_notification
> 
> For the 1.1.30 ldb version, the even context changes require this in
> Samba (less important, only hits -M single):
> 
> commit 7259661467776a76c4fa3aabaf1ae8a3d531e506
> Author: Andrew Bartlett <abartlet at samba.org>
> Date:   Fri May 12 01:55:45 2017 +0200
> 
>     dsdb: Use ldb_handle_use_global_event_context for rootdse modifies
>     
>     The modify operations on the rootDSE turn into IRPC messages, and
> these need
>     to be handled on the global event context, not the per-operation
> context
> 
> 
> And this probably only fails in make test and causes noise (tried to do
> transaction writes during a search):
> 
> commit 5f0e53f1b90369c649688122c0a8742352f13877
> Author: Andrew Bartlett <abartlet at samba.org>
> Date:   Wed May 3 22:53:14 2017 +0200
> 
>     dsdb: Do not write the @INDEXLIST or @ATTRIBUTES records during
> schema refresh
>     
>     Instead, write it once in the module init, if required, and after a
>     modify to the schema partition is detected
> 
> 
> But this will fail in normal operation:
> 
> commit b8ba0103bf45670c31384c56d6cd63bbef760a0c
> Author: Andrew Bartlett <abartlet at samba.org>
> Date:   Wed May 3 22:51:30 2017 +0200
> 
>     dsdb: Take out the transaction and prepare_commit locks in the same
> order
>     
>     We must, when starting the transaction and preparing to commit the
> transaction, take
>     out the locks in the same order across all the databases, otherwise
> we may deadlock.

Giving this reasonable question, I do wish I had tagged the bugs, but I
didn't.  However I've filed a clear bug now, so please add 

BUG: https://bugzilla.samba.org/show_bug.cgi?id=12858

to any commits not yet in master.

> > > Naturally, this is balanced by the fact that we absolutely have to
> > > merge this, as the read and replication corruption it caused has been
> > > observed in the real world, not just our selftest.
> > 
> > Yes!
> > 
> > I've looked through the patches and found two things:
> > 
> > 1)
> >  "ldb: Add test encoding current locking behaviour during ldb_search()"
> >  should include the lib/ldb/tests/ldb_mod_op_test.c changes
> >  from "ldb: Lock the whole backend database for the duration of a search"
> >  Or I'm I missing something?
> 
> No, what I tried to do is to show what was the behaviour, then fix the
> test for the new behaviour when fixing the code.

To be clear, the idea is that the tests do pass in each commit.

> I would have done the knownfail trick but we don't have that for ldb.
> 
> > 2)
> >  "dsdb: Teach the Samba partition module how to lock all the DB backends"
> >  got the unlocking order wrong, which must match
> >  {end,del}_trans() instead of read_lock().
> 
> Well spotted, I agree we want to unlock sam.ldb last.
> 
> >  The key is that the read locks on the individual partitions
> >  are purely a performance optimization to avoid
> >  fcntl syscalls, as these looks are never contented,
> >  because if the "BIG-Lock" on the metadata (sam.ldb).
> > 
> > I'm starting private autobuilds with the attached patchset.
> 
> Thanks so much for all your careful examination of this!
> 
> Do you want to tidy up your comments, squash in the unlock and push
> once you are happy with the autobuild, or is there something more you
> need from me?
> 
> Thanks,
> 
> Andrew Bartlett
> 
-- 
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba




More information about the samba-technical mailing list