[PATCH] [BUG# 12858] Re: [WIP][PATCH] Safe LDB locking for better replication
Andrew Bartlett
abartlet at samba.org
Fri Jun 23 02:44:39 UTC 2017
On Fri, 2017-06-23 at 09:25 +1200, Andrew Bartlett via samba-technical
wrote:
> On Thu, 2017-06-22 at 11:08 +0200, Stefan Metzmacher wrote:
> > Am 22.06.2017 um 10:36 schrieb Andrew Bartlett:
> > >
> > > Without wishing to derail this, I would note that this does change the
> > > ldb module ABI (which we never promised anyway),
> >
> > I already changed the commit to be ldb-1.2.0.
> >
> > > and also is not
> > > backwards compatible with Samba < 4.7.
> >
> > Why and what to you mean exactly?
> > Building Samba 4.6 with the new ldb version?
>
> Yes.
>
> For the list archives: If you use an old Samba (or old anything,
> without a hack forcing an override) and don't rebuild, they will fail
> to operate as we have a check. If you override, then you will supply a
> too short modules structure. But you know all this.
>
> So the answer is yes, Samba 4.6 will build fine, but then fail as
> below:
>
> > > It is likely to trigger deadlock detected "failed to upgrade hash
> > > locks", LDB_ERR_BUSY on earlier Samba versions, as they came to not
> > > expect locking (and the required lock ordering) to be enforced on
> > > ldb_tdb.
> >
> > Can you give an example commit that is missing in earlier Samba
> > versions?
>
> commit 10e7c749e7b7a4155669c6ecf97a9ac52b13110a
> Author: Andrew Bartlett <abartlet at samba.org>
> Date: Wed May 31 12:22:28 2017 +1200
>
> dsdb: Correctly call ldb_module_done in dsdb_notification
>
> For the 1.1.30 ldb version, the even context changes require this in
> Samba (less important, only hits -M single):
>
> commit 7259661467776a76c4fa3aabaf1ae8a3d531e506
> Author: Andrew Bartlett <abartlet at samba.org>
> Date: Fri May 12 01:55:45 2017 +0200
>
> dsdb: Use ldb_handle_use_global_event_context for rootdse modifies
>
> The modify operations on the rootDSE turn into IRPC messages, and
> these need
> to be handled on the global event context, not the per-operation
> context
>
>
> And this probably only fails in make test and causes noise (tried to do
> transaction writes during a search):
>
> commit 5f0e53f1b90369c649688122c0a8742352f13877
> Author: Andrew Bartlett <abartlet at samba.org>
> Date: Wed May 3 22:53:14 2017 +0200
>
> dsdb: Do not write the @INDEXLIST or @ATTRIBUTES records during
> schema refresh
>
> Instead, write it once in the module init, if required, and after a
> modify to the schema partition is detected
>
>
> But this will fail in normal operation:
>
> commit b8ba0103bf45670c31384c56d6cd63bbef760a0c
> Author: Andrew Bartlett <abartlet at samba.org>
> Date: Wed May 3 22:51:30 2017 +0200
>
> dsdb: Take out the transaction and prepare_commit locks in the same
> order
>
> We must, when starting the transaction and preparing to commit the
> transaction, take
> out the locks in the same order across all the databases, otherwise
> we may deadlock.
Giving this reasonable question, I do wish I had tagged the bugs, but I
didn't. However I've filed a clear bug now, so please add
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12858
to any commits not yet in master.
> > > Naturally, this is balanced by the fact that we absolutely have to
> > > merge this, as the read and replication corruption it caused has been
> > > observed in the real world, not just our selftest.
> >
> > Yes!
> >
> > I've looked through the patches and found two things:
> >
> > 1)
> > "ldb: Add test encoding current locking behaviour during ldb_search()"
> > should include the lib/ldb/tests/ldb_mod_op_test.c changes
> > from "ldb: Lock the whole backend database for the duration of a search"
> > Or I'm I missing something?
>
> No, what I tried to do is to show what was the behaviour, then fix the
> test for the new behaviour when fixing the code.
To be clear, the idea is that the tests do pass in each commit.
> I would have done the knownfail trick but we don't have that for ldb.
>
> > 2)
> > "dsdb: Teach the Samba partition module how to lock all the DB backends"
> > got the unlocking order wrong, which must match
> > {end,del}_trans() instead of read_lock().
>
> Well spotted, I agree we want to unlock sam.ldb last.
>
> > The key is that the read locks on the individual partitions
> > are purely a performance optimization to avoid
> > fcntl syscalls, as these looks are never contented,
> > because if the "BIG-Lock" on the metadata (sam.ldb).
> >
> > I'm starting private autobuilds with the attached patchset.
>
> Thanks so much for all your careful examination of this!
>
> Do you want to tidy up your comments, squash in the unlock and push
> once you are happy with the autobuild, or is there something more you
> need from me?
>
> Thanks,
>
> Andrew Bartlett
>
--
Andrew Bartlett http://samba.org/~abartlet/
Authentication Developer, Samba Team http://samba.org
Samba Developer, Catalyst IT http://catalyst.net.nz/services/samba
More information about the samba-technical
mailing list