[WIP][PATCH] Safe LDB locking for better replication

Andrew Bartlett abartlet at samba.org
Wed Jun 14 10:02:42 UTC 2017

On Wed, 2017-06-14 at 17:19 +1200, Andrew Bartlett via samba-technical wrote:
> On Fri, 2017-06-09 at 09:55 +0200, Stefan Metzmacher via samba-
> technical wrote:
> > 
> > I guess the wrong dsdb_replace() for the index mod_msg is one of the
> > most important bugs, which we can push and backport now?
> This turned out to be a red herring, but I have a patch in autobuild
> that shows that our current behaviour is correct. 
> > Regarding the db lock around the high level search, I'd have a better
> > feeling if we would make sure that any modifications would give an
> > error
> > while a search is running.
> This is reinforced by the tdb layer refusing to grant a transaction
> lock while the all-record lock is held.  
> From transaction.c:
>  transaction design:
>   - don't allow any locks to be held when a transaction starts,
>     otherwise we can end up with deadlock (plus lack of lock nesting
>     in posix locks would mean the lock is lost)
> > Similar to tdb_traverse_read() where the
> > callbacks are not allowed to do any modifications.
> > 
> > I'm not sure it helps to track this on the ldb_context, as we can
> > have multiple ldb_contexts within one process which make use of the
> > same
> > low level tdb_context.
> I think I follow, and this is why we ban nested event loops. 
> > I'm also not sure if we have to get the read locks on all partitions,
> > wouldn't it be enough to get the lock just on the main sam.ldb file?
> > As every search and transaction would always go through this first,
> > it should be enough to have that as a guard against modifications
> > and it will reduce the processing and syscall overhead.
> If we don't take an all-record lock on the DB, then the traverse will
> do a walking lock over the DB, which is very slow (this work started as
> an effort to gain performance!). 

Also, if we don't take the all-record lock here, we have issues with
the locking order, as one partition can lock another (resolving a
linked attribute DN) during a callback, and this could cause deadlocks
as there would not be a stable lock order.

> > Finally given that the code in lib/ldb/common/ldb.c will be
> > the only place that does the LOCK_DB and UNLOCK_DB calls,
> > I'd say it's better to add new read_lock() and read_unlock() hooks to
> > the module stack instead of using an extended operation, that would
> > also
> > simplify the code a lot.
> It would.  I only took this hideous approach based on what I
> understood to be your insistence, and will be glad to remove it and
> make it more like the transaction design. :-)
> > We could then have the coordination checks which make sure that the
> > following constraints are enforced:
> > 1. Only one top level search can run at a time! So the
> >    top level callbacks are not allowed to do any other searches.
> >    This guarantees that we just read_lock()/unlock() for one single
> >    search and don't get into a possible recursion/nesting hell again.
> I'll have to think about this.
> > 2. Once a top level read_lock() is done transaction_start() would
> > fail
> I think we get this already.
> > 3. If a transaction is already started we skip the read_lock() call
> > to
> >    the tdb level, but still make sure only one top level search can
> > run
> >    at a time.
> We sort of have this already: skipping the extra locks is handled by
> lock nesting at the tdb layer. 
> > One thing to watch out for are the modify triggers in the rootdse
> > module, which require the usage of the global event context and
> > operate async using IRPC. We need to make sure that we don't
> > have a tdb_transaction open while these are running.
> We already cancel the transaction for that reason.
> Thank you so very much for your thoughts, which I missed earlier. 
> Attached is my work in progress, with a set of patches that is starting
> to pass tests as often as master. 
> We seem to be finding every other flapping test along the way (I got
> lost in posixacl.py today, where it looks like things are stuck as
> ID_TYPE_GID in the cache).  The good news is we found and fixed another
> replication bug today!
> Anyway, with that proviso, here is the current set of patches - the set
> that Garming has in autobuild and the remaining current WIP locking
> set.
> The current branch on catalyst git is ldb-safe-locking-single.

I'm also very pleased to report that it has passed 3/5 builds so far
with the combined patch set.  I'll run more overnight.  This is real
progress: I got only 1/5 successes before we fixed the schema
replication bug.

The failures I currently get are:

[446(2392)/2123 at 56m20s] samba3.raw.search(ad_dc)
Creating 700 files
UNEXPECTED(failure): samba3.raw.search.sorted(ad_dc)
REASON: Exception: Exception: ../source4/torture/raw/search.c:995:
(result.count) was 699 (0x2BB), expected 700 (0x2BC): incorrect value


[668(3515)/2123 at 1h27m57s] idmap.rfc2307(ad_member_rfc2307)
UNEXPECTED(failure): idmap.rfc2307.Testing for expected group
REASON: Exception: Exception:

The first looks like a garden-variety flapping test, and the second
I'll look into, but I think we have seen it before.  (Note: things will
likely flap differently with different locks held). 

> Further comments most welcome, and I plan to re-write it to use
> read_lock() and read_unlock() operations tomorrow. 

Andrew Bartlett

Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba

More information about the samba-technical mailing list