[WIP][PATCH] Safe LDB locking for better replication

Andrew Bartlett abartlet at samba.org
Wed Jun 14 10:02:42 UTC 2017


On Wed, 2017-06-14 at 17:19 +1200, Andrew Bartlett via samba-technical
wrote:
> On Fri, 2017-06-09 at 09:55 +0200, Stefan Metzmacher via samba-
> technical wrote:
> > 
> > I guess the wrong dsdb_replace() for the index mod_msg is one of the
> > most important bugs, which we can push and backport now?
> 
> This turned out to be a red herring, but I have a patch in autobuild
> that shows that our current behaviour is correct. 
> 
> > Regarding the db lock around the high level search, I'd have a better
> > feeling if we would make sure that any modifications would give an
> > error
> > while a search is running.
> 
> This is reinforced by the tdb layer refusing to grant a transaction
> lock while the all-record lock is held.  
> 
> From transaction.c:
>  transaction design:
> 
>   - don't allow any locks to be held when a transaction starts,
>     otherwise we can end up with deadlock (plus lack of lock nesting
>     in posix locks would mean the lock is lost)
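
To make that concrete, here is a minimal sketch of the behaviour I'm
relying on - the filename, flags and error handling are purely
illustrative, not anything from the patch set:

/* Sketch only: the tdb layer refuses to start a transaction while the
 * all-record read lock is held, as the transaction design notes say. */
#include <fcntl.h>
#include <stdio.h>
#include <tdb.h>

int main(void)
{
	struct tdb_context *tdb = tdb_open("example.tdb", 0, TDB_DEFAULT,
					   O_RDWR | O_CREAT, 0600);

	tdb_lockall_read(tdb);		/* all-record read lock */

	if (tdb_transaction_start(tdb) != 0) {
		/* expected: refused while the all-record lock is held */
		printf("transaction refused: %s\n", tdb_errorstr(tdb));
	}

	tdb_unlockall_read(tdb);
	tdb_close(tdb);
	return 0;
}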
> 
> > Similar to tdb_traverse_read() where the
> > callbacks are not allowed to do any modifications.
> > 
> > I'm not sure it helps to track this on the ldb_context, as we can
> > have multiple ldb_contexts within one process which make use of the
> > same
> > low level tdb_context.
> 
> I think I follow, and this is why we ban nested event loops. 
> 
> > I'm also not sure if we have to get the read locks on all partitions,
> > wouldn't it be enough to get the lock just on the main sam.ldb file?
> > As every search and transaction would always go through this first,
> > it should be enough to have that as a guard against modifications
> > and it will reduce the processing and syscall overhead.
> 
> If we don't take an all-record lock on the DB, then the traverse will
> do a walking lock over the DB, which is very slow (this work started as
> an effort to gain performance!). 

Also, if we don't take the all-record lock here, we run into
lock-ordering problems: one partition can lock another (while resolving
a linked attribute DN) during a callback, and without a stable lock
order that could deadlock.
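
Roughly the shape I have in mind for the partition locking - a sketch
only, and the type names (partition_ctx, partitions[], ->tdb) are made
up for illustration, not the real partition module structures:

#include <tdb.h>

/* illustrative types only */
struct partition {
	struct tdb_context *tdb;
};

struct partition_ctx {
	unsigned int partition_count;
	struct partition **partitions;
};

/* Take the all-record read lock on every partition, always in the same
 * (array) order, so concurrent lockers cannot deadlock and the later
 * traverse does not fall back to slow walking locks. */
static int partitions_read_lock(struct partition_ctx *ctx)
{
	unsigned int i;

	for (i = 0; i < ctx->partition_count; i++) {
		if (tdb_lockall_read(ctx->partitions[i]->tdb) != 0) {
			/* unwind in reverse order on failure */
			while (i--) {
				tdb_unlockall_read(ctx->partitions[i]->tdb);
			}
			return -1;
		}
	}
	return 0;
}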

> > Finally given that the code in lib/ldb/common/ldb.c will be
> > the only place that does the LOCK_DB and UNLOCK_DB calls,
> > I'd say it's better to add new read_lock() and read_unlock() hooks to
> > the module stack instead of using an extended operation, that would
> > also
> > simplify the code a lot.
> 
> It would; I only took this hideous approach on what I understood to be
> your insistence, and will be glad to remove it and make it more like
> the transaction design. :-)
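
For what it's worth, this is roughly the shape I'm heading for - a
sketch only, not the finished hooks; ltdb_private and the way the tdb
context is reached are illustrative stand-ins for the backend's real
private data:

#include <ldb_module.h>
#include <tdb.h>

/* Proposed read_lock()/read_unlock() hooks for the module stack,
 * mirroring the existing start_transaction()/end_transaction() pair,
 * shown here as a tdb-backed implementation might look. */

struct ltdb_private {			/* illustrative */
	struct tdb_context *tdb;
};

static int ltdb_read_lock(struct ldb_module *module)
{
	struct ltdb_private *ltdb = ldb_module_get_private(module);

	if (tdb_lockall_read(ltdb->tdb) != 0) {
		return LDB_ERR_OPERATIONS_ERROR;
	}
	return LDB_SUCCESS;
}

static int ltdb_read_unlock(struct ldb_module *module)
{
	struct ltdb_private *ltdb = ldb_module_get_private(module);

	if (tdb_unlockall_read(ltdb->tdb) != 0) {
		return LDB_ERR_OPERATIONS_ERROR;
	}
	return LDB_SUCCESS;
}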
> 
> > We could then have the coordination checks which make sure that the
> > following constraints are enforced:
> > 1. Only one top level search can run at a time! So the
> >    top level callbacks are not allowed to do any other searches.
> >    This guarantees that we just read_lock()/unlock() for one single
> >    search and don't get into a possible recursion/nesting hell again.
> 
> I'll have to think about this.
> 
> > 2. Once a top level read_lock() is done transaction_start() would
> > fail
> 
> I think we get this already.
> 
> > 3. If a transaction is already started we skip the read_lock() call
> > to
> >    the tdb level, but still make sure only one top level search can
> > run
> >    at a time.
> 
> We sort of have this already: taking out the extra locks just nests
> at the tdb layer. 
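
To have something concrete to poke at, this is the sort of bookkeeping
I imagine for those three constraints - a sketch only, and the names
(lock_state, guarded_*) are made up, not from the patches:

#include <ldb.h>		/* LDB_SUCCESS, LDB_ERR_BUSY */

struct lock_state {		/* would hang off the ldb/backend context */
	unsigned int read_lock_count;
	unsigned int transaction_depth;
};

static int guarded_read_lock(struct lock_state *s)
{
	if (s->read_lock_count > 0) {
		/* 1. only one top-level search at a time */
		return LDB_ERR_BUSY;
	}
	s->read_lock_count++;
	if (s->transaction_depth > 0) {
		/* 3. a transaction already holds the lock, so skip the
		 * tdb-level read lock but keep the counter */
		return LDB_SUCCESS;
	}
	/* ... tdb_lockall_read() on the partitions goes here ... */
	return LDB_SUCCESS;
}

static int guarded_transaction_start(struct lock_state *s)
{
	if (s->read_lock_count > 0) {
		/* 2. no transaction may start under a read lock */
		return LDB_ERR_BUSY;
	}
	s->transaction_depth++;
	/* ... tdb_transaction_start() goes here ... */
	return LDB_SUCCESS;
}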
> 
> > One thing to watch out for are the modify triggers in the rootdse
> > module, which require the usage of the global event context and
> > operate async using IRPC. We need to make sure that we don't
> > have a tdb_transaction open while these are running.
> 
> We already cancel the transaction for that reason.
> 
> Thank you so very much for your thoughts, which I missed earlier. 
> Attached is my work in progress, with a set of patches that is starting
> to pass tests as often as master. 
> 
> We seem to be finding every other flapping test along the way (I got
> lost in posixacl.py today, where it looks like things are stuck as
> ID_TYPE_GID in the cache).  The good news is we found and fixed another
> replication bug today!
> 
> Anyway, with that proviso, here is the current set of patches - the set
> that Garming has in autobuild and the remaining current WIP locking
> set.
> 
> The current branch on catalyst git is ldb-safe-locking-single.

I'm also very pleased to report that the combined patch set has passed
3/5 builds so far.  I'll run more overnight.  This is real progress: I
got only 1/5 successes before we fixed the schema replication bug.

The failures I currently get are:

[446(2392)/2123 at 56m20s] samba3.raw.search(ad_dc)
Creating 700 files
UNEXPECTED(failure): samba3.raw.search.sorted(ad_dc)
REASON: Exception: Exception: ../source4/torture/raw/search.c:995:
(result.count) was 699 (0x2BB), expected 700 (0x2BC): incorrect value

and

[668(3515)/2123 at 1h27m57s] idmap.rfc2307(ad_member_rfc2307)
UNEXPECTED(failure): idmap.rfc2307.Testing for expected group
memberships(ad_member_rfc2307)
REASON: Exception: Exception:

The first looks like a garden-variety flapping test, and the second
I'll look into, but I think we have seen it before.  (Note: things will
likely flap differently with different locks held). 

> Further comments most welcome, and I plan to re-write it to use
> read_lock() and read_unlock() operations tomorrow. 
> 
Thanks,

Andrew Bartlett

-- 
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba



