[WIP] TDB traverse lock changes for massive AD DC perf improvement

Andrew Bartlett abartlet at samba.org
Fri Mar 31 19:06:24 UTC 2017


On Fri, 2017-03-31 at 23:51 +1300, Andrew Bartlett via samba-technical
wrote:
> Garming recently discovered a bug in ldb_tdb, causing us not to hold
> the allrecord lock in TDB after a lock/unlock cycle.  
> 
> The upshot of that is that a tdb traverse will do many more locks,
> but
> still be protected against modification during the traverse by the
> transaction lock.
> 
> However, testing this patch showed a lock ordering issue, triggered
> EDEADLK, because if one process holds the allrecord lock, and the the
> transaction lock (for the traverse), while the other process holds
> the
> transaction lock (from a prepare commit), and tries to get the
> allrecord lock (the finish the prepare commit), it will report a
> deadlock in the prepare commit. 
> 
> We tried to change the ltdb code to simply obtain a transaction lock
> first, but the tdb transaction lock is not exposed in an API.  
> 
> However, by swapping the traverse to the allrecord lock, we remove
> the
> second lock that is the heart of the ordering issue.
> 
> I realise that while unimportant to the AD DC case (all modifies in a
> transaction), I have traded off concurrency for performance.  Can
> someone explain why tdb_traverse_read() needs a lock at all?

Here is the commit, but I still don't understand the reasoning:

commit 251aaafe3a9213118ac3a92def9ab2104c40d12a
Author: Andrew Tridgell <tridge at samba.org>
Date:   Tue Sep 27 01:26:34 2005 +0000

    r10522: finally got the locking working on solaris10. This adds a
read lock on
    the transaction lock in tdb_traverse_read(). This prevents a
pattern
    of locks which triggers the deadlock detection code in solaris10. I
    suspect solaris10 is trying to prevent lock starvation by granting
    locks in the order they were requested, which makes it much easier
to
    produce deadlocks.
    (This used to be commit 54203aacd138c30826d54c5d9b6cc8d6e9e270f8)

diff --git a/source4/lib/tdb/common/traverse.c
b/source4/lib/tdb/common/traverse.c
index 9271b75..00fe5be 100644
--- a/source4/lib/tdb/common/traverse.c
+++ b/source4/lib/tdb/common/traverse.c
@@ -205,9 +205,21 @@ int tdb_traverse_read(struct tdb_context *tdb,
 {
        struct tdb_traverse_lock tl = { NULL, 0, 0, F_RDLCK };
        int ret;
+       
+       /* we need to get a read lock on the transaction lock here to
+          cope with the lock ordering semantics of solaris10 */
+       if (tdb->methods->tdb_brlock(tdb, TRANSACTION_LOCK, F_RDLCK,
F_SETLKW, 0) == -1) {
+               TDB_LOG((tdb, 0, "tdb_traverse_read: failed to get
transaction lock\n"));
+               tdb->ecode = TDB_ERR_LOCK;
+               return -1;
+       }
+
        tdb->traverse_read++;
        ret = tdb_traverse_internal(tdb, fn, private, &tl);
        tdb->traverse_read--;
+
+       tdb->methods->tdb_brlock(tdb, TRANSACTION_LOCK, F_UNLCK,
F_SETLKW, 0);
+
        return ret;
 }
 

> Attached is the WIP patch and test modifications, as well as a PDF of
> the performance improvement. 

If we could understand better the solaris10 issue, we could potentially
remove this lock in tdb_traverse_read() as a better approach, and use
the allrecord lock for a tdb_traverse().  That would provide the
original concurrency for tdb_traverse_read(). 

> The branch is ldap-multi-process in the the Catalyst GIT server. 
> 
> Thoughts most welcome.

My final thought is that I can't rationalise how the tdb locking model
and the async ldb stack can operate safely together.  

My instinct is that we need to create a new event context for reads,
just as we do for all transactions.  That is, starting to (as intended)
actually hold the allrecord lock while potentially calling
tevent_loop() via ldb_wait() feels totally unsafe. 

Thanks,

Andrew Bartlett
-- 
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba




More information about the samba-technical mailing list