[WIP] TDB traverse lock changes for massive AD DC perf improvement

Fri Mar 31 20:05:48 UTC 2017

On Sat, 2017-04-01 at 08:06 +1300, Andrew Bartlett via samba-technical
wrote:
> On Fri, 2017-03-31 at 23:51 +1300, Andrew Bartlett via samba-
> technical
> wrote:
> > Garming recently discovered a bug in ldb_tdb, causing us not to
> > hold
> > the allrecord lock in TDB after a lock/unlock cycle.  
> > 
> > The upshot of that is that a tdb traverse will do many more locks,
> > but
> > still be protected against modification during the traverse by the
> > transaction lock.
> > 
> > However, testing this patch showed a lock ordering issue, triggered
> > EDEADLK, because if one process holds the allrecord lock, and the
> > the
> > transaction lock (for the traverse), while the other process holds
> > the
> > transaction lock (from a prepare commit), and tries to get the
> > allrecord lock (the finish the prepare commit), it will report a
> > deadlock in the prepare commit. 
> > 
> > We tried to change the ltdb code to simply obtain a transaction
> > lock
> > first, but the tdb transaction lock is not exposed in an API.  
> > 
> > However, by swapping the traverse to the allrecord lock, we remove
> > the
> > second lock that is the heart of the ordering issue.

I've found the original intent for the transaction and traverse
behaviour documented in:

commit 7dd31288a701d772e45b1960ac4ce4cc1be782ed
Author: Andrew Tridgell <tridge at samba.org>
Date:   Thu Sep 22 13:12:46 2005 +0000

    r10421: following on discussions with simo, I have worked out a way
of
    allowing searches to proceed while another process is in a
    transaction, then only upgrading the transaction lock to a write
lock
    on commit.

    The solution is:

     - split tdb_traverse() into two calls, called tdb_traverse() and
       tdb_traverse_read(). The _read() version only gets read locks,
and
       will fail any write operations made in the callback from the
       traverse.

     - the normal tdb_traverse() call allows for read or write
operations
       in the callback, but gets the transaction lock, preventing
       transastions from starting inside the traverse

    In addition we enforce the following rule that you may not start a
    transaction within a traverse callback, although you can start a
    traverse within a transaction

    With these rules in place I believe all the deadlock possibilities
are
    removed, and we can now allow for searches to happen in parallel
with
    transactions

Following that, I've essentially reverted this:

> > I realise that while unimportant to the AD DC case (all modifies in
> > a
> > transaction), I have traded off concurrency for performance.  Can
> > someone explain why tdb_traverse_read() needs a lock at all?
> 
> Here is the commit, but I still don't understand the reasoning:
> 
> commit 251aaafe3a9213118ac3a92def9ab2104c40d12a
> Author: Andrew Tridgell <tridge at samba.org>
> Date:   Tue Sep 27 01:26:34 2005 +0000
> 
>     r10522: finally got the locking working on solaris10. This adds a
> read lock on
>     the transaction lock in tdb_traverse_read(). This prevents a
> pattern
>     of locks which triggers the deadlock detection code in solaris10.
> I
>     suspect solaris10 is trying to prevent lock starvation by
> granting
>     locks in the order they were requested, which makes it much
> easier
> to
>     produce deadlocks.
>     (This used to be commit 54203aacd138c30826d54c5d9b6cc8d6e9e270f8)
> 
> diff --git a/source4/lib/tdb/common/traverse.c
> b/source4/lib/tdb/common/traverse.c
> index 9271b75..00fe5be 100644
> --- a/source4/lib/tdb/common/traverse.c
> +++ b/source4/lib/tdb/common/traverse.c
> @@ -205,9 +205,21 @@ int tdb_traverse_read(struct tdb_context *tdb,
>  {
>         struct tdb_traverse_lock tl = { NULL, 0, 0, F_RDLCK };
>         int ret;
> +       
> +       /* we need to get a read lock on the transaction lock here to
> +          cope with the lock ordering semantics of solaris10 */
> +       if (tdb->methods->tdb_brlock(tdb, TRANSACTION_LOCK, F_RDLCK,
> F_SETLKW, 0) == -1) {
> +               TDB_LOG((tdb, 0, "tdb_traverse_read: failed to get
> transaction lock\n"));
> +               tdb->ecode = TDB_ERR_LOCK;
> +               return -1;
> +       }
> +
>         tdb->traverse_read++;
>         ret = tdb_traverse_internal(tdb, fn, private, &tl);
>         tdb->traverse_read--;
> +
> +       tdb->methods->tdb_brlock(tdb, TRANSACTION_LOCK, F_UNLCK,
> F_SETLKW, 0);
> +
>         return ret;
>  }
>  
> 
> > Attached is the WIP patch and test modifications, as well as a PDF
> > of
> > the performance improvement. 
> 
> If we could understand better the solaris10 issue, we could
> potentially
> remove this lock in tdb_traverse_read() as a better approach, and use
> the allrecord lock for a tdb_traverse().  That would provide the
> original concurrency for tdb_traverse_read(). 

I've now done that.  This feels closer to the correct solution, but
throws solaris10 under the bus for now.  See the attached patch.

> > The branch is ldap-multi-process in the the Catalyst GIT server. 

And this set with just the ldb and tdb locking is in ldb-tdb-locking. 
I'm running the perf graphs again to see how just this isolated set
performs. 

> > Thoughts most welcome.
> 
> My final thought is that I can't rationalise how the tdb locking
> model
> and the async ldb stack can operate safely together.  
> 
> My instinct is that we need to create a new event context for reads,
> just as we do for all transactions.  That is, starting to (as
> intended)
> actually hold the allrecord lock while potentially calling
> tevent_loop() via ldb_wait() feels totally unsafe. 

Thanks,

Andrew Bartlett
-- 
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ldb-tdb-locking.patch
Type: text/x-patch
Size: 11398 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20170401/c69270ac/ldb-tdb-locking.bin>