Fwd: Regression: ldb performance with indexes

Andreas Schneider asn at samba.org
Wed May 8 13:57:19 UTC 2024


On Friday, 3 May 2024 19:20:28 GMT+2 Andreas Schneider via samba-technical 
wrote:
> On Thursday, 2 May 2024 22:51:31 GMT+2 Andrew Bartlett via samba-technical
> wrote:
> > On Thu, 2024-05-02 at 14:49 +0200, Andreas Schneider via
> > samba-technical wrote:
> > > On Friday, 22 March 2024 17:55:23 GMT+2 Andréas LEROUX via
> > > samba-technical wrote:
> > > > Hi Andreas and Andrew,
> > > > 
> > > > >>>>> Hi, my colleagues discovered a performance issue in libldb:
> > > > >>>>> https://bugzilla.samba.org/show_bug.cgi?id=15590
> > > > >>>>> 
> > > > >>>>> As soon as you use indexes, ldbadd will be orders of magnitude
> > > > >>>>> slower than it was before. Could some ldb expert please look
> > > > >>>>> into it?
> > > > >>>> 
> > > > >>>> Your subject says a regression. What version is this a
> > > > >>>> regression against?
> > > > >>> 
> > > > >>> Isn't that obvious from the bug report?
> > > > >>> 
> > > > >>> Here is the short summary:
> > > > >>> 
> > > > >>> $ bash repro.sh 20000 indexes
> > > > >>> Added 2 records successfully
> > > > >>> Added 20000 records successfully
> > > > >>> 
> > > > >>> On Samba 4.10: 0m01.231s
> > > > >>> On Samba 4.19: 1m30.924s (that's about 74 times slower)
> > > > >>> 
> > > > >>>> The very nature of a DB index is that it will take time to
> > > > >>>> create, possibly a lot of time, but should make reads faster.
> > > > >>> 
> > > > >>> Either the DB index doesn't work at all in Samba 4.10 or there is
> > > > >>> a huge performance problem in Samba 4.19. What is it?
> > > > >> 
> > > > >> Thanks, that wasn't written as obviously on the bug, thanks for
> > > > >> the clarification.
> > > > > 
> > > > > I used our CentOS 8 Stream CI image for bisecting. You can't bisect
> > > > > easily on a modern Linux distribution, as the bundled waf does not
> > > > > support newer Python versions like 3.12.
> > > > > 
> > > > > In case you want to reproduce it, here is my run:
> > > > 
> > > > I'm Andréas from the Tranquil IT dev team. Denis and Yohannès asked
> > > > me this week to take a look at the performance issues on large
> > > > domains, which include the issue in this thread along with the mdb
> > > > large transaction issues.
> > > > 
> > > > The attached patchset passes all the tdb and ldb "make test" runs.
> > > > 
> > > > * LMDB: increase MDB_IDL_LOGN from 16 to 18 to accommodate large
> > > >   nested transactions
> > > > * tdb: fail fast when a record's hash doesn't match the expected
> > > >   hash, to avoid reading/copying the entire record
> > > > * ldb: increase DEFAULT_INDEX_CACHE_SIZE from 491 to 8089 to
> > > >   increase the number of buckets, so that each bucket is smaller
> > > >   and iterating over it in tdb_find is faster
> > > > 
> > > > With this patchset we can upgrade large domains (>200k objects) to
> > > > FL2k16 level in approximately 1 hour instead of 3 days :-)
> > > > 
> > > > [root@srvads1-bl1cw ~]# bash repro.sh 20000 indexes
> > > > Added 2 records successfully
> > > > Added 20000 records successfully
> > > > 
> > > > real    0m0.536s
> > > > user    0m0.798s
> > > > sys     0m0.105s
> > > 
> > > I'm sorry but I'm not able to reproduce this:
> > > 
> > > tis-tdbfind.patch:
> > > bash repro_dev_ldb.sh 10000 indexes
> > > Added 2 records successfully
> > > Added 10000 records successfully
> > > 
> > > real    0m9.035s
> > > user    0m9.021s
> > > sys     0m0.283s
> > > 
> > > tis-ldbfind.patch:
> > > bash repro_dev_ldb.sh 10000 indexes
> > > Added 2 records successfully
> > > Added 10000 records successfully
> > > 
> > > real    0m8.929s
> > > user    0m8.980s
> > > sys     0m0.219s
> > > 
> > > 
> > > I have a patch in this area that gets rid of some malloc calls, but
> > > it only gives a really small improvement.
> > > 
> > > I don't know what workflow your patches exactly improve but it would
> > > be nice to have a reproducer :-)
> > 
> > Just a quick note to connect some threads. We have three discussions
> > on this same issue; we should probably centralise here, as this is
> > where things started, but just so folks can follow, see:
> > 
> > https://bugzilla.samba.org/show_bug.cgi?id=15590
> > https://gitlab.com/samba-team/samba/-/merge_requests/3616
> > 
> > In short, the emerging consensus is that what we really need is a
> > better data structure than an in-memory TDB for the in-memory cache
> > that keeps the indexes lined up with the database in this case.
> 
> From https://gitlab.com/samba-team/samba/-/merge_requests/3616
> 
> The in-memory TDB is probably the wrong usage here and a red-black tree
> might be the solution.
> 
> There is lib/dbwrap/dbwrap_rbt.h. From the API it should be straightforward
> and quick to replace the tdb API in lib/ldb/ldb_key_value/ldb_kv_index.c
> with it, just for testing.
> If that fixes it, we should try to make lib/util/rbtree.c a SUBSYSTEM and
> link it into libldb. As libldb is not standalone anymore, this should be
> doable.

Using a red-black tree doesn't solve the issue:

$ bash repro_dev_ldb.sh 10000 indexes
RED BLACK TREE
RED BLACK TREE
RED BLACK TREE
Added 2 records successfully
RED BLACK TREE
Added 10000 records successfully

real    0m9.299s
user    0m9.212s
sys     0m0.263s

https://git.samba.org/?p=asn/samba.git;a=shortlog;h=refs/heads/asn-ldb
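For anyone weighing the two tdb/ldb changes in the quoted patchset (more buckets via DEFAULT_INDEX_CACHE_SIZE 491 -> 8089, and skipping a record early when its hash doesn't match), here is a rough sketch of why both reduce the cost of walking a hash chain. It is a hypothetical toy chained hash table, NOT tdb's real layout or API: struct entry, struct table, hash_key(), table_find() and run() are made up for illustration, and 32-bit FNV-1a stands in for whatever hash tdb actually uses. run() inserts n keys, looks each one up again, and counts chain entries visited and full key comparisons.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chained hash table; illustration only, not tdb/ldb code. */
struct entry {
	uint32_t hash;          /* full hash stored next to the key */
	char key[32];
	struct entry *next;
};

struct table {
	size_t nbuckets;
	struct entry **buckets;
};

/* 32-bit FNV-1a, chosen only because it is short. */
static uint32_t hash_key(const char *key)
{
	uint32_t h = 2166136261u;
	for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
		h ^= *p;
		h *= 16777619u;
	}
	return h;
}

static int table_init(struct table *t, size_t nbuckets)
{
	t->nbuckets = nbuckets;
	t->buckets = calloc(nbuckets, sizeof(*t->buckets));
	return t->buckets != NULL ? 0 : -1;
}

/* Prepend e to its bucket; e->key must already be filled in. */
static void table_insert(struct table *t, struct entry *e)
{
	e->hash = hash_key(e->key);
	size_t b = e->hash % t->nbuckets;
	e->next = t->buckets[b];
	t->buckets[b] = e;
}

/* Walk the key's bucket. *visited counts chain entries touched,
 * *compares counts full key comparisons. Entries whose stored hash
 * differs are skipped without comparing keys ("fail fast"). */
static struct entry *table_find(const struct table *t, const char *key,
				size_t *visited, size_t *compares)
{
	uint32_t h = hash_key(key);

	for (struct entry *e = t->buckets[h % t->nbuckets]; e != NULL; e = e->next) {
		(*visited)++;
		if (e->hash != h) {
			continue;
		}
		(*compares)++;
		if (strcmp(e->key, key) == 0) {
			return e;
		}
	}
	return NULL;
}

/* Insert n unique keys, look each one up again, return how many were found. */
static size_t run(size_t n, size_t nbuckets, size_t *visited, size_t *compares)
{
	struct table t;
	struct entry *entries = calloc(n, sizeof(*entries));
	size_t found = 0;

	if (entries == NULL || table_init(&t, nbuckets) != 0) {
		free(entries);
		return 0;
	}
	for (size_t i = 0; i < n; i++) {
		snprintf(entries[i].key, sizeof(entries[i].key), "cn=record%zu", i);
		table_insert(&t, &entries[i]);
	}
	*visited = 0;
	*compares = 0;
	for (size_t i = 0; i < n; i++) {
		if (table_find(&t, entries[i].key, visited, compares) == &entries[i]) {
			found++;
		}
	}
	free(t.buckets);
	free(entries);
	return found;
}
```

Calling run(20000, 491, &visited, &compares) and run(20000, 8089, &visited, &compares) shows the effect: 20000 keys in 491 buckets is an average chain of about 41 entries, versus about 2.5 with 8089, so the visited count drops sharply, while the stored-hash check keeps full key comparisons close to one per lookup in both cases. Whether chain walking is really where ldbadd spends its time is exactly what the timings in this thread call into question.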



-- 
Andreas Schneider                      asn at samba.org
Samba Team                             www.samba.org
GPG-ID:     8DFF53E18F2ABC8D8F3C92237EE0FC4DCC014E3D




