Fwd: Regression: ldb performance with indexes

Andreas Schneider asn at samba.org
Fri May 3 17:20:28 UTC 2024


On Thursday, 2 May 2024 22:51:31 GMT+2 Andrew Bartlett via samba-technical 
wrote:
> On Thu, 2024-05-02 at 14:49 +0200, Andreas Schneider via samba-technical wrote:
> > On Friday, 22 March 2024 17:55:23 GMT+2 Andréas LEROUX via samba-technical wrote:
> > > Hi Andreas and Andrew,
> > > 
> > > >>>> > Hi, my colleagues discovered a performance issue in libldb:
> > > >>>> > https://bugzilla.samba.org/show_bug.cgi?id=15590
> > > >>>> >
> > > >>>> > As soon as you use indexes, ldbadd will be magnitudes slower
> > > >>>> > than it was before. Could some ldb expert please look into it?
> > > >>>>
> > > >>>> Your subject says a regression. What version is this a
> > > >>>> regression against?
> > > >>>
> > > >>> Isn't that obvious from the bug report?
> > > >>>
> > > >>> Here is the short summary:
> > > >>> $ bash repro.sh 20000 indexes
> > > >>> Added 2 records successfully
> > > >>> Added 20000 records successfully
> > > >>> On Samba 4.10: 0m01.231s
> > > >>> On Samba 4.19: 1m30.924s (that's 90 times slower)
> > > >>>
> > > >>>> The very nature of a DB index is that it will take time to
> > > >>>> create, possibly a lot of time, but should make reads faster.
> > > >>>
> > > >>> Either the DB index doesn't work at all in Samba 4.10 or there is
> > > >>> a huge performance problem in Samba 4.19. What is it?
> > > >>
> > > >> Thanks, that wasn't written as obviously on the bug, thanks for the
> > > >> clarification.
> > > >
> > > > I used our CentOS 8 Stream CI image for bisecting. You can't bisect
> > > > easily on a modern Linux Distribution, as the included waf would
> > > > not have support for newer Python versions like 3.12.
> > > >
> > > > In case you want to reproduce it, here is my run:
> > > 
> > > I'm Andréas from Tranquil IT dev team. Denis and Yohannès asked me this
> > > week to take a look at the performance issues on large domains, which
> > > include this issue in the current thread along the mdb large
> > > transaction issues.
> > > The attached patchset passes all the tdb and ldb make test runs.
> > > * LMDB : increase MDB_IDL_LOGN from 16 to 18 to accommodate large
> > >   nested transactions
> > > * tdb : fail-fast when record hash doesn't match expected hash to
> > >   avoid reading/copying the entire record
> > > * ldb : increase DEFAULT_INDEX_CACHE_SIZE from 491 to 8089 to increase
> > >   the number of buckets, so that each bucket is smaller and iteration
> > >   within each bucket in tdb_find is faster
> > > With this patchset we can upgrade large domains (>200k objects) to
> > > FL2k16 level in approximately 1 hour instead of 3 days :-)
> > > [root at srvads1-bl1cw ~]# bash repro.sh 20000 indexes
> > > Added 2 records successfully
> > > Added 20000 records successfully
> > > real 0m0.536s
> > > user 0m0.798s
> > > sys 0m0.105s
> > 
> > I'm sorry but I'm not able to reproduce this:
> > 
> > tis-tdbfind.patch:
> > bash repro_dev_ldb.sh 10000 indexes
> > Added 2 records successfully
> > Added 10000 records successfully
> > real    0m9.035s
> > user    0m9.021s
> > sys     0m0.283s
> > 
> > tis-ldbfind.patch:
> > bash repro_dev_ldb.sh 10000 indexes
> > Added 2 records successfully
> > Added 10000 records successfully
> > real    0m8.929s
> > user    0m8.980s
> > sys     0m0.219s
> > 
> > 
> > I have a patch in the area to get rid of some malloc calls, but they
> > only give a really small improvement.
> > 
> > I don't know exactly which workflow your patches improve, but it would
> > be nice to have a reproducer :-)
> 
> Just a quick note to connect some threads.  We have three discussions
> on this same issue, we should probably centralise here as this is where
> things started, but just so folks can follow, see:
> https://bugzilla.samba.org/show_bug.cgi?id=15590
> https://gitlab.com/samba-team/samba/-/merge_requests/3616
> 
> In short, the emerging consensus is that what we really need is a better
> data structure than an in-memory TDB for the in-memory cache needed to
> keep the indexes lined up with the database in this case.

From https://gitlab.com/samba-team/samba/-/merge_requests/3616

The in-memory TDB is probably the wrong usage here and a red-black tree might 
be the solution.

There is lib/dbwrap/dbwrap_rbt.h. From the API it should be straightforward 
and quick to replace the tdb API in lib/ldb/ldb_key_value/ldb_kv_index.c with 
it just for testing.
If it fixes it, we should try to make lib/util/rbtree.c a SUBSYSTEM and link 
it into libldb. As libldb is not standalone anymore, this should be doable.
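
To illustrate why the data structure matters, here is a rough Python
sketch. It is a toy model and not ldb or tdb code: the key format, the use
of crc32 as the hash, and both function names are assumptions made purely
for illustration. It compares the average chain walk in a hash table with a
fixed number of buckets (491 and 8089, the two DEFAULT_INDEX_CACHE_SIZE
values mentioned in this thread) against the worst-case height of a
red-black tree holding the same number of keys.

```python
import math
import zlib


def average_chain_probes(num_records: int, num_buckets: int) -> float:
    """Average entries walked per successful lookup in a chained hash
    table whose bucket count never grows (toy model, not tdb itself)."""
    counts = [0] * num_buckets
    for i in range(num_records):
        # Hypothetical key format; real index keys look different.
        key = f"DN=cn=user{i},dc=example,dc=com".encode()
        counts[zlib.crc32(key) % num_buckets] += 1
    # Finding each of the c entries in a chain once costs 1 + 2 + ... + c.
    total = sum(c * (c + 1) // 2 for c in counts)
    return total / num_records


def red_black_height_bound(num_records: int) -> int:
    """Worst-case height of a red-black tree with num_records nodes,
    using the standard bound 2 * log2(n + 1), rounded up per level."""
    return 2 * math.ceil(math.log2(num_records + 1))


if __name__ == "__main__":
    n = 20000
    for buckets in (491, 8089):
        print(buckets, "buckets:", average_chain_probes(n, buckets))
    print("red-black tree height bound:", red_black_height_bound(n))
```

With a fixed bucket count the chain walk grows linearly with the number of
records, so a bigger table only moves the problem further out, while the
tree bound grows logarithmically. The model ignores constant factors such
as key comparison cost and memory allocation, so it only hints at the
trend and says nothing about real timings.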



	Andreas


-- 
Andreas Schneider                      asn at samba.org
Samba Team                             www.samba.org
GPG-ID:     8DFF53E18F2ABC8D8F3C92237EE0FC4DCC014E3D




