Fwd: Regression: ldb performance with indexes

Thu May 2 12:49:29 UTC 2024

On Friday, 22 March 2024 17:55:23 GMT+2 Andréas LEROUX via samba-technical 
wrote:
> Hi Andreas and Andrew,
> 
>  >>>> > Hi, my colleagues discovered a performance issue in libldb:
>  >>>> > https://bugzilla.samba.org/show_bug.cgi?id=15590
>  >>>> > 
>  >>>> > As soon as you use indexes, ldbadd will be magnitudes slower
>  >>>> > than it was before. Could some ldb expert please look into it?
>  >>>> > 
>  >>>> > > Your subject says a regression. What version is this a
>  >>>> > > regression against?
>  >>>> Isn't that obvious from the bug report?
>  >>> 
>  >>> Here is the short summary:
>  >>> $ bash repro.sh 20000 indexes
>  >>> Added 2 records successfully
>  >>> Added 20000 records successfully
>  >>> On Samba 4.10: 0m01.231s
>  >>> On Samba 4.19: 1m30.924s (that's 90 times slower)
>  >>> 
>  >>>> > The very nature of a DB index is that it will take time to
>  >>>> > create, possibly a lot of time, but should make reads faster.
>  >>>> Either the DB index doesn't work at all in Samba 4.10 or there is
>  >>>> a huge performance problem in Samba 4.19. What is it?
>  >> 
>  >> Thanks, that wasn't written as obviously on the bug, thanks for the
>  >> clarification.
>  > 
>  > I used our CentOS 8 Stream CI image for bisecting. You can't bisect
>  > easily on a modern Linux Distribution, as the included waf would not
>  > have support for newer Python versions like 3.12.
> 
>  > In case you want to reproduce it, here is my run:
> 
> I'm Andréas from the Tranquil IT dev team. Denis and Yohannès asked me
> this week to take a look at the performance issues on large domains,
> which include the issue in this thread along with the mdb large
> transaction issues.
> 
> The attached patchset passes all of the tdb and ldb make test runs.
> 
> * LMDB : increase MDB_IDL_LOGN from 16 to 18 to accommodate large nested
> transactions
> * tdb : fail fast when the record hash doesn't match the expected hash,
> to avoid reading/copying the entire record
> * ldb : increase DEFAULT_INDEX_CACHE_SIZE from 491 to 8089 to increase
> the number of buckets, so that each bucket is smaller and iteration
> within a bucket in tdb_find is faster
> 
> With this patchset we can upgrade large domains (>200k objects) to
> FL2k16 level in approximately 1 hour instead of 3 days :-)
> 
> [root at srvads1-bl1cw ~]# bash repro.sh 20000 indexes
> Added 2 records successfully
> Added 20000 records successfully
> 
> real    0m0.536s
> user    0m0.798s
> sys     0m0.105s

I'm sorry but I'm not able to reproduce this:

tis-tdbfind.patch:

bash repro_dev_ldb.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m9.035s
user    0m9.021s
sys     0m0.283s

tis-ldbfind.patch:

bash repro_dev_ldb.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m8.929s
user    0m8.980s
sys     0m0.219s

I have a patch in this area to get rid of some malloc calls, but it only
gives a really small improvement.

I don't know exactly which workflow your patches improve, but it would be
nice to have a reproducer :-)

Best regards

	Andreas

-- 
Andreas Schneider                      asn at samba.org
Samba Team                             www.samba.org
GPG-ID:     8DFF53E18F2ABC8D8F3C92237EE0FC4DCC014E3D