ldb speed

Mon Oct 30 04:48:18 GMT 2006

On Mon, 2006-10-30 at 14:16 +1100, tridge at samba.org wrote:
> Simo,
> 
> As we've discussed previously, ldb is currently way too slow. The
> LOCAL-DBSPEED test is a simple way to demonstrate this - it tests both
> ldb and tdb for the example of a sidmap style database (something like
> what winbind_idmap.tdb currently holds). 
> 
> In LOCAL-DBSPEED compiled with optimisation, each ldb search takes
> about 45 usec on my test machine. The equivalent tdb fetch takes about
> 8 usec. It's even worse if you factor out the system overhead of
> locking. With locking disabled a tdb fetch takes about 3 usec, whereas
> a ldb search takes around 40 usec.
> 
> So we're losing about 37 usec per search on
> packing/unpacking/validation/dns etc in ldb. More specifically the
> breakdown is roughly:
> 
> 	7 usec/op in ldb_dn_explode
> 	8 usec/op in ldb_dn_linearize
> 	6 usec/op in tdb_fetch
> 	3 usec/op in ltdb_unpack_data
> 	9 usec/op in ldb_dn_string_compose
> 	7 usec ltdb_lock_read/ltdb_unlock_read
> 
> I think we can eliminate most of this overhead with a bit of work.
> 
> The most obvious place to start is with the DN manipulation code. If
> you look at the work done in something like ldb_dn_explode() then you
> should see that most of the work is not needed most of the time. The
> reason is that it spends an awful lot of time and effort carefully
> validating the input it has, carefully checking that attribute names
> are valid, parsing potentially escaped strings etc.

Yeah, indeed the work on ldb_dn was completely driven by the need to
validate a DN. I see we can make the process smarter in some cases.

> This work is arguably appropriate for DNs given to us by external
> users of the ldb API when we are going to store that DN in our
> database, but it is not needed at all for a ldb_search() call, and is
> certainly not needed when parsing a DN that is stored in our
> underlying tdb.

Except that we need to validate the DN and return an appropriate error
in some cases even for searches, but I guess we can reach the same
result by other means, the most important thing is however to properly
case change the DN to do the right search, and that requires parsing.

> Say that a user gave us a DN in a ldb_search() call with a invalid
> atribute name as part of one component. The search will not find any
> results, as our database will not contain any records with a DN that
> matches, so carefully checking the DN components for validity just
> isn't necessary.

Except for case-sensitivity considerations this make sense.

> Similarly, when parsing a DN from our tdb, that DN could only have
> been stored in that tdb via a add operation or a DN rename operation,
> so it is only necessary to validate the DN on an add or
> rename. Validating it on read is pointless.

This is completely true, I see we can easily skip any check on read.

> Next, we should think about the format of a DN. Originally in ldb a DN
> was a char* string. This format was used both in the on-disk format
> and in the API. It meant we never had to convert a DN from one format
> to another. 

Yes, this was a very hard decision to take. At the moment validation
seemed to be more important. Now that we know what is needed I guess we
can optimize it where needed.

> As I understand it, the change to the ldb_dn structure format was
> needed to allow us to correctly handle some of the semantics of
> ldap. Could we support all those semantics with a char* format? I
> think we could.

Everything can be done :)
I just would like to make it sure we do not trade some slow path with
another :)

> There is a lot more room for optimisation in other parts of ldb, but
> fixing the current struct ldb_dn is the main thing. I'd suggest we try
> the following strategy to start with.
> 
>  1) make 'struct ldb_dn' a private structure to ldb_dn.c. Keep it in
>     the API for now, but make it an opaque structure as far as the
>     rest of our code is concerned. That will involve adding some
>     public routines to extract properties of a 'struct ldb_dn' to
>     allow us to fix up the (very few!) external bits of code that
>     currently look inside a struct ldb_dn

I think this is a very good idea, and I was thinking right today to make
other structures opaque as well and provide more manipulation functions.
This will help keeping LDB consistent in the long term by denying users
to poke with internal structures and make simpler to stay ABI compatible
between releases.

>  2) once we've made it an internal only structure, we can start
>     changing its shape. It could become as simple as a 'char*'
>     internally, or could retain some flags, or even a broken out array
>     that gets created on demand.

I think that as a start we can just make 2 different functions (with
validation and without) and see how much we gain just with this.
Then dig further as we see fit.

>  3) make sure we differentiate between callers where input needs to be
>     validated and callers where it does not need validation.

I'd say, anything that comes from outside needs validation, anything
from inside doesn't.
This can be easily done by making the non-validation function private
and see how much this helps.

> Sound OK?

A promising start :-)

Simo.

-- 
Simo Sorce
Samba Team GPL Compliance Officer
email: idra at samba.org
http://samba.org