ldb speed

Mon Oct 30 09:26:51 GMT 2006

Luke,

 > Another thing is that at some point the data store layer needs to have some
 > degree of schema-awareness

it does, though indirectly. 

 > For example, a common myth is that DNs are case-insensitive;

yes, we handle this fine. DN comparison, indexing, etc. all take this
into account already.

 > in fact, it depends on the matching rules associated with the
 > naming attributes in each RDN. So to accurately compare a DN you
 > need schema-awareness. Also, whilst in LDAP everything is an (octet)
 > string, it doesn't always make sense to use strings as the
 > underlying representation, although the desire to keep ldb
 > databases hand-editable might influence this.
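
To make the quoted point about matching rules concrete, here is a
minimal sketch of a DN comparison that picks a normalisation per
naming attribute. It is not how ldb does it. The rule table is
hypothetical, `exactName` is an invented attribute used only to show a
case-sensitive rule, and the naive split on ',' ignores escaped commas
and multi-valued RDNs.

```python
# Hypothetical matching-rule table: attribute name -> value normaliser.
# A real schema would supply these; the entries here are assumptions.
MATCHING_RULES = {
    "cn": str.casefold,          # assumed case-insensitive
    "dc": str.casefold,          # assumed case-insensitive
    "exactname": lambda v: v,    # invented attribute, case-sensitive
}


def normalise_rdn(rdn):
    """Return (attribute, normalised value) for a single 'attr=value' RDN."""
    attr, _, value = rdn.partition("=")
    attr = attr.strip().lower()  # attribute names themselves ignore case
    rule = MATCHING_RULES.get(attr, lambda v: v)  # unknown: exact match
    return attr, rule(value.strip())


def dn_equal(a, b):
    """Compare two DNs RDN by RDN, using each attribute's own rule."""
    # Naive split: does not handle escaped commas or multi-valued RDNs.
    return ([normalise_rdn(r) for r in a.split(",")]
            == [normalise_rdn(r) for r in b.split(",")])
```

With this table, "CN=Foo,DC=Example" and "cn=foo,dc=example" compare
equal, while two DNs that differ only in the case of an `exactName`
value do not.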

the format on disk was never hand-editable. The 'ldbedit' tool has
nothing to do with the disk format. In fact, ldbedit works fine with
any LDAP server or a local openldap backend :-)

 > Also, how do you deal with attributes that contain references? The classic
 > case is an attribute with syntax distinguishedName (like a group member),
 > but AD also has several syntaxes that can contain embedded DNs, such as
 > dnWithString, dnWithOctetString, and orName. You need to ensure that when
 > an entry is renamed that the references are updated.

yes, we can handle this type of thing already. We don't handle them
all yet, but there is nothing in the structure or format that makes it
hard.
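
As a rough illustration of the rename case, the sketch below keeps
entries in a plain dict and rewrites DN-valued attributes when an
entry is renamed. This is an assumption-laden toy, not ldb code: the
`DN_ATTRS` set and the `rename` helper are hypothetical, values are
compared as exact strings rather than with schema-aware matching, and
syntaxes with embedded DNs (dnWithString and friends) are not handled.

```python
# Hypothetical set of attributes whose values are plain DNs.
DN_ATTRS = {"member"}


def rename(db, old_dn, new_dn):
    """Rename an entry and update plain-DN references that point at it.

    `db` maps DN -> {attribute: [values]}. Exact string comparison is
    used here; a real store would compare DNs via the schema.
    """
    db[new_dn] = db.pop(old_dn)
    for entry in db.values():
        for attr in DN_ATTRS:
            if attr in entry:
                entry[attr] = [new_dn if v == old_dn else v
                               for v in entry[attr]]
```

A full scan like this is obviously the slow way; it only shows what
has to stay consistent after the rename.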

My comments in this thread are all about optimising the common paths,
while retaining speed. In this instance I'm particularly interested in
the case where ldb would replace tdb, not where ldb would replace
ldap. We already have ldb much faster than openldap (at least last
time I tested) for the cases we care about, but for ldb to replace tdb
for our local meta data stores we have to make it much faster again. A
search on a tdb database is an _extremely_ efficient operation (a hash
plus a couple of memory dereferences in mmap'd memory), and I want to
retain that efficiency in the common paths in ldb. We've lost some of
that efficiency with recent efforts to gain more ldap compliance, and
I think we can regain it while retaining the ldap semantics.
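
For anyone unfamiliar with why a keyed fetch is so cheap, here is a
toy sketch of the "hash plus a couple of dereferences" shape of
lookup. It is only an in-memory illustration under my own
assumptions: the class name, bucket count and chaining scheme are
invented, and it does not reflect tdb's actual on-disk layout or API.

```python
class TinyHashStore:
    """Toy chained hash table: one hash, one bucket, a short scan."""

    def __init__(self, nbuckets=131):
        # Bucket count is arbitrary; each bucket is a list of (key, value).
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        # One hash computation selects the bucket directly.
        return self.buckets[hash(key) % len(self.buckets)]

    def store(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite the existing record
                return
        bucket.append((key, value))

    def fetch(self, key):
        # Walk the (normally very short) chain for an exact key match.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None
```

The point is that a fetch by exact key never parses a filter or
consults an index; anything ldb adds on the common path is overhead
relative to this.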

Nobody would ever seriously consider a 'real' ldap database for the
sort of test case I am looking at here. It would cost a lot more than
50 usecs just to send the request over a pipe (assuming it was on
loopback or ldapi), and even more over gigabit. Actually answering the
query in a traditional ldap database would be horribly slow. That is
the sort of case I want to make work in ldb in around 10 usec or so.

I just did a quick test against openldap+bdb over ldapi for this sort
of query, and it took about 500 usec per query. So we are already
about 10x better than that, and I'm aiming for another factor of 5 to
bring us within a reasonable distance of tdb.
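
Spelling out the arithmetic behind those figures (the numbers are the
rough ones quoted in this message, not new measurements, and the
variable names are just for illustration):

```python
# Approximate per-query cost measured for openldap+bdb over ldapi.
openldap_usec = 500

# "about 10x better than that" gives the current ldb cost.
ldb_now_usec = openldap_usec / 10

# "another factor of 5" gives the target.
ldb_target_usec = ldb_now_usec / 5
```

That works out to roughly 50 usec per query now and a target of about
10 usec, which matches the figure mentioned earlier.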

 > Hope this helps. I'm no database guru by any means but I have seen a lot
 > of directory servers over the years :-)

Your comments are most welcome! In this particular case we are not
trying to optimise the same sort of thing as a traditional database,
but your hints and contributions are much appreciated anyway.

Cheers, Tridge