ldb speed

Mon Oct 30 04:59:42 GMT 2006

On Mon, 2006-10-30 at 14:46 +1100, Luke Howard wrote:
> A couple of other things you could do (if you don't already):
> 
> (a) just store the RDN (relative distinguished name). This makes searches
> potentially more expensive but makes subtree renames very cheap. As Ed
> Reed once said (probably not in these words), who would use a filesystem
> in which the complete path was stored with every file?
> 
> (b) if you are going to store the DN, store a normalized DN (for matching)
> as well as the preserved DN (for reading).
> 
> In OpenLDAP, back-hdb does (a), back-bdb (b). (Actually, back-hdb would do
> (b) for the RDN too I expect.)

Honestly, to make things easy we should store by GUID, because otherwise
handling a rename needs a global lock and a traversal of every object to
make sure no references to the renamed object are left behind.
At the moment we do not check anything, but sooner or later we will need to.

At the same time storing by DN is a big performance advantage. So in the
end I think that being able to have 2 keys for the same record could be
the best way (GUID and DN) so that references will be stored by GUID
(and searches for the GUID will be equally fast) but all other
operations will use the normalized DN as key.
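
To make the two-key idea concrete, here is a rough sketch in Python (not
ldb's C, and not how ldb works today). Every name in it is made up for
illustration, and the normalization is a naive case fold. Records live
under their GUID, and a second index maps the normalized DN to that GUID.

```python
import uuid


class TwoKeyStore:
    """Toy in-memory store: one record per GUID, plus a DN -> GUID index."""

    def __init__(self):
        self.by_guid = {}   # GUID -> record (the only copy of the data)
        self.dn_index = {}  # normalized DN -> GUID

    @staticmethod
    def normalize(dn):
        # Assumption: trimming and case-folding each component is enough.
        return ",".join(part.strip().lower() for part in dn.split(","))

    def add(self, dn, attrs):
        key = self.normalize(dn)
        if key in self.dn_index:
            raise KeyError("entry already exists: " + dn)
        guid = str(uuid.uuid4())
        # Keep the DN as given for reading, the normalized one for matching.
        self.by_guid[guid] = {"dn": dn, "attrs": dict(attrs)}
        self.dn_index[key] = guid
        return guid

    def get_by_dn(self, dn):
        return self.by_guid[self.dn_index[self.normalize(dn)]]

    def get_by_guid(self, guid):
        return self.by_guid[guid]

    def rename(self, old_dn, new_dn):
        old_key = self.normalize(old_dn)
        new_key = self.normalize(new_dn)
        if new_key != old_key and new_key in self.dn_index:
            raise KeyError("entry already exists: " + new_dn)
        # Only the index entry and the stored DN change; the GUID does not,
        # so references held by other records stay valid untouched.
        guid = self.dn_index.pop(old_key)
        self.by_guid[guid]["dn"] = new_dn
        self.dn_index[new_key] = guid
        return guid
```

A group that stores its members as GUIDs needs no update when a member is
renamed. Note that this toy only renames a single entry; a subtree rename
would still have to re-index every child DN.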

> Another thing is that at some point the data store layer needs to have some
> degree of schema-awareness. I'm not sure how much you have today (please
> tell me to shut up if I should just go and look at the code!).

Shut-up and look at the code! :-)

> For example, a common myth is that DNs are case-insensitive; in fact, it
> depends on the matching rules associated with the naming attributes in each
> RDN. So to accurately compare a DN you need schema-awareness. Also, whilst
> in LDAP everything is a (octet) string, it doesn't always make sense to use
> strings as the underlying representation, although the desire to keep ldb
> databases hand-editable might influence this.

Yes, we have some degree of schema awareness already, and that's exactly
why our ldb_dn parsing also takes it into account (case sensitivity, for
example). The plan is to make it possible to plug in the schema module so
that these functions can use it instead of just our basic internal one
(which I'd like to remove in the end).
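
As an illustration of why DN comparison needs schema input, here is a small
Python sketch. It is not the ldb_dn code: the function names are invented,
the set of case-sensitive attributes stands in for whatever a schema module
would supply, and the parsing ignores escaped commas and multi-valued RDNs.

```python
def normalize_dn(dn, case_sensitive_attrs=frozenset()):
    """Build a matching key for a DN.

    Attribute names always fold case. Values fold case unless the
    attribute is listed in case_sensitive_attrs (the stand-in for schema).
    """
    parts = []
    # Assumption: no escaped commas and no multi-valued RDNs in the input.
    for rdn in dn.split(","):
        attr, _, value = rdn.partition("=")
        attr = attr.strip().lower()
        value = value.strip()
        if attr not in case_sensitive_attrs:
            value = value.lower()
        parts.append(attr + "=" + value)
    return ",".join(parts)


def dn_equal(a, b, case_sensitive_attrs=frozenset()):
    """Compare two DNs using the same matching key for both."""
    return (normalize_dn(a, case_sensitive_attrs)
            == normalize_dn(b, case_sensitive_attrs))
```

With no schema information, "UID=Foo,DC=Example" and "uid=foo,dc=example"
compare equal; if the schema says uid is case-sensitive (a hypothetical
choice here), they do not.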

> Also, how do you deal with attributes that contain references? The classic
> case is an attribute with syntax distinguishedName (like a group member),
> but AD also has several syntaxes that can contain embedded DNs, such as
> dnWithString, dnWithOctetString, and orName. You need to ensure that when
> an entry is renamed that the references are updated.

Yes, this is one of the problems I have been considering lately. The
problem is that to address it properly we may need to change the wire
format, and I am not sure that's desirable.

>  A common approach is
> to store the reference as an entry ID or GUID (this buys you referential
> integrity at a small performance cost, but is easily optimized). But then
> in a distributed system it gets more complex, because you can't generally
> afford a network query at dereference time, nor would you want that sort
> of code inside the database layer. This is one of the more interesting
> problems in building a distributed directory.

Yes, cross-site referencing is indeed interesting, but storing both the
DN and the GUID solves most of the problems imo.
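
Roughly what I mean by storing both, as a Python sketch with made-up names
(nothing like this exists in ldb as written): the reference carries the GUID
plus the last known DN, so a reader never needs a network query. If the
target is local the current DN is looked up by GUID, otherwise the stored
DN is returned as-is.

```python
from typing import NamedTuple


class Ref(NamedTuple):
    guid: str  # stable identity of the target
    dn: str    # DN of the target when the reference was last written


def display_dn(ref, local_dn_by_guid):
    """Return the best DN for ref using only the local database."""
    current = local_dn_by_guid.get(ref.guid)
    # Local target: the GUID lookup survives renames.
    # Remote target: fall back to the stored DN, which may be stale.
    return current if current is not None else ref.dn
```

The stored DN can go stale for remote objects, so something outside the
database layer would still have to refresh it.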

> Hope this helps. I'm no database guru by any means but I have seen a lot
> of directory servers over the years :-)

Your input is always very welcome, and I am happy you confirm all my
feelings as well :-)

Simo.

-- 
Simo Sorce
Samba Team GPL Compliance Officer
email: idra at samba.org
http://samba.org