[LDB] Store index DNs as canonical case

Mon Aug 31 16:37:03 MDT 2009

On Tue, 2009-09-01 at 08:22 +1000, Andrew Bartlett wrote:
> On Mon, 2009-08-31 at 13:10 -0400, simo wrote:
> > On Mon, 2009-08-31 at 23:27 +1000, Andrew Bartlett wrote:
> > > The attached patch reworks our index code to always store the canonical
> > > casefolded form of the DN in an index.  It does not work yet, and still
> > > needs an 'index version' added to the ldb to trigger a reindex.  The
> > > casefolded index entries should be backward compatible, because the
> > > previous code accepted any case variation, so we are simply being more
> > > strict in what we now write.
> > > 
> > > This was inspired by a bug where we would not delete index entries
> > > because the DN was not in a canonical form, and the existing
> > > strcasecmp() didn't match.
> > > 
> > > (strcasecmp isn't the right option any more anyway)
> > > 
> > > This stems from the fact that LDB DNs were just case-insensitive strings
> > > originally, but have become far more complex since then. 
> > > 
> > > Any comments would be most welcome while I chase down the remaining
> > > issues. 
> > 
> > Comment:
> > This means that the index string format depends on the case sensitivity
> > of an attribute.  This is a change in behavior, although I see you
> > recognize the need to re-index the db on upgrade.
> 
> Given that the on-disk TDB_KEY DN=<casefold_dn> already varies like
> this, we simply get closer to what I think we should have done in the
> first place (stored the TDB key in the index)!

Yes, that is true and I concur.
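
To illustrate the point agreed on above, here is a minimal Python sketch
(not ldb code) of an index that stores only a canonical form of each DN,
so that deletion becomes an exact string match rather than a
case-insensitive comparison. The names `canonical_dn` and `Index` are
hypothetical. The upper-casing and the naive comma split are assumptions
standing in for ldb's real casefolding, which depends on the syntax of
each attribute; escaped commas are not handled.

```python
def canonical_dn(dn: str) -> str:
    """Hypothetical stand-in for ldb's DN casefolding.

    Assumption: every attribute is case-insensitive, so names and values
    are simply upper-cased and surrounding whitespace is stripped.
    """
    parts = []
    for rdn in dn.split(","):  # naive: ignores escaped commas
        name, _, value = rdn.strip().partition("=")
        parts.append(f"{name.strip().upper()}={value.strip().upper()}")
    return ",".join(parts)


class Index:
    """Maps an index key to a list of canonical DN strings."""

    def __init__(self):
        self.entries = {}

    def add(self, key: str, dn: str) -> None:
        dns = self.entries.setdefault(key, [])
        canon = canonical_dn(dn)
        if canon not in dns:
            dns.append(canon)

    def delete(self, key: str, dn: str) -> bool:
        """Remove dn from the list for key; True if it was present."""
        canon = canonical_dn(dn)
        dns = self.entries.get(key, [])
        if canon not in dns:
            return False
        dns.remove(canon)  # exact match on the canonical form
        if not dns:
            del self.entries[key]
        return True


idx = Index()
idx.add("objectClass=user", "cn=Alice, dc=Example,dc=COM")
# A differently-cased spelling of the same DN still finds the entry.
removed = idx.delete("objectClass=user", "CN=alice,DC=example,DC=com")
```

Because both the write path and the delete path go through the same
canonicalisation, the stored string and the lookup string cannot differ
in case or spacing.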

> > Question:
> > Have you done any test performance-wise ?
> 
> Not yet.  While I hope to improve performance, this is actually
> initiated from thoughts of correctness (on delete, the old code did a
> strcasecmp() - masked behind ldb_attr_cmp() - on the DN string, and
> could therefore possibly remove the wrong DN). 

Yes, correctness is paramount, yet it would be nice to also assess the
performance impact to see if we need more work on the general system to
keep ldb fast going forward.

> > Aside: I have seen some odd behavior with indexes.  I think we need to
> > be a bit smarter with some search filters and reorder internal searches
> > so that we parse the indexes with the smallest number of entries first,
> > esp. when you have an 'and' expression of the form (&(foo=x)(bar=y)).
> > Have you looked into any of this by chance?
> 
> No.  To do that we would have to start storing the number of values in
> the index somehow.  

The size of the TDB_DATA would probably be enough for a rough estimate.
It would be nice to retrieve both indexes but parse them shortest
first, so that if the short one has no matches we can immediately
discard both without having to process the big one.
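
The ordering idea above can be sketched in Python as follows. This is
an illustration only, not ldb code: `intersect_smallest_first` is a
hypothetical name, each index is modelled as a plain list of DN
strings, and `len()` stands in for the TDB_DATA size used as the rough
estimate.

```python
def intersect_smallest_first(index_lists):
    """Evaluate an AND of indexed terms, smallest index list first.

    index_lists: one list of DN strings per (attr=value) term.
    Returns the DNs present in every list, sorted for stable output.
    """
    # Assumption: list length approximates the on-disk record size.
    ordered = sorted(index_lists, key=len)
    if not ordered:
        return []

    result = set(ordered[0])
    for dns in ordered[1:]:
        if not result:
            # Nothing can match any more, so the larger lists are
            # never walked.
            break
        result = {dn for dn in dns if dn in result}
    return sorted(result)


# (&(foo=x)(bar=y)) where foo=x is a large index and bar=y a small one.
foo_x = ["CN=A,DC=EXAMPLE", "CN=B,DC=EXAMPLE", "CN=C,DC=EXAMPLE"]
bar_y = ["CN=C,DC=EXAMPLE"]
matches = intersect_smallest_first([foo_x, bar_y])
```

If the smallest list is empty, the loop exits before touching any of
the larger lists, which is the early discard described above.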

Simo.

-- 
Simo Sorce
Samba Team GPL Compliance Officer <simo at samba.org>
Principal Software Engineer at Red Hat, Inc. <simo at redhat.com>