[s4] LDB TDB indexes

Tue Nov 17 15:16:19 MST 2009

Hi tridge,

small answers on your questions (will try to discuss this a bit more 
tomorrow on IRC with you).

tridge at samba.org wrote:
> Hi Matthias,
>
>   >  Obviously this doesn't work with my patch. For sure we'll have to be
>   >  careful how to handle existing indexes and maybe convert them. Since
>   >  this problem is important I would appreciate to hear tridge's comment on
>   >  my thoughts.
>
> Do you have an example of a search that gets the wrong result with the
> current index DNs?
>    
Yeah, of course - otherwise I wouldn't claim that we have some sort of 
problem: ldap.py line 1093 (ldb.rename("<SID=" + 
ldb.schema_format_value("objectSID", res_user[0]["objectSID"][0]) + ">" 
, "cn=ldaptestUSER3,cn=users," + self.base_dn)). On the first turn it 
passes - afterwards it doesn't anymore.
If you open sam.ldb with "ldbedit" and try to look for the index record 
for "ldaptestuser3" you find there two instances with each two lines 
("ldaptestuser3" and "ldaptestUSER3"). I think this shouldn't happen. An 
index-entry relation should be unique.
>   >  - DN comparison: The function doesn't seem that efficient. I "upgraded" it a bit
>   >   to be more powerful (added a second length check and do both before the string
>   >   comparison)
>
> I think this change is OK, but do you have any reason to think it
> really is faster? (ie. does it change the overall speed of something
> like a provision?). Remember that strncmp() is a function that returns
> as soon as it finds one character different.
>    
To be honest I had the following problem with the previous version: Does 
"strncmp" work if string 2 is shorter than the third parameter (length 
of string 1)? Does it never segfault (also on different platforms)? With 
my patch the lengths are checked before.
>> - The outside API contains "DN" string arguments: Bad. Since in this way we
>>   fully rely on the outside calls regarding the right DN format. Solution: Use
>>   always a "struct ldb_dn" entry. Since this one is interchangeable and we can
>>   handle it in our preferred way.
>>      
> have you tested what the performance impact of this change is? It
> might be OK, but I'd want some assurance that you had done some real
> performance testing.
>    
Not done but I think the design is clearer and less problematic (wrong 
parameter).
>> - DN normalisation: I think the actual call "ldb_dn_get_linearized"
>>    isn't the right one. Since we e.g. have the DNs "DC=A,DC=org" and
>>    "DC=a,DC=org" (linearised form). As we see they don't seem to be
>>    the same but in fact they should be (DNs are case insensitive when
>>    matching). Therefore I propose to use "ldb_dn_get_casefold" which
>>    normalises them in the right way (upcases them).  And the the two
>>    example DNs become the same as they should.
>>      
> If this is a real bug then please give an example, and we should also
> add a test case to our testsuite that demonstrates it.
>
> I can't actually see how the existing code could produce an incorrect
> answer, as the ltdb_dn_list_remove_duplicates() along with the post
> filtering will ensure we don't end up with duplicates, and the use of
> the correct case in the index keys means we should always find all the
> records. I could be wrong though, so please provide an example.
>    
Well, discussed above.
> I'm wary of using ldb_dn_get_casefold() as it is sometimes much slower
> than ldb_dn_get_linearized(). If it is really needed then we'll find a
> way of making it fast, but first you need to convince me its needed.
>    
Yeah, this is correct. Also abartlet and simo told me this (casefold is 
slow).

Matthias