[SCM] Samba Shared Repository - branch master updated
Andrew Bartlett
abartlet at samba.org
Thu May 23 00:20:01 UTC 2024
The branch, master has been updated
via d6581d213d5 ldb: move struct ldb_debug_ops to ldb_private.h
via 6dd68d89786 ldb: move struct ldb_utf8_fns to ldb_private.h
via a00c0ebd090 s4:dsdb:strcasecmp_with_ldb_val() avoids overflow
via b6974030e6a lib/fuzzing: add fuzz_strncasecmp_ldb
via b22e1d3207d ldb: don't cast to unsigned for ldb_ascii_toupper()
via e33a0dd70f0 ldb: ldb_set_utf8_functions follows README.Coding
via 4a6a1d1f0af ldb: deprecate ldb_set_utf8_fns
via 42ae85d70af ldb: remove old ldb_comparison_fold_utf8_broken()
via 960724a06e4 ldb: ldb_comparison_fold always uses the casecmp function
via edabb9f4cb9 ldb-samba: use ldb_comparison_fold_utf8()
via 0becc8a90cb ldb-samba: add ldb_comparison_fold_utf8, wrapping strncasecmp_ldb
via f9797950fd6 util:charset: strncasecmp_ldb avoids iconv for ASCII
via 55397514db5 util:charset: strncasecmp_ldb degrades to ASCII strncasecmp
via eb91e3437b4 util:charset: add strncasecmp_ldb()
via 7cc3c56293d ldb: ldb_set_utf8_default() sets comparison function
via 6c27284f7e9 ldb: ldb_comparison_fold_ascii sorts unsigned
via 92275e27947 ldb: add ldb_comparison_fold_ascii() for default comparisons
via 947f977acb7 ldb: ldb_comparison_fold uses the utf-8 casecmp function
via ae7ca36830b ldb: add ldb_set_utf8_functions() for setting casefold functions
via 1624ac7a987 ldb: move ldb_comparison_fold guts into a separate function
via 278a3c7f7c6 ldb: add a utf-8 comparison fold callback
via f9fbc7a5067 lib/util/charset: be explicit about INVALID_CODEPOINT value
via 023a7ce7d5a ldb: add test_ldb_comparison_fold
from 589a9ea6767 s4:kdc: Add comment about possible interaction between the krbtgt account and Group Managed Service Accounts
https://git.samba.org/?p=samba.git;a=shortlog;h=master
- Log -----------------------------------------------------------------
commit d6581d213d5f625da493f14620e1a12e79a8e195
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Thu May 23 09:40:00 2024 +1200
ldb: move struct ldb_debug_ops to ldb_private.h
Only accessed through struct ldb_context -> debug_ops, which is already private.
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
Autobuild-User(master): Andrew Bartlett <abartlet at samba.org>
Autobuild-Date(master): Thu May 23 00:19:30 UTC 2024 on atb-devel-224
commit 6dd68d897865bd2518a6a71753ca0bc76d51b37e
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Thu May 23 09:36:57 2024 +1200
ldb: move struct ldb_utf8_fns to ldb_private.h
It is only accessed via ldb functions that find it on the already-private
struct ldb_context.
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit a00c0ebd090f69f94ce6ba7774a9fc126d7de504
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Mon May 13 11:08:35 2024 +1200
s4:dsdb:strcasecmp_with_ldb_val() avoids overflow
In the unlikely event that strlen(str) > INT_MAX, the result could
have overflowed.
This is not a sort transitivity issue, as this is not a symmetric sort
comparison, but it would affect binary search reliability.
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
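For illustration only, the kind of overflow described in the commit above
can be avoided by never squeezing a size_t difference into an int. The
sketch below is not the code from strcasecmp_with_ldb_val(); the helper name
sketch_cmp_size() is hypothetical and only shows the general three-way
comparison idiom.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of Samba): three-way comparison of two
 * lengths. Returning (int)(a - b) instead could wrap or change sign
 * when the difference does not fit in an int; this form always yields
 * exactly -1, 0 or 1.
 */
int sketch_cmp_size(size_t a, size_t b)
{
	return (a > b) - (a < b);
}
```

A comparison that always returns a value with the right sign is what a
binary search needs in order to pick the correct half on every step.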
commit b6974030e6a7ddb330894f46631c8da4359b2d18
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Mon May 13 10:39:44 2024 +1200
lib/fuzzing: add fuzz_strncasecmp_ldb
As well as checking for the usual overflows, this asserts that
strncasecmp_ldb is always transitive, by splitting the input into 3
pieces and comparing all pairs.
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit b22e1d3207d90f102247d690bfe31db55d7b681e
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Fri May 17 11:38:10 2024 +1200
ldb: don't cast to unsigned for ldb_ascii_toupper()
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit e33a0dd70f00481d1c3d9e2fdd227e26431402ef
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Tue May 21 10:55:53 2024 +1200
ldb: ldb_set_utf8_functions follows README.Coding
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit 4a6a1d1f0afa830a679781a522d724bd861a3601
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Fri May 17 11:35:01 2024 +1200
ldb: deprecate ldb_set_utf8_fns
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit 42ae85d70af8da1aecbf45f5fb6e7d7ee1c379fb
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Fri May 10 15:43:36 2024 +1200
ldb: remove old ldb_comparison_fold_utf8_broken()
There are no callers.
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit 960724a06e4dcb793d606c71d6e79387761b3d42
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Thu May 16 17:01:10 2024 +1200
ldb: ldb_comparison_fold always uses the casecmp function
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit edabb9f4cb9460f382a621a1f494cfdac615232a
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Thu May 16 14:09:46 2024 +1200
ldb-samba: use ldb_comparison_fold_utf8()
This means ldb-samba/dsdb comparisons will be case-insensitive for
non-ASCII UTF-8 characters (within the bounds of the 16-bit casefold
table). And they will remain transitive.
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit 0becc8a90cbeac7022a72061debe2edc5b67680a
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Fri May 10 15:42:46 2024 +1200
ldb-samba: add ldb_comparison_fold_utf8, wrapping strncasecmp_ldb
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit f9797950fd69c16dfab39804dc53172977a345ee
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Tue May 14 21:33:16 2024 +1200
util:charset: strncasecmp_ldb avoids iconv for ASCII
This is a common case, and we can save a bit of work.
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit 55397514db568ca7b75acf139afd527ece137bc1
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Mon May 13 11:32:26 2024 +1200
util:charset: strncasecmp_ldb degrades to ASCII strncasecmp
If strncasecmp_ldb() encounters invalid utf-8 bytes, it compares those
as greater than any valid bytes (that is, it sorts them to the end of
the list).
If an invalid sequence is encountered in both strings at once, the
rest of the strings are now compared using the default ldb_comparison_fold
rules, as implemented in ldb_comparison_fold_ascii(). That is, each
byte is compared individually, [a-z] are translated to [A-Z], and runs of
spaces are collapsed into single spaces.
There is no perfect answer in this case, but this solution is stable,
fine-grained, and probably close to what is expected. This
byte-by-byte comparison is equivalent to a utf-8 comparison without
case-folding of multibyte codes.
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
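The fallback rules described in the commit above (byte-by-byte comparison,
[a-z] mapped to [A-Z], runs of spaces collapsed) can be sketched roughly as
follows. This is a simplified illustration, not the actual
ldb_comparison_fold_ascii(); the names fold_cursor, fold_next() and
sketch_fold_ascii_cmp() are hypothetical, and leading or trailing spaces get
no special treatment here.

```c
#include <stddef.h>

/* Hypothetical cursor over a length-delimited byte string. */
struct fold_cursor {
	const unsigned char *p;
	size_t remaining;
};

/*
 * Return the next folded byte as 0..255, or -1 at the end of input.
 * A run of spaces yields a single space; [a-z] yields [A-Z]; all
 * other bytes pass through unchanged.
 */
static int fold_next(struct fold_cursor *c)
{
	unsigned char b;

	if (c->remaining == 0) {
		return -1;
	}
	b = *c->p;
	c->p++;
	c->remaining--;
	if (b == ' ') {
		while (c->remaining > 0 && *c->p == ' ') {
			c->p++;
			c->remaining--;
		}
		return ' ';
	}
	if (b >= 'a' && b <= 'z') {
		b -= 'a' - 'A';
	}
	return b;
}

/* Three-way comparison of the folded byte streams: -1, 0 or 1. */
int sketch_fold_ascii_cmp(const char *s1, size_t n1,
			  const char *s2, size_t n2)
{
	struct fold_cursor a = { (const unsigned char *)s1, n1 };
	struct fold_cursor b = { (const unsigned char *)s2, n2 };

	for (;;) {
		int x = fold_next(&a);
		int y = fold_next(&b);
		if (x != y) {
			/* end-of-input (-1) sorts before any byte */
			return (x > y) - (x < y);
		}
		if (x == -1) {
			return 0;
		}
	}
}
```

Because both strings are reduced to the same canonical byte stream before
comparing, a comparison built this way is a total order on those streams
and so stays transitive.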
commit eb91e3437b44c7ad653aac86d481ceaaddb06b01
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Tue Apr 30 12:41:43 2024 +1200
util:charset: add strncasecmp_ldb()
This is a function for comparing strings in a way that suits
case-insensitive syntaxes in LDB.
We have it here, rather than in LDB itself, because it needs the
upcase table. By default LDB uses ASCII-only comparisons. SSSD and
OpenChange use it in that configuration, but Samba replaces the
comparison and casefold functions with Unicode aware versions.
Until now Samba has done that in a bad way; this will allow it to do
better.
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit 7cc3c56293d9c93d9c88fba8df0e998db3f7eaf7
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Fri May 17 11:37:18 2024 +1200
ldb: ldb_set_utf8_default() sets comparison function
The default is ASCII only, which is used by SSSD and OpenChange.
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit 6c27284f7e9feae7e37072449e0c752034f6b672
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Thu May 9 17:21:29 2024 +1200
ldb: ldb_comparison_fold_ascii sorts unsigned
Typically in 8-bit character sets, those with the 0x80 bit set are
seen as 128-255, not negative numbers. This will sort them after 'Z',
not before 'A'.
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
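A small sketch of the difference the commit above describes, assuming 8-bit
bytes. The helper names are hypothetical and this is not the ldb code; it
only shows how the same byte sorts on either side of the ASCII letters
depending on signedness.

```c
/* Interpret a byte the way a signed 8-bit char would: 0x80..0xFF -> -128..-1 */
static int sketch_as_signed(unsigned char b)
{
	return b >= 128 ? (int)b - 256 : (int)b;
}

/* Signed view: bytes with the 0x80 bit set sort before 'A'. */
int sketch_cmp_bytes_signed(unsigned char a, unsigned char b)
{
	int x = sketch_as_signed(a);
	int y = sketch_as_signed(b);
	return (x > y) - (x < y);
}

/* Unsigned view: bytes with the 0x80 bit set sort after 'Z'. */
int sketch_cmp_bytes_unsigned(unsigned char a, unsigned char b)
{
	return (a > b) - (a < b);
}
```

With the byte 0xC3 (the lead byte of many two-byte UTF-8 sequences), the
signed view places it before 'A' and the unsigned view places it after 'Z'.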
commit 92275e27947989706561292f47789a8d715a11d1
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Wed May 15 20:51:08 2024 +1200
ldb: add ldb_comparison_fold_ascii() for default comparisons
This function is made from the ASCII-only bits of the old
ldb_comparison_fold() -- that is, what you get if you never follow a
`goto utf8str` jump. It compares the bytes, but collapses spaces and
maps [a-z] to [A-Z].
This does exactly what ldb_comparison_fold_utf8_broken() would do in
situations where ldb_casefold() calls ldb_casefold_default(). That
means SSSD.
The comparison is probably using signed char, so high bytes are
actually low bytes.
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit 947f977acb7946a4521cc8be2e7c0a61bd0e3f1e
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Sun May 19 15:09:26 2024 +1200
ldb: ldb_comparison_fold uses the utf-8 casecmp function
But only if it is set, which it never is (so far).
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit ae7ca36830be7823dde17bcaeae74b5f46b1aa3d
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Fri May 17 11:34:35 2024 +1200
ldb: add ldb_set_utf8_functions() for setting casefold functions
This replaces ldb_set_utf8_fns(), which will be deprecated really soon.
The reason for this, as shown in surrounding commits, is that without
an explicit case-insensitive comparison we need to rely on the casefold,
and if the casefold can fail (because, e.g. bad utf-8) the comparison
ends up being a bit chaotic. The strings being compared are generally
user controlled, and a malicious user might find ways of hiding values
or perhaps fooling a binary search.
A case-insensitive comparison that works gradually through the string
without an all-at-once casefold is better placed to deal with problems
where they happen, and we are able to separately specialise for the
ASCII case (used by SSSD) and the UTF-8 case (Samba).
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit 1624ac7a9876b4b8779364542747f66f5832a709
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Thu May 16 14:10:06 2024 +1200
ldb: move ldb_comparison_fold guts into a separate function
We're going to make this use a configurable pointer.
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit 278a3c7f7c6506134e0e1d15126f55b444f37fbc
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Thu May 9 16:52:53 2024 +1200
ldb: add a utf-8 comparison fold callback
This isn't used yet, but it will allow library users to select a
case-insensitive comparison function that matches their chosen casefold.
This will allow the comparisons to be consistent when the strings are bad,
whereas currently we kind of guess.
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit f9fbc7a5067b78b9fe03e3bcde5e46f82a5704ba
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Wed May 1 15:32:03 2024 +1200
lib/util/charset: be explicit about INVALID_CODEPOINT value
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
commit 023a7ce7d5ae50ff4f0563c68cb84f9f4ad235f2
Author: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Date: Mon May 20 11:15:47 2024 +1200
ldb: add test_ldb_comparison_fold
Currently this fails like this:
test_ldb_comparison_fold_default_common: 118 errors out of 256
test_ldb_comparison_fold_default_ascii: 32 errors out of 100
test_ldb_comparison_fold_utf8_common: 40 errors out of 256
test_ldb_comparison_fold_utf8: 28 errors out of 100
Signed-off-by: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet at samba.org>
-----------------------------------------------------------------------
Summary of changes:
lib/fuzzing/fuzz_strncasecmp_ldb.c | 161 +++++++++++++++++++++++
lib/fuzzing/wscript_build | 5 +
lib/ldb-samba/ldb_wrap.c | 10 +-
lib/ldb-samba/ldb_wrap.h | 5 +
lib/ldb-samba/pyldb.c | 2 +-
lib/ldb-samba/samba_extensions.c | 2 +-
lib/ldb/ABI/ldb-2.10.0.sigs | 2 +
lib/ldb/common/attrib_handlers.c | 148 ++-------------------
lib/ldb/common/ldb_utf8.c | 91 ++++++++++++-
lib/ldb/include/ldb.h | 54 ++++----
lib/ldb/include/ldb_private.h | 24 ++++
lib/ldb/tests/test_ldb_comparison_fold.c | 213 +++++++++++++++++++++++++++++++
lib/ldb/wscript | 5 +
lib/util/charset/charset.h | 7 +-
lib/util/charset/util_unistr.c | 199 +++++++++++++++++++++++++++++
selftest/tests.py | 1 +
source4/dsdb/common/tests/dsdb_dn.c | 6 +-
source4/dsdb/schema/schema_query.c | 4 +-
source4/torture/ldb/ldb.c | 10 +-
19 files changed, 766 insertions(+), 183 deletions(-)
create mode 100644 lib/fuzzing/fuzz_strncasecmp_ldb.c
create mode 100644 lib/ldb/tests/test_ldb_comparison_fold.c
Changeset truncated at 500 lines:
diff --git a/lib/fuzzing/fuzz_strncasecmp_ldb.c b/lib/fuzzing/fuzz_strncasecmp_ldb.c
new file mode 100644
index 00000000000..0f785b5bee7
--- /dev/null
+++ b/lib/fuzzing/fuzz_strncasecmp_ldb.c
@@ -0,0 +1,161 @@
+/*
+ Fuzzing ldb_comparison_fold()
+ Copyright (C) Catalyst IT 2020
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+#include "includes.h"
+#include "fuzzing/fuzzing.h"
+#include "charset.h"
+
+
+int LLVMFuzzerInitialize(int *argc, char ***argv)
+{
+ return 0;
+}
+
+
+int LLVMFuzzerTestOneInput(const uint8_t *input, size_t len)
+{
+ struct ldb_val v[3] = {{},{},{}};
+ size_t i, j, k;
+ int results[9], ab, ac, bc;
+
+ if (len < 3) {
+ return 0;
+ }
+
+ j = 0;
+ k = 0;
+ v[j].data = discard_const(input);
+
+ /*
+ * We split the input into 3 ldb_vals, on the byte '*' (42), chosen
+ * because it is *not* special with regard to termination, utf-8, or
+ * casefolding.
+ *
+ * if there are not 2 '*' bytes, the last value[s] will be empty, with
+ * a NULL pointer and zero length.
+ */
+
+ for (i = 0; i < len; i++) {
+ if (input[i] != '*') {
+ continue;
+ }
+ v[j].length = i - k;
+ i++;
+ j++;
+ if (j > 2 || i == len) {
+ break;
+ }
+ k = i;
+ v[j].data = discard_const(input + k);
+ }
+
+ for (i = 0; i < 3; i++) {
+ char *s1 = (char*)v[i].data;
+ size_t len1 = v[i].length;
+ for (j = 0; j < 3; j++) {
+ char *s2 = (char*)v[j].data;
+ size_t len2 = v[j].length;
+ int r = strncasecmp_ldb(s1, len1, s2, len2);
+ if (abs(r) > 1) {
+ abort();
+ }
+ results[i * 3 + j] = r;
+ }
+ }
+
+ /*
+ * There are nine comparisons we make.
+ *
+ * A B C
+ * A = x x
+ * B - = x
+ * C - - =
+ *
+ * The diagonal should be all zeros (A == A, etc)
+ * The upper and lower triangles should complement each other
+ * (A > B implies B < A; A == B implies B == A).
+ *
+ * So we check for those identities first.
+ */
+
+ if ((results[0] != 0) ||
+ (results[4] != 0) ||
+ (results[8] != 0)) {
+ abort();
+ }
+
+ ab = results[3];
+ ac = results[6];
+ bc = results[7];
+
+ if (ab != -results[1] ||
+ ac != -results[2] ||
+ bc != -results[5]) {
+ abort();
+ }
+
+ /*
+ * Then there are 27 states within the three comparisons of one
+ * triangle, because each of AB, AC, and BC can be in 3 states.
+ *
+ * 0 (A < B) (A < C) (B < C) A < B < C
+ * 1 (A < B) (A < C) (B = C) A < (B|C)
+ * 2 (A < B) (A < C) (B > C) A < C < B
+ * 3 (A < B) (A = C) (B < C) invalid
+ * 4 (A < B) (A = C) (B = C) invalid
+ * 5 (A < B) (A = C) (B > C) (A|C) < B
+ * 6 (A < B) (A > C) (B < C) invalid
+ * 7 (A < B) (A > C) (B = C) invalid
+ * 8 (A < B) (A > C) (B > C) C < A < B
+ * 9 (A = B) (A < C) (B < C) (A|B) < C
+ * 10 (A = B) (A < C) (B = C) invalid
+ * 11 (A = B) (A < C) (B > C) invalid
+ * 12 (A = B) (A = C) (B < C) invalid
+ * 13 (A = B) (A = C) (B = C) A = B = C
+ * 14 (A = B) (A = C) (B > C) invalid
+ * 15 (A = B) (A > C) (B < C) invalid
+ * 16 (A = B) (A > C) (B = C) invalid
+ * 17 (A = B) (A > C) (B > C) C < (A|B)
+ * 18 (A > B) (A < C) (B < C) B < A < C
+ * 19 (A > B) (A < C) (B = C) invalid
+ * 20 (A > B) (A < C) (B > C) invalid
+ * 21 (A > B) (A = C) (B < C) B < (A|C)
+ * 22 (A > B) (A = C) (B = C) invalid
+ * 23 (A > B) (A = C) (B > C) invalid
+ * 24 (A > B) (A > C) (B < C) B < C < A
+ * 25 (A > B) (A > C) (B = C) (B|C) < A
+ * 26 (A > B) (A > C) (B > C) C < B < A
+ *
+ * It actually turns out to be quite simple:
+ */
+
+ if (ab == 0) {
+ if (ac != bc) {
+ abort();
+ }
+ } else if (ab < 0) {
+ if (ac >= 0 && bc <= 0) {
+ abort();
+ }
+ } else {
+ if (ac <= 0 && bc >= 0) {
+ abort();
+ }
+ }
+
+ return 0;
+}
diff --git a/lib/fuzzing/wscript_build b/lib/fuzzing/wscript_build
index 897a114ca7e..ce2684580ce 100644
--- a/lib/fuzzing/wscript_build
+++ b/lib/fuzzing/wscript_build
@@ -169,6 +169,11 @@ bld.SAMBA_BINARY('fuzz_security_token_vs_descriptor_ds',
deps='fuzzing samba-security afl-fuzz-main',
fuzzer=True)
+bld.SAMBA_BINARY('fuzz_strncasecmp_ldb',
+ source='fuzz_strncasecmp_ldb.c',
+ deps='fuzzing samba-util afl-fuzz-main',
+ fuzzer=True)
+
# The fuzz_type and fuzz_function parameters make the built
# fuzzer take the same input as ndrdump and so the same that
diff --git a/lib/ldb-samba/ldb_wrap.c b/lib/ldb-samba/ldb_wrap.c
index 437aaee101a..e5876c80a9c 100644
--- a/lib/ldb-samba/ldb_wrap.c
+++ b/lib/ldb-samba/ldb_wrap.c
@@ -125,6 +125,14 @@ char *wrap_casefold(void *context, void *mem_ctx, const char *s, size_t n)
return strupper_talloc_n(mem_ctx, s, n);
}
+int ldb_comparison_fold_utf8(void *ignored,
+ const struct ldb_val *v1,
+ const struct ldb_val *v2)
+{
+ return strncasecmp_ldb((const char *)v1->data, v1->length,
+ (const char *)v2->data, v2->length);
+}
+
struct ldb_context *samba_ldb_init(TALLOC_CTX *mem_ctx,
struct tevent_context *ev,
@@ -144,7 +152,7 @@ char *wrap_casefold(void *context, void *mem_ctx, const char *s, size_t n)
ldb_set_debug(ldb, ldb_wrap_debug, NULL);
- ldb_set_utf8_fns(ldb, NULL, wrap_casefold);
+ ldb_set_utf8_functions(ldb, NULL, wrap_casefold, ldb_comparison_fold_utf8);
if (session_info) {
if (ldb_set_opaque(ldb, DSDB_SESSION_INFO, session_info)) {
diff --git a/lib/ldb-samba/ldb_wrap.h b/lib/ldb-samba/ldb_wrap.h
index aa7ccb3a234..274d1e6fddf 100644
--- a/lib/ldb-samba/ldb_wrap.h
+++ b/lib/ldb-samba/ldb_wrap.h
@@ -30,9 +30,14 @@ struct ldb_dn;
struct cli_credentials;
struct loadparm_context;
struct tevent_context;
+struct ldb_val;
char *wrap_casefold(void *context, void *mem_ctx, const char *s, size_t n);
+int ldb_comparison_fold_utf8(void *ignored,
+ const struct ldb_val *v1,
+ const struct ldb_val *v2);
+
struct ldb_context *ldb_wrap_connect(TALLOC_CTX *mem_ctx,
struct tevent_context *ev,
struct loadparm_context *lp_ctx,
diff --git a/lib/ldb-samba/pyldb.c b/lib/ldb-samba/pyldb.c
index 958b3ad4b16..b2a485aaefa 100644
--- a/lib/ldb-samba/pyldb.c
+++ b/lib/ldb-samba/pyldb.c
@@ -88,7 +88,7 @@ static PyObject *py_ldb_set_utf8_casefold(PyObject *self,
ldb = pyldb_Ldb_AS_LDBCONTEXT(self);
- ldb_set_utf8_fns(ldb, NULL, wrap_casefold);
+ ldb_set_utf8_functions(ldb, NULL, wrap_casefold, ldb_comparison_fold_utf8);
Py_RETURN_NONE;
}
diff --git a/lib/ldb-samba/samba_extensions.c b/lib/ldb-samba/samba_extensions.c
index be92d982dde..aecc2d70dea 100644
--- a/lib/ldb-samba/samba_extensions.c
+++ b/lib/ldb-samba/samba_extensions.c
@@ -144,7 +144,7 @@ static int extensions_hook(struct ldb_context *ldb, enum ldb_module_hook_type t)
return ldb_operr(ldb);
}
- ldb_set_utf8_fns(ldb, NULL, wrap_casefold);
+ ldb_set_utf8_functions(ldb, NULL, wrap_casefold, ldb_comparison_fold_utf8);
break;
}
diff --git a/lib/ldb/ABI/ldb-2.10.0.sigs b/lib/ldb/ABI/ldb-2.10.0.sigs
index 2266387cd60..f23014ffaaa 100644
--- a/lib/ldb/ABI/ldb-2.10.0.sigs
+++ b/lib/ldb/ABI/ldb-2.10.0.sigs
@@ -23,6 +23,7 @@ ldb_casefold_default: char *(void *, TALLOC_CTX *, const char *, size_t)
ldb_check_critical_controls: int (struct ldb_control **)
ldb_comparison_binary: int (struct ldb_context *, void *, const struct ldb_val *, const struct ldb_val *)
ldb_comparison_fold: int (struct ldb_context *, void *, const struct ldb_val *, const struct ldb_val *)
+ldb_comparison_fold_ascii: int (void *, const struct ldb_val *, const struct ldb_val *)
ldb_connect: int (struct ldb_context *, const char *, unsigned int, const char **)
ldb_control_to_string: char *(TALLOC_CTX *, const struct ldb_control *)
ldb_controls_except_specified: struct ldb_control **(struct ldb_control **, TALLOC_CTX *, struct ldb_control *)
@@ -275,6 +276,7 @@ ldb_set_timeout: int (struct ldb_context *, struct ldb_request *, int)
ldb_set_timeout_from_prev_req: int (struct ldb_context *, struct ldb_request *, struct ldb_request *)
ldb_set_utf8_default: void (struct ldb_context *)
ldb_set_utf8_fns: void (struct ldb_context *, void *, char *(*)(void *, void *, const char *, size_t))
+ldb_set_utf8_functions: void (struct ldb_context *, void *, char *(*)(void *, void *, const char *, size_t), int (*)(void *, const struct ldb_val *, const struct ldb_val *))
ldb_setup_wellknown_attributes: int (struct ldb_context *)
ldb_should_b64_encode: int (struct ldb_context *, const struct ldb_val *)
ldb_standard_syntax_by_name: const struct ldb_schema_syntax *(struct ldb_context *, const char *)
diff --git a/lib/ldb/common/attrib_handlers.c b/lib/ldb/common/attrib_handlers.c
index e6d412bd3cf..145ff487310 100644
--- a/lib/ldb/common/attrib_handlers.c
+++ b/lib/ldb/common/attrib_handlers.c
@@ -327,146 +327,18 @@ int ldb_comparison_binary(struct ldb_context *ldb, void *mem_ctx,
}
/*
- compare two case insensitive strings, ignoring multiple whitespaces
- and leading and trailing whitespaces
- see rfc2252 section 8.1
-
- try to optimize for the ascii case,
- but if we find out an utf8 codepoint revert to slower but correct function
-*/
+ * ldb_comparison_fold is a schema syntax comparison_fn for utf-8 strings that
+ * collapse multiple spaces into one (e.g. "Directory String" syntax).
+ *
+ * The default comparison function only performs ASCII case-folding, and only
+ * collapses multiple spaces, not tabs and other whitespace (contrary to
+ * RFC4518). To change the comparison function (as Samba does), use
+ * ldb_set_utf8_functions().
+ */
int ldb_comparison_fold(struct ldb_context *ldb, void *mem_ctx,
- const struct ldb_val *v1, const struct ldb_val *v2)
+ const struct ldb_val *v1, const struct ldb_val *v2)
{
- const char *s1=(const char *)v1->data, *s2=(const char *)v2->data;
- size_t n1 = v1->length, n2 = v2->length;
- char *b1, *b2;
- const char *u1, *u2;
- int ret;
-
- while (n1 && *s1 == ' ') { s1++; n1--; };
- while (n2 && *s2 == ' ') { s2++; n2--; };
-
- while (n1 && n2 && *s1 && *s2) {
- /* the first 127 (0x7F) chars are ascii and utf8 guarantees they
- * never appear in multibyte sequences */
- if (((unsigned char)s1[0]) & 0x80) goto utf8str;
- if (((unsigned char)s2[0]) & 0x80) goto utf8str;
- if (ldb_ascii_toupper(*s1) != ldb_ascii_toupper(*s2)) {
- break;
- }
- if (*s1 == ' ') {
- while (n1 > 1 && s1[0] == s1[1]) { s1++; n1--; }
- while (n2 > 1 && s2[0] == s2[1]) { s2++; n2--; }
- }
- s1++; s2++;
- n1--; n2--;
- }
-
- /* check for trailing spaces only if the other pointers has
- * reached the end of the strings otherwise we can
- * mistakenly match. ex. "domain users" <->
- * "domainUpdates"
- */
- if (n1 && *s1 == ' ' && (!n2 || !*s2)) {
- while (n1 && *s1 == ' ') { s1++; n1--; }
- }
- if (n2 && *s2 == ' ' && (!n1 || !*s1)) {
- while (n2 && *s2 == ' ') { s2++; n2--; }
- }
- if (n1 == 0 && n2 != 0) {
- return -(int)ldb_ascii_toupper(*s2);
- }
- if (n2 == 0 && n1 != 0) {
- return (int)ldb_ascii_toupper(*s1);
- }
- if (n1 == 0 && n2 == 0) {
- return 0;
- }
- return (int)ldb_ascii_toupper(*s1) - (int)ldb_ascii_toupper(*s2);
-
-utf8str:
- /*
- * No need to recheck from the start, just from the first utf8 charu
- * found. Note that the callback of ldb_casefold() needs to be ascii
- * compatible.
- *
- * Probably ldb_casefold() is wrap_casefold() which wraps
- * strupper_talloc_n().
- */
- b1 = ldb_casefold(ldb, mem_ctx, s1, n1);
- b2 = ldb_casefold(ldb, mem_ctx, s2, n2);
-
- if (!b1 || !b2) {
- /*
- * One of the strings was not UTF8, so we have no
- * options but to do a binary compare.
- *
- * FIXME: this can be non-transitive.
- *
- * consider {
- * CA 8A "ʊ"
- * C6 B1 "Ʊ"
- * C8 FE invalid utf-8
- * }
- *
- * The byte "0xfe" is always invalid in utf-8, so the
- * comparisons against that string end up coming this way,
- * while the "Ʊ" vs "ʊ" comparison goes via the ldb_casefold
- * branch. Then:
- *
- * "ʊ" == "Ʊ" by casefold.
- * "ʊ" > {c8 fe} by byte comparison.
- * "Ʊ" < {c8 fe} by byte comparison.
- *
- * In many cases there are no invalid encodings between the
- * upper and lower case letters, but the string as a whole
- * might also compare differently due to the space-eating in
- * the other branch.
- */
- talloc_free(b1);
- talloc_free(b2);
- ret = memcmp(s1, s2, MIN(n1, n2));
- if (ret == 0) {
- if (n1 == n2) {
- return 0;
- }
- if (n1 > n2) {
- if (s1[n2] == '\0') {
- return 0;
- }
- return 1;
- } else {
- if (s2[n1] == '\0') {
- return 0;
- }
- return -1;
- }
- }
- return ret;
- }
-
- u1 = b1;
- u2 = b2;
-
- while (*u1 & *u2) {
- if (*u1 != *u2)
- break;
- if (*u1 == ' ') {
- while (u1[0] == u1[1]) u1++;
- while (u2[0] == u2[1]) u2++;
- }
- u1++; u2++;
- }
- if (! (*u1 && *u2)) {
- while (*u1 == ' ') u1++;
- while (*u2 == ' ') u2++;
- }
- ret = NUMERIC_CMP(*u1, *u2);
-
- talloc_free(b1);
- talloc_free(b2);
-
- return ret;
+ return ldb->utf8_fns.casecmp(ldb->utf8_fns.context, v1, v2);
}
diff --git a/lib/ldb/common/ldb_utf8.c b/lib/ldb/common/ldb_utf8.c
index 178bdd86de1..6891de84101 100644
--- a/lib/ldb/common/ldb_utf8.c
+++ b/lib/ldb/common/ldb_utf8.c
@@ -34,6 +34,27 @@
#include "ldb_private.h"
#include "system/locale.h"
+/*
+ * Set functions for comparing and case-folding case-insensitive ldb val
+ * strings.
+ */
+void ldb_set_utf8_functions(struct ldb_context *ldb,
+ void *context,
+ char *(*casefold)(void *, void *, const char *, size_t),
+ int (*casecmp)(void *ctx,
+ const struct ldb_val *v1,
+ const struct ldb_val *v2))
+{
+ if (context) {
+ ldb->utf8_fns.context = context;
+ }
+ if (casefold) {
+ ldb->utf8_fns.casefold = casefold;
+ }
+ if (casecmp) {
+ ldb->utf8_fns.casecmp = casecmp;
+ }
+}
/*
this allow the user to pass in a caseless comparison
@@ -43,12 +64,10 @@ void ldb_set_utf8_fns(struct ldb_context *ldb,
void *context,
char *(*casefold)(void *, void *, const char *, size_t))
{
- if (context)
- ldb->utf8_fns.context = context;
- if (casefold)
- ldb->utf8_fns.casefold = casefold;
+ ldb_set_utf8_functions(ldb, context, casefold, NULL);
}
+
/*
a simple case folding function
NOTE: does not handle UTF8
@@ -62,14 +81,72 @@ char *ldb_casefold_default(void *context, TALLOC_CTX *mem_ctx, const char *s, si
return NULL;
}
for (i=0;ret[i];i++) {
- ret[i] = ldb_ascii_toupper((unsigned char)ret[i]);
+ ret[i] = ldb_ascii_toupper(ret[i]);
}
return ret;
}
+
+/*
+ * The default comparison fold function only knows ASCII. Multiple
+ * spaces (0x20) are collapsed into one, and [a-z] map to [A-Z]. All
+ * other bytes are compared without casefolding.
+ *
--
Samba Shared Repository