[PATCH] Improvements to reindex and join performance.

Gary Lockyer gary at catalyst.net.nz
Wed Apr 3 02:56:31 UTC 2019


The ldb key value layer maintains an in memory tdb to cache indexing
operations.  The size used for this (1000) is insufficient for large
databases and performance is affected.

These patches modify the key value back end stores to return estimates
of the number of records in the data base, and use this value to size
the cache accordingly.

For other large batch operations an optional parameter
"transaction_index_cache_size" has been added to ldb.  Setting this when
opening an ldb connection, should improve performance of large transactions.


Merge request: https://gitlab.com/samba-team/samba/merge_requests/356

CI: https://gitlab.com/samba-team/devel/samba/pipelines/54903144

Review and comments appreciated.

Ngā mihi

Gary




-------------- next part --------------
From 894e5d75f0859b676530297702af732e04b01528 Mon Sep 17 00:00:00 2001
From: Gary Lockyer <gary at catalyst.net.nz>
Date: Mon, 1 Apr 2019 13:12:20 +1300
Subject: [PATCH 1/6] lib ldb key_value: Remove index cache lazy initialisation

Remove the lazy initialisation of the index cache. This make setting
the size of the cache for re-indexing easier, which will be done in
later commits.

Performance testing shows that the removal of lazy initialisation makes
no appreciable difference to performance.

Signed-off-by: Gary Lockyer <gary at catalyst.net.nz>
---
 lib/ldb/ldb_key_value/ldb_kv_index.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/lib/ldb/ldb_key_value/ldb_kv_index.c b/lib/ldb/ldb_key_value/ldb_kv_index.c
index 4dfcc2cb0ba..e8b2c824b6c 100644
--- a/lib/ldb/ldb_key_value/ldb_kv_index.c
+++ b/lib/ldb/ldb_key_value/ldb_kv_index.c
@@ -208,6 +208,11 @@ int ldb_kv_index_transaction_start(struct ldb_module *module)
 		return ldb_oom(ldb_module_get_ctx(module));
 	}
 
+	ldb_kv->idxptr->itdb = tdb_open(NULL, 1000, TDB_INTERNAL, O_RDWR, 0);
+	if (ldb_kv->idxptr->itdb == NULL) {
+		return LDB_ERR_OPERATIONS_ERROR;
+	}
+
 	return LDB_SUCCESS;
 }
 
@@ -717,14 +722,6 @@ static int ldb_kv_dn_list_store(struct ldb_module *module,
 		return ldb_kv_dn_list_store_full(module, ldb_kv, dn, list);
 	}
 
-	if (ldb_kv->idxptr->itdb == NULL) {
-		ldb_kv->idxptr->itdb =
-		    tdb_open(NULL, 1000, TDB_INTERNAL, O_RDWR, 0);
-		if (ldb_kv->idxptr->itdb == NULL) {
-			return LDB_ERR_OPERATIONS_ERROR;
-		}
-	}
-
 	key.dptr = discard_const_p(unsigned char, ldb_dn_get_linearized(dn));
 	if (key.dptr == NULL) {
 		return LDB_ERR_OPERATIONS_ERROR;
-- 
2.18.1


From 44c22ca01ef1ba84eb4e6a2f0379db9e1dbb5df5 Mon Sep 17 00:00:00 2001
From: Gary Lockyer <gary at catalyst.net.nz>
Date: Mon, 1 Apr 2019 14:10:10 +1300
Subject: [PATCH 2/6] lib ldb key_value: Pass index cache size

Pass the index cache size to ldb_kv_index_transaction_start.  This will
allow it to be set for reindex and join operations, where the current
defaults result in a significant performance penalty on large databases.

Signed-off-by: Gary Lockyer <gary at catalyst.net.nz>
---
 lib/ldb/ldb_key_value/ldb_kv.c       |   2 +-
 lib/ldb/ldb_key_value/ldb_kv.h       |  14 ++-
 lib/ldb/ldb_key_value/ldb_kv_index.c |  13 ++-
 lib/ldb/tests/ldb_key_value_test.c   | 162 +++++++++++++++++++++++++++
 lib/ldb/wscript                      |   8 +-
 5 files changed, 193 insertions(+), 6 deletions(-)
 create mode 100644 lib/ldb/tests/ldb_key_value_test.c

diff --git a/lib/ldb/ldb_key_value/ldb_kv.c b/lib/ldb/ldb_key_value/ldb_kv.c
index d4f896736a2..8247ada4256 100644
--- a/lib/ldb/ldb_key_value/ldb_kv.c
+++ b/lib/ldb/ldb_key_value/ldb_kv.c
@@ -1382,7 +1382,7 @@ static int ldb_kv_start_trans(struct ldb_module *module)
 		return ldb_kv->kv_ops->error(ldb_kv);
 	}
 
-	ldb_kv_index_transaction_start(module);
+	ldb_kv_index_transaction_start(module, DEFAULT_INDEX_CACHE_SIZE);
 
 	ldb_kv->reindex_failed = false;
 
diff --git a/lib/ldb/ldb_key_value/ldb_kv.h b/lib/ldb/ldb_key_value/ldb_kv.h
index 5070a588c00..2c08b1db480 100644
--- a/lib/ldb/ldb_key_value/ldb_kv.h
+++ b/lib/ldb/ldb_key_value/ldb_kv.h
@@ -4,6 +4,8 @@
 #include "tdb.h"
 #include "ldb_module.h"
 
+#ifndef __LDB_KV_H__
+#define __LDB_KV_H__
 struct ldb_kv_private;
 typedef int (*ldb_kv_traverse_fn)(struct ldb_kv_private *ldb_kv,
 				  struct ldb_val key,
@@ -175,6 +177,13 @@ int ldb_kv_check_at_attributes_values(const struct ldb_val *value);
  * The following definitions come from lib/ldb/ldb_key_value/ldb_kv_index.c
  */
 
+/*
+ * The default size of the in memory TDB used to cache index records
+ * The value chosen gives a prime modulo for the hash table and keeps the
+ * tdb memory overhead under 4 kB
+ */
+#define DEFAULT_INDEX_CACHE_SIZE 491
+
 struct ldb_parse_tree;
 
 int ldb_kv_search_indexed(struct ldb_kv_context *ctx, uint32_t *);
@@ -197,7 +206,9 @@ int ldb_kv_index_del_value(struct ldb_module *module,
 			   struct ldb_message_element *el,
 			   unsigned int v_idx);
 int ldb_kv_reindex(struct ldb_module *module);
-int ldb_kv_index_transaction_start(struct ldb_module *module);
+int ldb_kv_index_transaction_start(
+	struct ldb_module *module,
+	size_t cache_size);
 int ldb_kv_index_transaction_commit(struct ldb_module *module);
 int ldb_kv_index_transaction_cancel(struct ldb_module *module);
 int ldb_kv_key_dn_from_idx(struct ldb_module *module,
@@ -263,3 +274,4 @@ int ldb_kv_init_store(struct ldb_kv_private *ldb_kv,
 		      struct ldb_context *ldb,
 		      const char *options[],
 		      struct ldb_module **_module);
+#endif /* __LDB_KV_H__ */
diff --git a/lib/ldb/ldb_key_value/ldb_kv_index.c b/lib/ldb/ldb_key_value/ldb_kv_index.c
index e8b2c824b6c..003ed714df9 100644
--- a/lib/ldb/ldb_key_value/ldb_kv_index.c
+++ b/lib/ldb/ldb_key_value/ldb_kv_index.c
@@ -199,7 +199,9 @@ static unsigned ldb_kv_max_key_length(struct ldb_kv_private *ldb_kv)
 }
 
 /* enable the idxptr mode when transactions start */
-int ldb_kv_index_transaction_start(struct ldb_module *module)
+int ldb_kv_index_transaction_start(
+	struct ldb_module *module,
+	size_t cache_size)
 {
 	struct ldb_kv_private *ldb_kv = talloc_get_type(
 	    ldb_module_get_private(module), struct ldb_kv_private);
@@ -208,7 +210,12 @@ int ldb_kv_index_transaction_start(struct ldb_module *module)
 		return ldb_oom(ldb_module_get_ctx(module));
 	}
 
-	ldb_kv->idxptr->itdb = tdb_open(NULL, 1000, TDB_INTERNAL, O_RDWR, 0);
+	ldb_kv->idxptr->itdb = tdb_open(
+		NULL,
+		cache_size,
+		TDB_INTERNAL,
+		O_RDWR,
+		0);
 	if (ldb_kv->idxptr->itdb == NULL) {
 		return LDB_ERR_OPERATIONS_ERROR;
 	}
@@ -3105,7 +3112,7 @@ int ldb_kv_reindex(struct ldb_module *module)
 	 */
 	ldb_kv_index_transaction_cancel(module);
 
-	ret = ldb_kv_index_transaction_start(module);
+	ret = ldb_kv_index_transaction_start(module, DEFAULT_INDEX_CACHE_SIZE);
 	if (ret != LDB_SUCCESS) {
 		return ret;
 	}
diff --git a/lib/ldb/tests/ldb_key_value_test.c b/lib/ldb/tests/ldb_key_value_test.c
new file mode 100644
index 00000000000..ba632961768
--- /dev/null
+++ b/lib/ldb/tests/ldb_key_value_test.c
@@ -0,0 +1,162 @@
+/*
+ * Tests exercising the ldb key value operations.
+ *
+ *  Copyright (C) Andrew Bartlett <abartlet at samba.org> 2019
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/*
+ * from cmocka.c:
+ * These headers or their equivalents should be included prior to
+ * including
+ * this header file.
+ *
+ * #include <stdarg.h>
+ * #include <stddef.h>
+ * #include <setjmp.h>
+ *
+ * This allows test applications to use custom definitions of C standard
+ * library functions and types.
+ *
+ */
+
+/*
+ *
+ * Tests for the ldb key value layer
+ */
+#include <stdarg.h>
+#include <stddef.h>
+#include <setjmp.h>
+#include <cmocka.h>
+
+#include <errno.h>
+#include <unistd.h>
+#include <talloc.h>
+#include <tevent.h>
+#include <string.h>
+#include <ctype.h>
+
+#include <sys/wait.h>
+
+#include "ldb_key_value/ldb_kv.c"
+#include "ldb_key_value/ldb_kv_cache.c"
+#include "ldb_key_value/ldb_kv_index.c"
+#include "ldb_key_value/ldb_kv_search.c"
+
+#define DEFAULT_BE  "tdb"
+
+#ifndef TEST_BE
+#define TEST_BE DEFAULT_BE
+#endif /* TEST_BE */
+
+#define NUM_RECS 1024
+
+
+struct test_ctx {
+	const char *dbfile;
+	const char *lockfile;   /* lockfile is separate */
+	const char *dbpath;
+};
+
+static void unlink_old_db(struct test_ctx *test_ctx)
+{
+	int ret;
+
+	errno = 0;
+	ret = unlink(test_ctx->lockfile);
+	if (ret == -1 && errno != ENOENT) {
+		fail();
+	}
+
+	errno = 0;
+	ret = unlink(test_ctx->dbfile);
+	if (ret == -1 && errno != ENOENT) {
+		fail();
+	}
+}
+
+static int setup(void **state)
+{
+	struct test_ctx *test_ctx;
+
+	test_ctx = talloc_zero(NULL, struct test_ctx);
+	assert_non_null(test_ctx);
+
+	test_ctx->dbfile = talloc_strdup(test_ctx, "kvopstest.ldb");
+	assert_non_null(test_ctx->dbfile);
+
+	test_ctx->lockfile = talloc_asprintf(test_ctx, "%s-lock",
+					     test_ctx->dbfile);
+	assert_non_null(test_ctx->lockfile);
+
+	test_ctx->dbpath = talloc_asprintf(test_ctx,
+			TEST_BE"://%s", test_ctx->dbfile);
+	assert_non_null(test_ctx->dbpath);
+
+	unlink_old_db(test_ctx);
+	*state = test_ctx;
+	return 0;
+}
+
+static int teardown(void **state)
+{
+	struct test_ctx *test_ctx = talloc_get_type_abort(*state,
+							  struct test_ctx);
+
+	unlink_old_db(test_ctx);
+	talloc_free(test_ctx);
+	return 0;
+}
+
+/*
+ * Test that the index cache is opened by ldb_kv_index_transaction_start
+ * and correctly initialised.
+ */
+static void test_index_cache_init(void **state)
+{
+	struct test_ctx *test_ctx = talloc_get_type_abort(
+		*state,
+		struct test_ctx);
+	struct ldb_module *module = NULL;
+	struct ldb_kv_private *ldb_kv = NULL;
+	int ret = LDB_SUCCESS;
+
+	module = talloc_zero(test_ctx, struct ldb_module);
+	ldb_kv = talloc_zero(test_ctx, struct ldb_kv_private);
+	ldb_module_set_private(module, ldb_kv);
+
+	ret = ldb_kv_index_transaction_start(module, 191);
+	assert_int_equal(LDB_SUCCESS, ret);
+
+	assert_non_null(ldb_kv->idxptr);
+	assert_non_null(ldb_kv->idxptr->itdb);
+	assert_int_equal(191, tdb_hash_size(ldb_kv->idxptr->itdb));
+
+	TALLOC_FREE(ldb_kv);
+	TALLOC_FREE(module);
+}
+
+int main(int argc, const char **argv)
+{
+	const struct CMUnitTest tests[] = {
+		cmocka_unit_test_setup_teardown(
+			test_index_cache_init,
+			setup,
+			teardown),
+	};
+
+	return cmocka_run_group_tests(tests, NULL, NULL);
+}
diff --git a/lib/ldb/wscript b/lib/ldb/wscript
index 7891693a0fd..0fdd888f533 100644
--- a/lib/ldb/wscript
+++ b/lib/ldb/wscript
@@ -501,6 +501,11 @@ def build(bld):
                          deps='cmocka ldb',
                          install=False)
 
+        bld.SAMBA_BINARY('ldb_key_value_test',
+                         source='tests/ldb_key_value_test.c',
+                         deps='cmocka ldb ldb_tdb_err_map',
+                         install=False)
+
         if bld.CONFIG_SET('HAVE_LMDB'):
             bld.SAMBA_BINARY('ldb_mdb_mod_op_test',
                              source='tests/ldb_mod_op_test.c',
@@ -568,7 +573,8 @@ def test(ctx):
                  'ldb_msg_test',
                  'ldb_tdb_kv_ops_test',
                  'ldb_tdb_test',
-                 'ldb_match_test']
+                 'ldb_match_test',
+                 'ldb_key_value_test']
 
     if env.HAVE_LMDB:
         test_exes += ['ldb_mdb_mod_op_test',
-- 
2.18.1


From 6842ca7cbabd8353cd7e2fca314562e93f62cff9 Mon Sep 17 00:00:00 2001
From: Gary Lockyer <gary at catalyst.net.nz>
Date: Mon, 1 Apr 2019 15:27:32 +1300
Subject: [PATCH 3/6] lib ldb key_value: Add get_size method

Add the get_size method to the ldb_key_value layer, this will allow the
reindexing code to get an estimate of the number of records in the
database.

The lmdb backend returns an accurate count of the number of records in
the database withe the mdb_env_stat call.

The tdb backend does not provide a low cost method to determine the
number of records on the database.  It does provide a tdb_summary call
however this this walks the entire database.

So for tdb we use the map size divided by 500, this over estimates the counts
for small domains, but the extra memory allocated for the cache should
not be significant.

Signed-off-by: Gary Lockyer <gary at catalyst.net.nz>
---
 lib/ldb/ldb_key_value/ldb_kv.h  |  1 +
 lib/ldb/ldb_mdb/ldb_mdb.c       | 20 +++++++++
 lib/ldb/ldb_tdb/ldb_tdb.c       | 18 +++++++++
 lib/ldb/tests/ldb_kv_ops_test.c | 72 +++++++++++++++++++++++++++++++++
 lib/ldb/wscript                 |  2 +-
 5 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/lib/ldb/ldb_key_value/ldb_kv.h b/lib/ldb/ldb_key_value/ldb_kv.h
index 2c08b1db480..2eb6fdfed8b 100644
--- a/lib/ldb/ldb_key_value/ldb_kv.h
+++ b/lib/ldb/ldb_key_value/ldb_kv.h
@@ -43,6 +43,7 @@ struct kv_db_ops {
 	const char *(*name)(struct ldb_kv_private *ldb_kv);
 	bool (*has_changed)(struct ldb_kv_private *ldb_kv);
 	bool (*transaction_active)(struct ldb_kv_private *ldb_kv);
+	size_t (*get_size)(struct ldb_kv_private *ldb_kv);
 };
 
 /* this private structure is used by the key value backends in the
diff --git a/lib/ldb/ldb_mdb/ldb_mdb.c b/lib/ldb/ldb_mdb/ldb_mdb.c
index 646a67c554c..133155062c5 100644
--- a/lib/ldb/ldb_mdb/ldb_mdb.c
+++ b/lib/ldb/ldb_mdb/ldb_mdb.c
@@ -576,6 +576,25 @@ static bool lmdb_changed(struct ldb_kv_private *ldb_kv)
 	return true;
 }
 
+/*
+ * Get the number of records in the database.
+ *
+ * The mdb_env_stat call returns an accurate count, so we return the actual
+ * number of records in the database rather than an estimate.
+ */
+static size_t lmdb_get_size(struct ldb_kv_private *ldb_kv) {
+
+	struct MDB_stat stats = {0};
+	struct lmdb_private *lmdb = ldb_kv->lmdb_private;
+
+	if (mdb_env_stat(lmdb->env, &stats) != 0) {
+		return 0;
+	}
+	return stats.ms_entries;
+}
+
+
+
 static struct kv_db_ops lmdb_key_value_ops = {
 	.store              = lmdb_store,
 	.delete             = lmdb_delete,
@@ -593,6 +612,7 @@ static struct kv_db_ops lmdb_key_value_ops = {
 	.name               = lmdb_name,
 	.has_changed        = lmdb_changed,
 	.transaction_active = lmdb_transaction_active,
+	.get_size           = lmdb_get_size,
 };
 
 static const char *lmdb_get_path(const char *url)
diff --git a/lib/ldb/ldb_tdb/ldb_tdb.c b/lib/ldb/ldb_tdb/ldb_tdb.c
index 812ddd3e389..9f16040659b 100644
--- a/lib/ldb/ldb_tdb/ldb_tdb.c
+++ b/lib/ldb/ldb_tdb/ldb_tdb.c
@@ -400,6 +400,23 @@ static bool ltdb_transaction_active(struct ldb_kv_private *ldb_kv)
 	return tdb_transaction_active(ldb_kv->tdb);
 }
 
+/*
+ * Get an estimate of the number of records in a tdb database.
+ *
+ * This implementation will overestimate the number of records in a sparsely
+ * populated database. The size estimate is only used for allocating
+ * an in memory tdb to cache index records during a reindex, overestimating
+ * the contents is acceptable, and preferable to underestimating
+ */
+#define RECORD_SIZE 500
+static size_t ltdb_get_size(struct ldb_kv_private *ldb_kv)
+{
+	size_t map_size = tdb_map_size(ldb_kv->tdb);
+	size_t size = map_size / RECORD_SIZE;
+
+	return size;
+}
+
 static const struct kv_db_ops key_value_ops = {
     .store = ltdb_store,
     .delete = ltdb_delete,
@@ -417,6 +434,7 @@ static const struct kv_db_ops key_value_ops = {
     .name = ltdb_name,
     .has_changed = ltdb_changed,
     .transaction_active = ltdb_transaction_active,
+    .get_size = ltdb_get_size,
 };
 
 /*
diff --git a/lib/ldb/tests/ldb_kv_ops_test.c b/lib/ldb/tests/ldb_kv_ops_test.c
index d6a4dc058e5..9fa78f9ae5a 100644
--- a/lib/ldb/tests/ldb_kv_ops_test.c
+++ b/lib/ldb/tests/ldb_kv_ops_test.c
@@ -43,6 +43,9 @@
  * - supports iteration over all records in the database
  * - supports the update_in_iterate operation allowing entries to be
  *   re-keyed.
+ * - has a get_size implementation that returns an estimate of the number of
+ *   records in the database.  Note that this can be an estimate rather than
+ *   an accurate size.
  */
 #include <stdarg.h>
 #include <stddef.h>
@@ -1523,6 +1526,71 @@ static void test_delete_transaction_isolation(void **state)
 }
 
 
+/*
+ * Test that get_size returns a sensible estimate of the number of records
+ * in the database.
+ */
+static void test_get_size(void **state)
+{
+	int ret;
+	struct test_ctx *test_ctx = talloc_get_type_abort(*state,
+							  struct test_ctx);
+	struct ldb_kv_private *ldb_kv = get_ldb_kv(test_ctx->ldb);
+	uint8_t key_val[] = "TheKey";
+	struct ldb_val key = {
+		.data   = key_val,
+		.length = sizeof(key_val)
+	};
+
+	uint8_t value[] = "The record contents";
+	struct ldb_val data = {
+		.data    = value,
+		.length = sizeof(value)
+	};
+	size_t size = 0;
+
+	int flags = 0;
+	TALLOC_CTX *tmp_ctx;
+
+	tmp_ctx = talloc_new(test_ctx);
+	assert_non_null(tmp_ctx);
+
+	size = ldb_kv->kv_ops->get_size(ldb_kv);
+#if defined(TEST_LMDB)
+	assert_int_equal(2, size);
+#else
+	/*
+	 * The tdb implementation of get_size over estimates for sparse files
+	 * which is perfectly acceptable for it's intended use.
+	 */
+	assert_true( size > 2500);
+#endif
+
+	/*
+	 * Begin a transaction
+	 */
+	ret = ldb_kv->kv_ops->begin_write(ldb_kv);
+	assert_int_equal(ret, 0);
+
+	/*
+	 * Write the record
+	 */
+	ret = ldb_kv->kv_ops->store(ldb_kv, key, data, flags);
+	assert_int_equal(ret, 0);
+
+	/*
+	 * Commit the transaction
+	 */
+	ret = ldb_kv->kv_ops->finish_write(ldb_kv);
+	assert_int_equal(ret, 0);
+
+	size = ldb_kv->kv_ops->get_size(ldb_kv);
+#ifdef TEST_LMDB
+	assert_int_equal(3, size);
+#endif
+	talloc_free(tmp_ctx);
+}
+
 int main(int argc, const char **argv)
 {
 	const struct CMUnitTest tests[] = {
@@ -1570,6 +1638,10 @@ int main(int argc, const char **argv)
 			test_delete_transaction_isolation,
 			setup,
 			teardown),
+		cmocka_unit_test_setup_teardown(
+			test_get_size,
+			setup,
+			teardown),
 	};
 
 	return cmocka_run_group_tests(tests, NULL, NULL);
diff --git a/lib/ldb/wscript b/lib/ldb/wscript
index 0fdd888f533..743dae18357 100644
--- a/lib/ldb/wscript
+++ b/lib/ldb/wscript
@@ -526,7 +526,7 @@ def build(bld):
 
             bld.SAMBA_BINARY('ldb_mdb_kv_ops_test',
                              source='tests/ldb_kv_ops_test.c',
-                             cflags='-DTEST_BE=\"mdb\"',
+                             cflags='-DTEST_BE=\"mdb\" -DTEST_LMDB=1',
                              deps='cmocka ldb',
                              install=False)
         else:
-- 
2.18.1


From faea66580a49db22376c3942c3dca948d63c8a4c Mon Sep 17 00:00:00 2001
From: Gary Lockyer <gary at catalyst.net.nz>
Date: Mon, 1 Apr 2019 15:28:31 +1300
Subject: [PATCH 4/6] lib ldb key_value: set the cache size for re-indexing

Set the index cache size to the number of records in the databse when
reindexing.

This significantly improves reindex performance.  For a domain with
100,000 users the reindex times are reduced from 17 minutes to 45
seconds.

Signed-off-by: Gary Lockyer <gary at catalyst.net.nz>
---
 lib/ldb/ldb_key_value/ldb_kv_index.c |  12 ++-
 lib/ldb/tests/ldb_key_value_test.c   | 109 ++++++++++++++++++---------
 2 files changed, 83 insertions(+), 38 deletions(-)

diff --git a/lib/ldb/ldb_key_value/ldb_kv_index.c b/lib/ldb/ldb_key_value/ldb_kv_index.c
index 003ed714df9..4101d4b986d 100644
--- a/lib/ldb/ldb_key_value/ldb_kv_index.c
+++ b/lib/ldb/ldb_key_value/ldb_kv_index.c
@@ -3092,6 +3092,7 @@ int ldb_kv_reindex(struct ldb_module *module)
 	    ldb_module_get_private(module), struct ldb_kv_private);
 	int ret;
 	struct ldb_kv_reindex_context ctx;
+	size_t index_cache_size = 0;
 
 	/*
 	 * Only triggered after a modification, but make clear we do
@@ -3112,7 +3113,16 @@ int ldb_kv_reindex(struct ldb_module *module)
 	 */
 	ldb_kv_index_transaction_cancel(module);
 
-	ret = ldb_kv_index_transaction_start(module, DEFAULT_INDEX_CACHE_SIZE);
+	/*
+	 * Calculate the size of the index cache that we'll need for
+	 * the re-index
+	 */
+	index_cache_size = ldb_kv->kv_ops->get_size(ldb_kv);
+	if (index_cache_size < DEFAULT_INDEX_CACHE_SIZE) {
+		index_cache_size = DEFAULT_INDEX_CACHE_SIZE;
+	}
+
+	ret = ldb_kv_index_transaction_start(module, index_cache_size);
 	if (ret != LDB_SUCCESS) {
 		return ret;
 	}
diff --git a/lib/ldb/tests/ldb_key_value_test.c b/lib/ldb/tests/ldb_key_value_test.c
index ba632961768..1bb5623082f 100644
--- a/lib/ldb/tests/ldb_key_value_test.c
+++ b/lib/ldb/tests/ldb_key_value_test.c
@@ -52,7 +52,6 @@
 #include <sys/wait.h>
 
 #include "ldb_key_value/ldb_kv.c"
-#include "ldb_key_value/ldb_kv_cache.c"
 #include "ldb_key_value/ldb_kv_index.c"
 #include "ldb_key_value/ldb_kv_search.c"
 
@@ -63,50 +62,27 @@
 #endif /* TEST_BE */
 
 #define NUM_RECS 1024
-
+int ldb_kv_cache_reload(struct ldb_module *module) {
+	return LDB_SUCCESS;
+}
+int ldb_kv_cache_load(struct ldb_module *module) {
+	return LDB_SUCCESS;
+}
+int ldb_kv_check_at_attributes_values(const struct ldb_val *value) {
+	return LDB_SUCCESS;
+}
+int ldb_kv_increase_sequence_number(struct ldb_module *module) {
+	return LDB_SUCCESS;
+}
 
 struct test_ctx {
-	const char *dbfile;
-	const char *lockfile;   /* lockfile is separate */
-	const char *dbpath;
 };
 
-static void unlink_old_db(struct test_ctx *test_ctx)
-{
-	int ret;
-
-	errno = 0;
-	ret = unlink(test_ctx->lockfile);
-	if (ret == -1 && errno != ENOENT) {
-		fail();
-	}
-
-	errno = 0;
-	ret = unlink(test_ctx->dbfile);
-	if (ret == -1 && errno != ENOENT) {
-		fail();
-	}
-}
-
 static int setup(void **state)
 {
 	struct test_ctx *test_ctx;
 
 	test_ctx = talloc_zero(NULL, struct test_ctx);
-	assert_non_null(test_ctx);
-
-	test_ctx->dbfile = talloc_strdup(test_ctx, "kvopstest.ldb");
-	assert_non_null(test_ctx->dbfile);
-
-	test_ctx->lockfile = talloc_asprintf(test_ctx, "%s-lock",
-					     test_ctx->dbfile);
-	assert_non_null(test_ctx->lockfile);
-
-	test_ctx->dbpath = talloc_asprintf(test_ctx,
-			TEST_BE"://%s", test_ctx->dbfile);
-	assert_non_null(test_ctx->dbpath);
-
-	unlink_old_db(test_ctx);
 	*state = test_ctx;
 	return 0;
 }
@@ -116,7 +92,6 @@ static int teardown(void **state)
 	struct test_ctx *test_ctx = talloc_get_type_abort(*state,
 							  struct test_ctx);
 
-	unlink_old_db(test_ctx);
 	talloc_free(test_ctx);
 	return 0;
 }
@@ -149,6 +124,62 @@ static void test_index_cache_init(void **state)
 	TALLOC_FREE(module);
 }
 
+static int db_size = 0;
+static size_t mock_get_size(struct ldb_kv_private *ldb_kv) {
+	return db_size;
+}
+
+static int mock_iterate(
+	struct ldb_kv_private *ldb_kv,
+	ldb_kv_traverse_fn fn,
+	void *ctx) {
+	return 1;
+}
+/*
+ * Test that the index cache is correctly sized by the re_index call
+ */
+static void test_reindex_cache_size(void **state)
+{
+	struct test_ctx *test_ctx = talloc_get_type_abort(
+		*state,
+		struct test_ctx);
+	struct ldb_module *module = NULL;
+	struct ldb_kv_private *ldb_kv = NULL;
+	int ret = LDB_SUCCESS;
+	const struct kv_db_ops ops = {
+		.iterate = mock_iterate,
+		.get_size = mock_get_size,
+	};
+
+	module = talloc_zero(test_ctx, struct ldb_module);
+	ldb_kv = talloc_zero(test_ctx, struct ldb_kv_private);
+	ldb_kv->kv_ops = &ops;
+	ldb_module_set_private(module, ldb_kv);
+
+	/*
+	 * Use a value less than the DEFAULT_INDEX_CACHE_SIZE
+	 * Should get the DEFAULT_INDEX_CACHE_SIZE
+	 */
+	db_size = DEFAULT_INDEX_CACHE_SIZE - 1;
+	ret = ldb_kv_reindex(module);
+	assert_int_equal(LDB_SUCCESS, ret);
+
+	assert_int_equal(
+		DEFAULT_INDEX_CACHE_SIZE,
+		tdb_hash_size(ldb_kv->idxptr->itdb));
+
+	/*
+	 * Use a value greater than the DEFAULT_INDEX_CACHE_SIZE
+	 * Should get the value specified.
+	 */
+	db_size = DEFAULT_INDEX_CACHE_SIZE + 1;
+	ret = ldb_kv_reindex(module);
+	assert_int_equal(LDB_SUCCESS, ret);
+
+	assert_int_equal(db_size, tdb_hash_size(ldb_kv->idxptr->itdb));
+	TALLOC_FREE(ldb_kv);
+	TALLOC_FREE(module);
+}
 int main(int argc, const char **argv)
 {
 	const struct CMUnitTest tests[] = {
@@ -156,6 +187,10 @@ int main(int argc, const char **argv)
 			test_index_cache_init,
 			setup,
 			teardown),
+		cmocka_unit_test_setup_teardown(
+			test_reindex_cache_size,
+			setup,
+			teardown),
 	};
 
 	return cmocka_run_group_tests(tests, NULL, NULL);
-- 
2.18.1


From 57f0841994bc0a050d869d2456371875e76ead57 Mon Sep 17 00:00:00 2001
From: Gary Lockyer <gary at catalyst.net.nz>
Date: Mon, 1 Apr 2019 16:33:52 +1300
Subject: [PATCH 5/6] lib ldb key_value: Set index cache size on open

Set the default index cache from the passed option
"transaction_index_cache_size" on open.  This allows the default cache
size to be overridden when processing large transactions i.e. joining a
large domain.

Signed-off-by: Gary Lockyer <gary at catalyst.net.nz>
---
 lib/ldb/ldb_key_value/ldb_kv.c | 24 +++++++++++++++++++++++-
 lib/ldb/ldb_key_value/ldb_kv.h |  5 +++++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/lib/ldb/ldb_key_value/ldb_kv.c b/lib/ldb/ldb_key_value/ldb_kv.c
index 8247ada4256..bc240ac5e41 100644
--- a/lib/ldb/ldb_key_value/ldb_kv.c
+++ b/lib/ldb/ldb_key_value/ldb_kv.c
@@ -1382,7 +1382,9 @@ static int ldb_kv_start_trans(struct ldb_module *module)
 		return ldb_kv->kv_ops->error(ldb_kv);
 	}
 
-	ldb_kv_index_transaction_start(module, DEFAULT_INDEX_CACHE_SIZE);
+	ldb_kv_index_transaction_start(
+		module,
+		ldb_kv->index_transaction_cache_size);
 
 	ldb_kv->reindex_failed = false;
 
@@ -1946,5 +1948,25 @@ int ldb_kv_init_store(struct ldb_kv_private *ldb_kv,
 		}
 	}
 
+	/*
+	 * Set the size of the transaction index cache.
+	 * If the ldb option "transaction_index_cache_size" is set use that
+	 * otherwise use DEFAULT_INDEX_CACHE_SIZE
+	 */
+	ldb_kv->index_transaction_cache_size = DEFAULT_INDEX_CACHE_SIZE;
+	{
+		const char *size = ldb_options_find(
+			ldb,
+			options,
+			"transaction_index_cache_size");
+		if (size != NULL) {
+			size_t cache_size = strtoul(
+				size,
+				NULL,
+				DEFAULT_INDEX_CACHE_SIZE);
+			ldb_kv->index_transaction_cache_size = cache_size;
+		}
+	}
+
 	return LDB_SUCCESS;
 }
diff --git a/lib/ldb/ldb_key_value/ldb_kv.h b/lib/ldb/ldb_key_value/ldb_kv.h
index 2eb6fdfed8b..a4aa5ed9e62 100644
--- a/lib/ldb/ldb_key_value/ldb_kv.h
+++ b/lib/ldb/ldb_key_value/ldb_kv.h
@@ -103,6 +103,11 @@ struct ldb_kv_private {
 	 * fork()ed child.
 	 */
 	pid_t pid;
+
+	/*
+	 * The size to be used for the index transaction cache
+	 */
+	size_t index_transaction_cache_size;
 };
 
 struct ldb_kv_context {
-- 
2.18.1


From b25af4636cdc3f74cf96a8941ae7f0163d804d88 Mon Sep 17 00:00:00 2001
From: Gary Lockyer <gary at catalyst.net.nz>
Date: Mon, 1 Apr 2019 16:49:38 +1300
Subject: [PATCH 6/6] python join: Set index transaction cache size.

The default value is too small for joining a large domain.  So we specify a
size of 200,000 which is suitable for domains with up to 100,000 users.

At a later date this could be added as a parameter to the join, but
200,000 should be suitable for now.

Signed-off-by: Gary Lockyer <gary at catalyst.net.nz>
---
 python/samba/join.py | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/python/samba/join.py b/python/samba/join.py
index da8dcb050d3..46491122319 100644
--- a/python/samba/join.py
+++ b/python/samba/join.py
@@ -888,8 +888,13 @@ class DCJoinContext(object):
 
         # we now operate exclusively on the local database, which
         # we need to reopen in order to get the newly created schema
+        # we set the transaction_index_cache_size to 200,000 to ensure it is
+        # not too small, if it's too small the performance of the join will
+        # be negatively impacted.
         print("Reconnecting to local samdb")
         ctx.samdb = SamDB(url=ctx.local_samdb.url,
+                         options=[
+                             "transaction_index_cache_size:200000"],
                           session_info=system_session(),
                           lp=ctx.local_samdb.lp,
                           global_schema=False)
-- 
2.18.1

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20190403/da4d0c08/signature.sig>


More information about the samba-technical mailing list