[SCM] Samba Shared Repository - branch master updated
Michael Adam
obnox at samba.org
Tue Mar 18 08:43:04 MDT 2014
The branch, master has been updated
via 8278d38 tdb: change version to 1.2.13.
via 3034a5a tdb: Reduce freelist contention
via 1461362 tdb: Make "tdb_purge_dead" internally public
via 92ce9fd tdb: Make "tdb_find_dead" internally public
via 4ca0186 tdb: Add "last_ptr" to tdb_find_dead
via cb09d79 tdb: Move adding tailer space to tdb_find_dead
via 255edd1 tdb: Do a best fit search for dead records
via d1ce011 tdb: Don't purge records to a blocked freelist
via 5f7b481 tdb: Fix a tdb corruption
from 9c9df40 dsdb: Further assert that we always have an objectClass and an rDN
http://gitweb.samba.org/?p=samba.git;a=shortlog;h=master
- Log -----------------------------------------------------------------
commit 8278d3823aac83bc5edb14353c8de772878ae915
Author: Michael Adam <obnox at samba.org>
Date: Tue Mar 18 13:05:42 2014 +0100
tdb: change version to 1.2.13.
* internal code cleanups
* always open internal TDBs with incompatible hash
* avoid reallocations in locking code
* systematize output format in tdbtool dump
* reduce freelist contention when allocating new records
- try to find dead records also in other chains
- don't do blocking locks on the freelist
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-by: Stefan Metzmacher <metze at samba.org>
Autobuild-User(master): Michael Adam <obnox at samba.org>
Autobuild-Date(master): Tue Mar 18 15:42:48 CET 2014 on sn-devel-104
commit 3034a5a62b8eaac7bdc52bfb7af1beb29e888b34
Author: Volker Lendecke <vl at samba.org>
Date: Tue Mar 18 08:04:42 2014 +0100
tdb: Reduce freelist contention
In a metadata-intensive benchmark we have seen the locking.tdb freelist to be
one of the central contention points. This patch removes most of the contention
on the freelist. Ages ago we already reduced freelist contention by using the
even much older DEAD records: If TDB_VOLATILE is set, don't directly put
deleted records on the freelist, but just mark a few of them just as DEAD. The
next new record can them re-use that space without consulting the freelist.
This patch builds upon the DEAD records: If we need space and the freelist is
busy, instead of doing a blocking wait on the freelist, start looking into
other chains for DEAD records and steal them from there. This way every hash
chain becomes a small freelist. Just wander around the hash chains as long as
the freelist is still busy.
With this patch and the tdb mutex patch (following hopefully some time soon)
you can see a heavily busy clustered smbd run without locking.tdb futex
syscalls.
Signed-off-by: Volker Lendecke <vl at samba.org>
Reviewed-by: Michael Adam <obnox at samba.org>
Reviewed-by: Stefan Metzmacher <metze at samba.org>
commit 1461362e936e5beebeaae1555cf96f6731287c35
Author: Volker Lendecke <vl at samba.org>
Date: Tue Mar 18 08:03:16 2014 +0100
tdb: Make "tdb_purge_dead" internally public
Signed-off-by: Volker Lendecke <vl at samba.org>
Reviewed-by: Michael Adam <obnox at samba.org>
Reviewed-by: Stefan Metzmacher <metze at samba.org>
commit 92ce9fd9afd080954c0509cf62def6b355d79e94
Author: Volker Lendecke <vl at samba.org>
Date: Tue Mar 18 08:01:40 2014 +0100
tdb: Make "tdb_find_dead" internally public
Signed-off-by: Volker Lendecke <vl at samba.org>
Reviewed-by: Michael Adam <obnox at samba.org>
Reviewed-by: Stefan Metzmacher <metze at samba.org>
commit 4ca018692f1bd9fe85b6d8be546bbaf704ba038d
Author: Volker Lendecke <vl at samba.org>
Date: Tue Mar 18 08:00:45 2014 +0100
tdb: Add "last_ptr" to tdb_find_dead
Will be used soon to unlink a dead record from a chain
Signed-off-by: Volker Lendecke <vl at samba.org>
Reviewed-by: Michael Adam <obnox at samba.org>
Reviewed-by: Stefan Metzmacher <metze at samba.org>
commit cb09d7937c93a6cdf855b84bd5f58f30a46cbfc7
Author: Volker Lendecke <vl at samba.org>
Date: Tue Mar 18 07:52:59 2014 +0100
tdb: Move adding tailer space to tdb_find_dead
This aligns the tdb_find_dead API with the tdb_allocate API and thus makes it a
bit easier to understand, at least for me.
Signed-off-by: Volker Lendecke <vl at samba.org>
Reviewed-by: Michael Adam <obnox at samba.org>
Reviewed-by: Stefan Metzmacher <metze at samba.org>
commit 255edd1b417480ab033c51165702c19fb5fff56f
Author: Volker Lendecke <vl at samba.org>
Date: Tue Mar 18 07:46:38 2014 +0100
tdb: Do a best fit search for dead records
Hash chains are (or can be made) short enough that a full search for the
best-fitting dead record is feasible. The freelist can become much longer,
there we don't do the full search but accept records which are too large.
Signed-off-by: Volker Lendecke <vl at samba.org>
Reviewed-by: Michael Adam <obnox at samba.org>
Reviewed-by: Stefan Metzmacher <metze at samba.org>
commit d1ce0110f0a1666df4d1eb81e76631a90e9e5a73
Author: Volker Lendecke <vl at samba.org>
Date: Mon Mar 17 06:47:11 2014 +0100
tdb: Don't purge records to a blocked freelist
If the freelist is heavily contended, we should avoid accessing it
Signed-off-by: Volker Lendecke <vl at samba.org>
Reviewed-by: Michael Adam <obnox at samba.org>
Reviewed-by: Stefan Metzmacher <metze at samba.org>
commit 5f7b481349796cc0e90563ed01353809b403e429
Author: Volker Lendecke <vl at samba.org>
Date: Sun Mar 16 20:08:32 2014 +0000
tdb: Fix a tdb corruption
tdb_purge_dead can change the next pointer of "rec" if we purge the record
right behind the current record to be deleted. Just overwrite the magic,
not the whole record with stale data.
Signed-off-by: Volker Lendecke <vl at samba.org>
Reviewed-by: Michael Adam <obnox at samba.org>
Reviewed-by: Stefan Metzmacher <metze at samba.org>
-----------------------------------------------------------------------
Summary of changes:
.../tdb-1.2.11.sigs => lib/tdb/ABI/tdb-1.2.13.sigs | 0
lib/tdb/common/freelist.c | 100 ++++++++++++++++++--
lib/tdb/common/tdb.c | 78 ++++++++-------
lib/tdb/common/tdb_private.h | 7 +-
lib/tdb/wscript | 2 +-
5 files changed, 138 insertions(+), 49 deletions(-)
copy ctdb/lib/tdb/ABI/tdb-1.2.11.sigs => lib/tdb/ABI/tdb-1.2.13.sigs (100%)
Changeset truncated at 500 lines:
diff --git a/ctdb/lib/tdb/ABI/tdb-1.2.11.sigs b/lib/tdb/ABI/tdb-1.2.13.sigs
similarity index 100%
copy from ctdb/lib/tdb/ABI/tdb-1.2.11.sigs
copy to lib/tdb/ABI/tdb-1.2.13.sigs
diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index ea14dd0..6f8f812 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -273,7 +273,8 @@ static tdb_off_t tdb_allocate_ofs(struct tdb_context *tdb,
0 is returned if the space could not be allocated
*/
-tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_record *rec)
+static tdb_off_t tdb_allocate_from_freelist(
+ struct tdb_context *tdb, tdb_len_t length, struct tdb_record *rec)
{
tdb_off_t rec_ptr, last_ptr, newrec_ptr;
struct {
@@ -282,9 +283,6 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_rec
} bestfit;
float multiplier = 1.0;
- if (tdb_lock(tdb, -1, F_WRLCK) == -1)
- return 0;
-
/* over-allocate to reduce fragmentation */
length *= 1.25;
@@ -297,7 +295,7 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_rec
/* read in the freelist top */
if (tdb_ofs_read(tdb, FREELIST_TOP, &rec_ptr) == -1)
- goto fail;
+ return 0;
bestfit.rec_ptr = 0;
bestfit.last_ptr = 0;
@@ -310,7 +308,7 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_rec
*/
while (rec_ptr) {
if (tdb_rec_free_read(tdb, rec_ptr, rec) == -1) {
- goto fail;
+ return 0;
}
if (rec->rec_len >= length) {
@@ -344,12 +342,11 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_rec
if (bestfit.rec_ptr != 0) {
if (tdb_rec_free_read(tdb, bestfit.rec_ptr, rec) == -1) {
- goto fail;
+ return 0;
}
newrec_ptr = tdb_allocate_ofs(tdb, length, bestfit.rec_ptr,
rec, bestfit.last_ptr);
- tdb_unlock(tdb, -1, F_WRLCK);
return newrec_ptr;
}
@@ -357,12 +354,95 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_rec
database and if we can then try again */
if (tdb_expand(tdb, length + sizeof(*rec)) == 0)
goto again;
- fail:
- tdb_unlock(tdb, -1, F_WRLCK);
+
return 0;
}
+static bool tdb_alloc_dead(
+ struct tdb_context *tdb, int hash, tdb_len_t length,
+ tdb_off_t *rec_ptr, struct tdb_record *rec)
+{
+ tdb_off_t last_ptr;
+
+ *rec_ptr = tdb_find_dead(tdb, hash, rec, length, &last_ptr);
+ if (*rec_ptr == 0) {
+ return false;
+ }
+ /*
+ * Unlink the record from the hash chain, it's about to be moved into
+ * another one.
+ */
+ return (tdb_ofs_write(tdb, last_ptr, &rec->next) == 0);
+}
+
+/*
+ * Chain "hash" is assumed to be locked
+ */
+
+tdb_off_t tdb_allocate(struct tdb_context *tdb, int hash, tdb_len_t length,
+ struct tdb_record *rec)
+{
+ tdb_off_t ret;
+ int i;
+
+ if (tdb->max_dead_records == 0) {
+ /*
+ * No dead records to expect anywhere. Do the blocking
+ * freelist lock without trying to steal from others
+ */
+ goto blocking_freelist_allocate;
+ }
+
+ /*
+ * The following loop tries to get the freelist lock nonblocking. If
+ * it gets the lock, allocate from there. If the freelist is busy,
+ * instead of waiting we try to steal dead records from other hash
+ * chains.
+ *
+ * Be aware that we do nonblocking locks on the other hash chains as
+ * well and fail gracefully. This way we avoid deadlocks (we block two
+ * hash chains, something which is pretty bad normally)
+ */
+
+ for (i=1; i<tdb->hash_size; i++) {
+
+ int list;
+
+ if (tdb_lock_nonblock(tdb, -1, F_WRLCK) == 0) {
+ /*
+ * Under the freelist lock take the chance to give
+ * back our dead records.
+ */
+ tdb_purge_dead(tdb, hash);
+
+ ret = tdb_allocate_from_freelist(tdb, length, rec);
+ tdb_unlock(tdb, -1, F_WRLCK);
+ return ret;
+ }
+
+ list = BUCKET(hash+i);
+
+ if (tdb_lock_nonblock(tdb, list, F_WRLCK) == 0) {
+ bool got_dead;
+ got_dead = tdb_alloc_dead(tdb, list, length, &ret, rec);
+ tdb_unlock(tdb, list, F_WRLCK);
+
+ if (got_dead) {
+ return ret;
+ }
+ }
+ }
+
+blocking_freelist_allocate:
+
+ if (tdb_lock(tdb, -1, F_WRLCK) == -1) {
+ return 0;
+ }
+ ret = tdb_allocate_from_freelist(tdb, length, rec);
+ tdb_unlock(tdb, -1, F_WRLCK);
+ return ret;
+}
/*
return the size of the freelist - used to decide if we should repack
diff --git a/lib/tdb/common/tdb.c b/lib/tdb/common/tdb.c
index 1e41e84..ba1c98e 100644
--- a/lib/tdb/common/tdb.c
+++ b/lib/tdb/common/tdb.c
@@ -345,13 +345,16 @@ static int tdb_count_dead(struct tdb_context *tdb, uint32_t hash)
/*
* Purge all DEAD records from a hash chain
*/
-static int tdb_purge_dead(struct tdb_context *tdb, uint32_t hash)
+int tdb_purge_dead(struct tdb_context *tdb, uint32_t hash)
{
int res = -1;
struct tdb_record rec;
tdb_off_t rec_ptr;
- if (tdb_lock(tdb, -1, F_WRLCK) == -1) {
+ if (tdb_lock_nonblock(tdb, -1, F_WRLCK) == -1) {
+ /*
+ * Don't block the freelist if not strictly necessary
+ */
return -1;
}
@@ -394,6 +397,8 @@ static int tdb_delete_hash(struct tdb_context *tdb, TDB_DATA key, uint32_t hash)
if (tdb->max_dead_records != 0) {
+ uint32_t magic = TDB_DEAD_MAGIC;
+
/*
* Allow for some dead records per hash chain, mainly for
* tdb's with a very high create/delete rate like locking.tdb.
@@ -410,8 +415,9 @@ static int tdb_delete_hash(struct tdb_context *tdb, TDB_DATA key, uint32_t hash)
/*
* Just mark the record as dead.
*/
- rec.magic = TDB_DEAD_MAGIC;
- ret = tdb_rec_write(tdb, rec_ptr, &rec);
+ ret = tdb_ofs_write(
+ tdb, rec_ptr + offsetof(struct tdb_record, magic),
+ &magic);
}
else {
ret = tdb_do_delete(tdb, rec_ptr, &rec);
@@ -439,13 +445,21 @@ _PUBLIC_ int tdb_delete(struct tdb_context *tdb, TDB_DATA key)
/*
* See if we have a dead record around with enough space
*/
-static tdb_off_t tdb_find_dead(struct tdb_context *tdb, uint32_t hash,
- struct tdb_record *r, tdb_len_t length)
+tdb_off_t tdb_find_dead(struct tdb_context *tdb, uint32_t hash,
+ struct tdb_record *r, tdb_len_t length,
+ tdb_off_t *p_last_ptr)
{
- tdb_off_t rec_ptr;
+ tdb_off_t rec_ptr, last_ptr;
+ tdb_off_t best_rec_ptr = 0;
+ tdb_off_t best_last_ptr = 0;
+ struct tdb_record best = { .rec_len = UINT32_MAX };
+
+ length += sizeof(tdb_off_t); /* tailer */
+
+ last_ptr = TDB_HASH_TOP(hash);
/* read in the hash top */
- if (tdb_ofs_read(tdb, TDB_HASH_TOP(hash), &rec_ptr) == -1)
+ if (tdb_ofs_read(tdb, last_ptr, &rec_ptr) == -1)
return 0;
/* keep looking until we find the right record */
@@ -453,16 +467,23 @@ static tdb_off_t tdb_find_dead(struct tdb_context *tdb, uint32_t hash,
if (tdb_rec_read(tdb, rec_ptr, r) == -1)
return 0;
- if (TDB_DEAD(r) && r->rec_len >= length) {
- /*
- * First fit for simple coding, TODO: change to best
- * fit
- */
- return rec_ptr;
+ if (TDB_DEAD(r) && (r->rec_len >= length) &&
+ (r->rec_len < best.rec_len)) {
+ best_rec_ptr = rec_ptr;
+ best_last_ptr = last_ptr;
+ best = *r;
}
+ last_ptr = rec_ptr;
rec_ptr = r->next;
}
- return 0;
+
+ if (best.rec_len == UINT32_MAX) {
+ return 0;
+ }
+
+ *r = best;
+ *p_last_ptr = best_last_ptr;
+ return best_rec_ptr;
}
static int _tdb_store(struct tdb_context *tdb, TDB_DATA key,
@@ -500,15 +521,16 @@ static int _tdb_store(struct tdb_context *tdb, TDB_DATA key,
tdb_delete_hash(tdb, key, hash);
if (tdb->max_dead_records != 0) {
+ tdb_off_t last_ptr;
/*
* Allow for some dead records per hash chain, look if we can
* find one that can hold the new record. We need enough space
* for key, data and tailer. If we find one, we don't have to
* consult the central freelist.
*/
- rec_ptr = tdb_find_dead(
- tdb, hash, &rec,
- key.dsize + dbuf.dsize + sizeof(tdb_off_t));
+ rec_ptr = tdb_find_dead(tdb, hash, &rec,
+ key.dsize + dbuf.dsize,
+ &last_ptr);
if (rec_ptr != 0) {
rec.key_len = key.dsize;
@@ -528,26 +550,8 @@ static int _tdb_store(struct tdb_context *tdb, TDB_DATA key,
}
}
- /*
- * We have to allocate some space from the freelist, so this means we
- * have to lock it. Use the chance to purge all the DEAD records from
- * the hash chain under the freelist lock.
- */
-
- if (tdb_lock(tdb, -1, F_WRLCK) == -1) {
- goto fail;
- }
-
- if ((tdb->max_dead_records != 0)
- && (tdb_purge_dead(tdb, hash) == -1)) {
- tdb_unlock(tdb, -1, F_WRLCK);
- goto fail;
- }
-
/* we have to allocate some space */
- rec_ptr = tdb_allocate(tdb, key.dsize + dbuf.dsize, &rec);
-
- tdb_unlock(tdb, -1, F_WRLCK);
+ rec_ptr = tdb_allocate(tdb, hash, key.dsize + dbuf.dsize, &rec);
if (rec_ptr == 0) {
goto fail;
diff --git a/lib/tdb/common/tdb_private.h b/lib/tdb/common/tdb_private.h
index 7227b43..a672159 100644
--- a/lib/tdb/common/tdb_private.h
+++ b/lib/tdb/common/tdb_private.h
@@ -255,7 +255,8 @@ int tdb_ofs_read(struct tdb_context *tdb, tdb_off_t offset, tdb_off_t *d);
int tdb_ofs_write(struct tdb_context *tdb, tdb_off_t offset, tdb_off_t *d);
void *tdb_convert(void *buf, uint32_t size);
int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec);
-tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_record *rec);
+tdb_off_t tdb_allocate(struct tdb_context *tdb, int hash, tdb_len_t length,
+ struct tdb_record *rec);
int tdb_ofs_read(struct tdb_context *tdb, tdb_off_t offset, tdb_off_t *d);
int tdb_ofs_write(struct tdb_context *tdb, tdb_off_t offset, tdb_off_t *d);
int tdb_lock_record(struct tdb_context *tdb, tdb_off_t off);
@@ -272,6 +273,10 @@ int tdb_parse_data(struct tdb_context *tdb, TDB_DATA key,
void *private_data);
tdb_off_t tdb_find_lock_hash(struct tdb_context *tdb, TDB_DATA key, uint32_t hash, int locktype,
struct tdb_record *rec);
+tdb_off_t tdb_find_dead(struct tdb_context *tdb, uint32_t hash,
+ struct tdb_record *r, tdb_len_t length,
+ tdb_off_t *p_last_ptr);
+int tdb_purge_dead(struct tdb_context *tdb, uint32_t hash);
void tdb_io_init(struct tdb_context *tdb);
int tdb_expand(struct tdb_context *tdb, tdb_off_t size);
tdb_off_t tdb_expand_adjust(tdb_off_t map_size, tdb_off_t size, int page_size);
diff --git a/lib/tdb/wscript b/lib/tdb/wscript
index 00a1c34..fc396d6 100644
--- a/lib/tdb/wscript
+++ b/lib/tdb/wscript
@@ -1,7 +1,7 @@
#!/usr/bin/env python
APPNAME = 'tdb'
-VERSION = '1.2.12'
+VERSION = '1.2.13'
blddir = 'bin'
--
Samba Shared Repository
More information about the samba-cvs
mailing list