[SCM] Samba Shared Repository - branch master updated

Michael Adam obnox at samba.org
Tue Mar 18 08:43:04 MDT 2014


The branch, master has been updated
       via  8278d38 tdb: change version to 1.2.13.
       via  3034a5a tdb: Reduce freelist contention
       via  1461362 tdb: Make "tdb_purge_dead" internally public
       via  92ce9fd tdb: Make "tdb_find_dead" internally public
       via  4ca0186 tdb: Add "last_ptr" to tdb_find_dead
       via  cb09d79 tdb: Move adding tailer space to tdb_find_dead
       via  255edd1 tdb: Do a best fit search for dead records
       via  d1ce011 tdb: Don't purge records to a blocked freelist
       via  5f7b481 tdb: Fix a tdb corruption
      from  9c9df40 dsdb: Further assert that we always have an objectClass and an rDN

http://gitweb.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 8278d3823aac83bc5edb14353c8de772878ae915
Author: Michael Adam <obnox at samba.org>
Date:   Tue Mar 18 13:05:42 2014 +0100

    tdb: change version to 1.2.13.
    
    * internal code cleanups
    * always open internal TDBs with incompatible hash
    * avoid reallocations in locking code
    * systematize output format in tdbtool dump
    * reduce freelist contention when allocating new records
      - try to find dead records also in other chains
      - don't do blocking locks on the freelist
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-by: Stefan Metzmacher <metze at samba.org>
    
    Autobuild-User(master): Michael Adam <obnox at samba.org>
    Autobuild-Date(master): Tue Mar 18 15:42:48 CET 2014 on sn-devel-104

commit 3034a5a62b8eaac7bdc52bfb7af1beb29e888b34
Author: Volker Lendecke <vl at samba.org>
Date:   Tue Mar 18 08:04:42 2014 +0100

    tdb: Reduce freelist contention
    
    In a metadata-intensive benchmark we have seen the locking.tdb freelist to be
    one of the central contention points. This patch removes most of the contention
    on the freelist. Ages ago we already reduced freelist contention by using the
    even much older DEAD records: If TDB_VOLATILE is set, don't directly put
    deleted records on the freelist, but just mark a few of them just as DEAD. The
    next new record can them re-use that space without consulting the freelist.
    
    This patch builds upon the DEAD records: If we need space and the freelist is
    busy, instead of doing a blocking wait on the freelist, start looking into
    other chains for DEAD records and steal them from there. This way every hash
    chain becomes a small freelist. Just wander around the hash chains as long as
    the freelist is still busy.
    
    With this patch and the tdb mutex patch (following hopefully some time soon)
    you can see a heavily busy clustered smbd run without locking.tdb futex
    syscalls.
    
    Signed-off-by: Volker Lendecke <vl at samba.org>
    Reviewed-by: Michael Adam <obnox at samba.org>
    Reviewed-by: Stefan Metzmacher <metze at samba.org>

commit 1461362e936e5beebeaae1555cf96f6731287c35
Author: Volker Lendecke <vl at samba.org>
Date:   Tue Mar 18 08:03:16 2014 +0100

    tdb: Make "tdb_purge_dead" internally public
    
    Signed-off-by: Volker Lendecke <vl at samba.org>
    Reviewed-by: Michael Adam <obnox at samba.org>
    Reviewed-by: Stefan Metzmacher <metze at samba.org>

commit 92ce9fd9afd080954c0509cf62def6b355d79e94
Author: Volker Lendecke <vl at samba.org>
Date:   Tue Mar 18 08:01:40 2014 +0100

    tdb: Make "tdb_find_dead" internally public
    
    Signed-off-by: Volker Lendecke <vl at samba.org>
    Reviewed-by: Michael Adam <obnox at samba.org>
    Reviewed-by: Stefan Metzmacher <metze at samba.org>

commit 4ca018692f1bd9fe85b6d8be546bbaf704ba038d
Author: Volker Lendecke <vl at samba.org>
Date:   Tue Mar 18 08:00:45 2014 +0100

    tdb: Add "last_ptr" to tdb_find_dead
    
    Will be used soon to unlink a dead record from a chain
    
    Signed-off-by: Volker Lendecke <vl at samba.org>
    Reviewed-by: Michael Adam <obnox at samba.org>
    Reviewed-by: Stefan Metzmacher <metze at samba.org>

commit cb09d7937c93a6cdf855b84bd5f58f30a46cbfc7
Author: Volker Lendecke <vl at samba.org>
Date:   Tue Mar 18 07:52:59 2014 +0100

    tdb: Move adding tailer space to tdb_find_dead
    
    This aligns the tdb_find_dead API with the tdb_allocate API and thus makes it a
    bit easier to understand, at least for me.
    
    Signed-off-by: Volker Lendecke <vl at samba.org>
    Reviewed-by: Michael Adam <obnox at samba.org>
    Reviewed-by: Stefan Metzmacher <metze at samba.org>

commit 255edd1b417480ab033c51165702c19fb5fff56f
Author: Volker Lendecke <vl at samba.org>
Date:   Tue Mar 18 07:46:38 2014 +0100

    tdb: Do a best fit search for dead records
    
    Hash chains are (or can be made) short enough that a full search for the
    best-fitting dead record is feasible. The freelist can become much longer,
    there we don't do the full search but accept records which are too large.
    
    Signed-off-by: Volker Lendecke <vl at samba.org>
    Reviewed-by: Michael Adam <obnox at samba.org>
    Reviewed-by: Stefan Metzmacher <metze at samba.org>

commit d1ce0110f0a1666df4d1eb81e76631a90e9e5a73
Author: Volker Lendecke <vl at samba.org>
Date:   Mon Mar 17 06:47:11 2014 +0100

    tdb: Don't purge records to a blocked freelist
    
    If the freelist is heavily contended, we should avoid accessing it
    
    Signed-off-by: Volker Lendecke <vl at samba.org>
    Reviewed-by: Michael Adam <obnox at samba.org>
    Reviewed-by: Stefan Metzmacher <metze at samba.org>

commit 5f7b481349796cc0e90563ed01353809b403e429
Author: Volker Lendecke <vl at samba.org>
Date:   Sun Mar 16 20:08:32 2014 +0000

    tdb: Fix a tdb corruption
    
    tdb_purge_dead can change the next pointer of "rec" if we purge the record
    right behind the current record to be deleted. Just overwrite the magic,
    not the whole record with stale data.
    
    Signed-off-by: Volker Lendecke <vl at samba.org>
    Reviewed-by: Michael Adam <obnox at samba.org>
    Reviewed-by: Stefan Metzmacher <metze at samba.org>

-----------------------------------------------------------------------

Summary of changes:
 .../tdb-1.2.11.sigs => lib/tdb/ABI/tdb-1.2.13.sigs |    0
 lib/tdb/common/freelist.c                          |  100 ++++++++++++++++++--
 lib/tdb/common/tdb.c                               |   78 ++++++++-------
 lib/tdb/common/tdb_private.h                       |    7 +-
 lib/tdb/wscript                                    |    2 +-
 5 files changed, 138 insertions(+), 49 deletions(-)
 copy ctdb/lib/tdb/ABI/tdb-1.2.11.sigs => lib/tdb/ABI/tdb-1.2.13.sigs (100%)


Changeset truncated at 500 lines:

diff --git a/ctdb/lib/tdb/ABI/tdb-1.2.11.sigs b/lib/tdb/ABI/tdb-1.2.13.sigs
similarity index 100%
copy from ctdb/lib/tdb/ABI/tdb-1.2.11.sigs
copy to lib/tdb/ABI/tdb-1.2.13.sigs
diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index ea14dd0..6f8f812 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -273,7 +273,8 @@ static tdb_off_t tdb_allocate_ofs(struct tdb_context *tdb,
 
    0 is returned if the space could not be allocated
  */
-tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_record *rec)
+static tdb_off_t tdb_allocate_from_freelist(
+	struct tdb_context *tdb, tdb_len_t length, struct tdb_record *rec)
 {
 	tdb_off_t rec_ptr, last_ptr, newrec_ptr;
 	struct {
@@ -282,9 +283,6 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_rec
 	} bestfit;
 	float multiplier = 1.0;
 
-	if (tdb_lock(tdb, -1, F_WRLCK) == -1)
-		return 0;
-
 	/* over-allocate to reduce fragmentation */
 	length *= 1.25;
 
@@ -297,7 +295,7 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_rec
 
 	/* read in the freelist top */
 	if (tdb_ofs_read(tdb, FREELIST_TOP, &rec_ptr) == -1)
-		goto fail;
+		return 0;
 
 	bestfit.rec_ptr = 0;
 	bestfit.last_ptr = 0;
@@ -310,7 +308,7 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_rec
 	 */
 	while (rec_ptr) {
 		if (tdb_rec_free_read(tdb, rec_ptr, rec) == -1) {
-			goto fail;
+			return 0;
 		}
 
 		if (rec->rec_len >= length) {
@@ -344,12 +342,11 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_rec
 
 	if (bestfit.rec_ptr != 0) {
 		if (tdb_rec_free_read(tdb, bestfit.rec_ptr, rec) == -1) {
-			goto fail;
+			return 0;
 		}
 
 		newrec_ptr = tdb_allocate_ofs(tdb, length, bestfit.rec_ptr,
 					      rec, bestfit.last_ptr);
-		tdb_unlock(tdb, -1, F_WRLCK);
 		return newrec_ptr;
 	}
 
@@ -357,12 +354,95 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_rec
 	   database and if we can then try again */
 	if (tdb_expand(tdb, length + sizeof(*rec)) == 0)
 		goto again;
- fail:
-	tdb_unlock(tdb, -1, F_WRLCK);
+
 	return 0;
 }
 
+static bool tdb_alloc_dead(
+	struct tdb_context *tdb, int hash, tdb_len_t length,
+	tdb_off_t *rec_ptr, struct tdb_record *rec)
+{
+	tdb_off_t last_ptr;
+
+	*rec_ptr = tdb_find_dead(tdb, hash, rec, length, &last_ptr);
+	if (*rec_ptr == 0) {
+		return false;
+	}
+	/*
+	 * Unlink the record from the hash chain, it's about to be moved into
+	 * another one.
+	 */
+	return (tdb_ofs_write(tdb, last_ptr, &rec->next) == 0);
+}
+
+/*
+ * Chain "hash" is assumed to be locked
+ */
+
+tdb_off_t tdb_allocate(struct tdb_context *tdb, int hash, tdb_len_t length,
+		       struct tdb_record *rec)
+{
+	tdb_off_t ret;
+	int i;
+
+	if (tdb->max_dead_records == 0) {
+		/*
+		 * No dead records to expect anywhere. Do the blocking
+		 * freelist lock without trying to steal from others
+		 */
+		goto blocking_freelist_allocate;
+	}
+
+	/*
+	 * The following loop tries to get the freelist lock nonblocking. If
+	 * it gets the lock, allocate from there. If the freelist is busy,
+	 * instead of waiting we try to steal dead records from other hash
+	 * chains.
+	 *
+	 * Be aware that we do nonblocking locks on the other hash chains as
+	 * well and fail gracefully. This way we avoid deadlocks (we block two
+	 * hash chains, something which is pretty bad normally)
+	 */
+
+	for (i=1; i<tdb->hash_size; i++) {
+
+		int list;
+
+		if (tdb_lock_nonblock(tdb, -1, F_WRLCK) == 0) {
+			/*
+			 * Under the freelist lock take the chance to give
+			 * back our dead records.
+			 */
+			tdb_purge_dead(tdb, hash);
+
+			ret = tdb_allocate_from_freelist(tdb, length, rec);
+			tdb_unlock(tdb, -1, F_WRLCK);
+			return ret;
+		}
+
+		list = BUCKET(hash+i);
+
+		if (tdb_lock_nonblock(tdb, list, F_WRLCK) == 0) {
+			bool got_dead;
 
+			got_dead = tdb_alloc_dead(tdb, list, length, &ret, rec);
+			tdb_unlock(tdb, list, F_WRLCK);
+
+			if (got_dead) {
+				return ret;
+			}
+		}
+	}
+
+blocking_freelist_allocate:
+
+	if (tdb_lock(tdb, -1, F_WRLCK) == -1) {
+		return 0;
+	}
+	ret = tdb_allocate_from_freelist(tdb, length, rec);
+	tdb_unlock(tdb, -1, F_WRLCK);
+	return ret;
+}
 
 /*
    return the size of the freelist - used to decide if we should repack
diff --git a/lib/tdb/common/tdb.c b/lib/tdb/common/tdb.c
index 1e41e84..ba1c98e 100644
--- a/lib/tdb/common/tdb.c
+++ b/lib/tdb/common/tdb.c
@@ -345,13 +345,16 @@ static int tdb_count_dead(struct tdb_context *tdb, uint32_t hash)
 /*
  * Purge all DEAD records from a hash chain
  */
-static int tdb_purge_dead(struct tdb_context *tdb, uint32_t hash)
+int tdb_purge_dead(struct tdb_context *tdb, uint32_t hash)
 {
 	int res = -1;
 	struct tdb_record rec;
 	tdb_off_t rec_ptr;
 
-	if (tdb_lock(tdb, -1, F_WRLCK) == -1) {
+	if (tdb_lock_nonblock(tdb, -1, F_WRLCK) == -1) {
+		/*
+		 * Don't block the freelist if not strictly necessary
+		 */
 		return -1;
 	}
 
@@ -394,6 +397,8 @@ static int tdb_delete_hash(struct tdb_context *tdb, TDB_DATA key, uint32_t hash)
 
 	if (tdb->max_dead_records != 0) {
 
+		uint32_t magic = TDB_DEAD_MAGIC;
+
 		/*
 		 * Allow for some dead records per hash chain, mainly for
 		 * tdb's with a very high create/delete rate like locking.tdb.
@@ -410,8 +415,9 @@ static int tdb_delete_hash(struct tdb_context *tdb, TDB_DATA key, uint32_t hash)
 		/*
 		 * Just mark the record as dead.
 		 */
-		rec.magic = TDB_DEAD_MAGIC;
-		ret = tdb_rec_write(tdb, rec_ptr, &rec);
+		ret = tdb_ofs_write(
+			tdb, rec_ptr + offsetof(struct tdb_record, magic),
+			&magic);
 	}
 	else {
 		ret = tdb_do_delete(tdb, rec_ptr, &rec);
@@ -439,13 +445,21 @@ _PUBLIC_ int tdb_delete(struct tdb_context *tdb, TDB_DATA key)
 /*
  * See if we have a dead record around with enough space
  */
-static tdb_off_t tdb_find_dead(struct tdb_context *tdb, uint32_t hash,
-			       struct tdb_record *r, tdb_len_t length)
+tdb_off_t tdb_find_dead(struct tdb_context *tdb, uint32_t hash,
+			struct tdb_record *r, tdb_len_t length,
+			tdb_off_t *p_last_ptr)
 {
-	tdb_off_t rec_ptr;
+	tdb_off_t rec_ptr, last_ptr;
+	tdb_off_t best_rec_ptr = 0;
+	tdb_off_t best_last_ptr = 0;
+	struct tdb_record best = { .rec_len = UINT32_MAX };
+
+	length += sizeof(tdb_off_t); /* tailer */
+
+	last_ptr = TDB_HASH_TOP(hash);
 
 	/* read in the hash top */
-	if (tdb_ofs_read(tdb, TDB_HASH_TOP(hash), &rec_ptr) == -1)
+	if (tdb_ofs_read(tdb, last_ptr, &rec_ptr) == -1)
 		return 0;
 
 	/* keep looking until we find the right record */
@@ -453,16 +467,23 @@ static tdb_off_t tdb_find_dead(struct tdb_context *tdb, uint32_t hash,
 		if (tdb_rec_read(tdb, rec_ptr, r) == -1)
 			return 0;
 
-		if (TDB_DEAD(r) && r->rec_len >= length) {
-			/*
-			 * First fit for simple coding, TODO: change to best
-			 * fit
-			 */
-			return rec_ptr;
+		if (TDB_DEAD(r) && (r->rec_len >= length) &&
+		    (r->rec_len < best.rec_len)) {
+			best_rec_ptr = rec_ptr;
+			best_last_ptr = last_ptr;
+			best = *r;
 		}
+		last_ptr = rec_ptr;
 		rec_ptr = r->next;
 	}
-	return 0;
+
+	if (best.rec_len == UINT32_MAX) {
+		return 0;
+	}
+
+	*r = best;
+	*p_last_ptr = best_last_ptr;
+	return best_rec_ptr;
 }
 
 static int _tdb_store(struct tdb_context *tdb, TDB_DATA key,
@@ -500,15 +521,16 @@ static int _tdb_store(struct tdb_context *tdb, TDB_DATA key,
 		tdb_delete_hash(tdb, key, hash);
 
 	if (tdb->max_dead_records != 0) {
+		tdb_off_t last_ptr;
 		/*
 		 * Allow for some dead records per hash chain, look if we can
 		 * find one that can hold the new record. We need enough space
 		 * for key, data and tailer. If we find one, we don't have to
 		 * consult the central freelist.
 		 */
-		rec_ptr = tdb_find_dead(
-			tdb, hash, &rec,
-			key.dsize + dbuf.dsize + sizeof(tdb_off_t));
+		rec_ptr = tdb_find_dead(tdb, hash, &rec,
+					key.dsize + dbuf.dsize,
+					&last_ptr);
 
 		if (rec_ptr != 0) {
 			rec.key_len = key.dsize;
@@ -528,26 +550,8 @@ static int _tdb_store(struct tdb_context *tdb, TDB_DATA key,
 		}
 	}
 
-	/*
-	 * We have to allocate some space from the freelist, so this means we
-	 * have to lock it. Use the chance to purge all the DEAD records from
-	 * the hash chain under the freelist lock.
-	 */
-
-	if (tdb_lock(tdb, -1, F_WRLCK) == -1) {
-		goto fail;
-	}
-
-	if ((tdb->max_dead_records != 0)
-	    && (tdb_purge_dead(tdb, hash) == -1)) {
-		tdb_unlock(tdb, -1, F_WRLCK);
-		goto fail;
-	}
-
 	/* we have to allocate some space */
-	rec_ptr = tdb_allocate(tdb, key.dsize + dbuf.dsize, &rec);
-
-	tdb_unlock(tdb, -1, F_WRLCK);
+	rec_ptr = tdb_allocate(tdb, hash, key.dsize + dbuf.dsize, &rec);
 
 	if (rec_ptr == 0) {
 		goto fail;
diff --git a/lib/tdb/common/tdb_private.h b/lib/tdb/common/tdb_private.h
index 7227b43..a672159 100644
--- a/lib/tdb/common/tdb_private.h
+++ b/lib/tdb/common/tdb_private.h
@@ -255,7 +255,8 @@ int tdb_ofs_read(struct tdb_context *tdb, tdb_off_t offset, tdb_off_t *d);
 int tdb_ofs_write(struct tdb_context *tdb, tdb_off_t offset, tdb_off_t *d);
 void *tdb_convert(void *buf, uint32_t size);
 int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec);
-tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct tdb_record *rec);
+tdb_off_t tdb_allocate(struct tdb_context *tdb, int hash, tdb_len_t length,
+		       struct tdb_record *rec);
 int tdb_ofs_read(struct tdb_context *tdb, tdb_off_t offset, tdb_off_t *d);
 int tdb_ofs_write(struct tdb_context *tdb, tdb_off_t offset, tdb_off_t *d);
 int tdb_lock_record(struct tdb_context *tdb, tdb_off_t off);
@@ -272,6 +273,10 @@ int tdb_parse_data(struct tdb_context *tdb, TDB_DATA key,
 		   void *private_data);
 tdb_off_t tdb_find_lock_hash(struct tdb_context *tdb, TDB_DATA key, uint32_t hash, int locktype,
 			   struct tdb_record *rec);
+tdb_off_t tdb_find_dead(struct tdb_context *tdb, uint32_t hash,
+			struct tdb_record *r, tdb_len_t length,
+			tdb_off_t *p_last_ptr);
+int tdb_purge_dead(struct tdb_context *tdb, uint32_t hash);
 void tdb_io_init(struct tdb_context *tdb);
 int tdb_expand(struct tdb_context *tdb, tdb_off_t size);
 tdb_off_t tdb_expand_adjust(tdb_off_t map_size, tdb_off_t size, int page_size);
diff --git a/lib/tdb/wscript b/lib/tdb/wscript
index 00a1c34..fc396d6 100644
--- a/lib/tdb/wscript
+++ b/lib/tdb/wscript
@@ -1,7 +1,7 @@
 #!/usr/bin/env python
 
 APPNAME = 'tdb'
-VERSION = '1.2.12'
+VERSION = '1.2.13'
 
 blddir = 'bin'
 


-- 
Samba Shared Repository


More information about the samba-cvs mailing list