[PATCHES] freelist defragmentation and (dependent) ctdb repacking

Michael Adam obnox at samba.org
Wed Jun 11 09:43:05 MDT 2014


Hi List,

in the past months, I have been thinking about the tdb freelist
and vacuuming performance from time to time. I have now managed
to polish my work on that a bit, so that I can present it.

One problem is the potential fragmentation of tdb's free list which
stems from the fact that we can't merge a record that is to be
inserted into the freelist with its right neighbor (if that is a
freelist record) but only with its left neighbor. This is due
to the singly-linked nature of our freelist. We can only merge
with the left neigbour.

Now, when we traverse the freelist, we can look left of each
visited record and possibly merge with that if it is also a
free record, hence removing the fragmentation.

I have implemented this in the first patchset:

- I have created a function "tdb_freelist_merge_adjacent()"
  that does exactly this. So after this function has run, there
  should be no adjacent freelist records any more.

- I have also created a tdbtool subcommand "merge_freelist" to
  call that function.

- Because we might not want to bump the version to 1.3.1 for
  this (not sure..), or as a proposal, I have also changed
  tdb_freelist_size() to call tdb_freelist_merge_adjacent()
  so that it automatically defragments.

- Secondly, there is a place when we traverse the freelist
  anyways, namely in tdb_allocate_from_freelist(). I have
  changed our loop there to merge with left freelist records,
  thereby automatically reducing the freelist fragmentation
  as the database is used. This will usually not traverse until
  the end though since the bes fit algorithm works with decreasing
  accuracy.

- I probably owe a test to measure the effect?!

The second patchset is for ctdb and builds upon the first one.

- It changes the vacuum code to always call tdb_freelist_size()
  again before checking whether a repack run is needed, so that
  it automatically defragments the freelist.
  This might reduce the frequency of (blocking) repacks.

- As a variant, on top there is a patch to explicitly call
  tdb_freelist_merge_adjacent() instead of freelist_size().
  If we choose to have freelist_size defragment, we don't
  need this change, and it is also more backward compatible.

- Additionally, I made the repack code use the vanialla
  tdb_repack() function and removed the ctdb_repack_tdb one.

- Finally, some code cleanup is included.

Review / comment / push appreciated.

The full code can also be seen in my "master-tdb-freelist" branch
in git://git.samba.org/obnox/samba/samba-obnox.git,
https://git.samba.org/?p=obnox/samba/samba-obnox.git;a=shortlog;h=refs/heads/master-tdb-freelist

(please ignore the few top WIP-commits where I experiment with
fixes for some build problems)

Cheers - Michael

-------------- next part --------------
From a1e8ddb468b6479f30d8f5375a0de7ac70c07bc5 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Mon, 10 Feb 2014 01:31:50 +0100
Subject: [PATCH 01/13] tdb: factor read_record_on_left() out of tdb_free()

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 79 +++++++++++++++++++++++++++++------------------
 1 file changed, 49 insertions(+), 30 deletions(-)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 2aeeb1c..e6f7db5 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -97,10 +97,58 @@ static int update_tailer(struct tdb_context *tdb, tdb_off_t offset,
 			 &totalsize);
 }
 
+/**
+ * Read the record directly on the left.
+ * Fail if there is no record on the left.
+ */
+static int read_record_on_left(struct tdb_context *tdb, tdb_off_t offset,
+			       tdb_off_t *left_p,
+			       struct tdb_record *left_r)
+{
+	tdb_off_t left = offset - sizeof(tdb_off_t);
+	struct tdb_record l;
+	tdb_off_t leftsize;
+
+	if (offset - sizeof(tdb_off_t) > TDB_DATA_START(tdb->hash_size)) {
+
+		/* Read in tailer and jump back to header */
+		if (tdb_ofs_read(tdb, left, &leftsize) == -1) {
+			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: left offset read failed at %u\n", left));
+			return -1;
+		}
+
+		/* it could be uninitialised data */
+		if (leftsize == 0 || leftsize == TDB_PAD_U32) {
+			return -1;
+		}
+
+		left = offset - leftsize;
+
+		if (leftsize > offset ||
+		    left < TDB_DATA_START(tdb->hash_size)) {
+			return -1;
+		}
+
+		/* Now read in the left record */
+		if (tdb->methods->tdb_read(tdb, left, &l, sizeof(l), DOCONV()) == -1) {
+			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: left read failed at %u (%u)\n", left, leftsize));
+			return -1;
+		}
+
+		*left_p = left;
+		*left_r = l;
+	}
+
+	return -1;
+}
+
 /* Add an element into the freelist. Merge adjacent records if
    necessary. */
 int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec)
 {
+	tdb_off_t left;
+	struct tdb_record l;
+
 	/* Allocation and tailer lock */
 	if (tdb_lock(tdb, -1, F_WRLCK) != 0)
 		return -1;
@@ -138,36 +186,7 @@ int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec)
 left:
 #endif
 
-	/* Look left */
-	if (offset - sizeof(tdb_off_t) > TDB_DATA_START(tdb->hash_size)) {
-		tdb_off_t left = offset - sizeof(tdb_off_t);
-		struct tdb_record l;
-		tdb_off_t leftsize;
-
-		/* Read in tailer and jump back to header */
-		if (tdb_ofs_read(tdb, left, &leftsize) == -1) {
-			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: left offset read failed at %u\n", left));
-			goto update;
-		}
-
-		/* it could be uninitialised data */
-		if (leftsize == 0 || leftsize == TDB_PAD_U32) {
-			goto update;
-		}
-
-		left = offset - leftsize;
-
-		if (leftsize > offset ||
-		    left < TDB_DATA_START(tdb->hash_size)) {
-			goto update;
-		}
-
-		/* Now read in the left record */
-		if (tdb->methods->tdb_read(tdb, left, &l, sizeof(l), DOCONV()) == -1) {
-			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: left read failed at %u (%u)\n", left, leftsize));
-			goto update;
-		}
-
+	if (read_record_on_left(tdb, offset, &left, &l) == 0) {
 		/* If it's free, expand to include it. */
 		if (l.magic == TDB_FREE_MAGIC) {
 			/* we now merge the new record into the left record, rather than the other
-- 
1.9.1


From f073b96623e928a0665db3318bf44c9a03a22621 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Mon, 10 Feb 2014 01:39:15 +0100
Subject: [PATCH 02/13] tdb: increase readability of read_record_on_left()

by using early returns and better variable names,
and reducing indentation.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 68 ++++++++++++++++++++++++++++-------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index e6f7db5..4a48c3d 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -101,45 +101,59 @@ static int update_tailer(struct tdb_context *tdb, tdb_off_t offset,
  * Read the record directly on the left.
  * Fail if there is no record on the left.
  */
-static int read_record_on_left(struct tdb_context *tdb, tdb_off_t offset,
+static int read_record_on_left(struct tdb_context *tdb, tdb_off_t rec_ptr,
 			       tdb_off_t *left_p,
 			       struct tdb_record *left_r)
 {
-	tdb_off_t left = offset - sizeof(tdb_off_t);
-	struct tdb_record l;
-	tdb_off_t leftsize;
+	tdb_off_t left_ptr;
+	tdb_off_t left_size;
+	struct tdb_record left_rec;
+	int ret;
 
-	if (offset - sizeof(tdb_off_t) > TDB_DATA_START(tdb->hash_size)) {
+	left_ptr = rec_ptr - sizeof(tdb_off_t);
 
-		/* Read in tailer and jump back to header */
-		if (tdb_ofs_read(tdb, left, &leftsize) == -1) {
-			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: left offset read failed at %u\n", left));
-			return -1;
-		}
+	if (left_ptr <= TDB_DATA_START(tdb->hash_size)) {
+		/* no record on the left */
+		return -1;
+	}
 
-		/* it could be uninitialised data */
-		if (leftsize == 0 || leftsize == TDB_PAD_U32) {
-			return -1;
-		}
+	/* Read in tailer and jump back to header */
+	ret = tdb_ofs_read(tdb, left_ptr, &left_size);
+	if (ret == -1) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL,
+			"tdb_free: left offset read failed at %u\n", left_ptr));
+		return -1;
+	}
 
-		left = offset - leftsize;
+	/* it could be uninitialised data */
+	if (left_size == 0 || left_size == TDB_PAD_U32) {
+		return -1;
+	}
 
-		if (leftsize > offset ||
-		    left < TDB_DATA_START(tdb->hash_size)) {
-			return -1;
-		}
+	if (left_size > rec_ptr) {
+		return -1;
+	}
 
-		/* Now read in the left record */
-		if (tdb->methods->tdb_read(tdb, left, &l, sizeof(l), DOCONV()) == -1) {
-			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: left read failed at %u (%u)\n", left, leftsize));
-			return -1;
-		}
+	left_ptr = rec_ptr - left_size;
+
+	if (left_ptr < TDB_DATA_START(tdb->hash_size)) {
+		return -1;
+	}
 
-		*left_p = left;
-		*left_r = l;
+	/* Now read in the left record */
+	ret = tdb->methods->tdb_read(tdb, left_ptr, &left_rec,
+				     sizeof(left_rec), DOCONV());
+	if (ret == -1) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL,
+			 "tdb_free: left read failed at %u (%u)\n",
+			 left_ptr, left_size));
+		return -1;
 	}
 
-	return -1;
+	*left_p = left_ptr;
+	*left_r = left_rec;
+
+	return 0;
 }
 
 /* Add an element into the freelist. Merge adjacent records if
-- 
1.9.1


From 84d11407ecf280bd9d9ee38cc86e841d33a4eb30 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Mon, 10 Feb 2014 01:46:42 +0100
Subject: [PATCH 03/13] tdb: reduce indentation in tdb_free() for merging left

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 4a48c3d..acc9f13 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -200,26 +200,31 @@ int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec)
 left:
 #endif
 
-	if (read_record_on_left(tdb, offset, &left, &l) == 0) {
-		/* If it's free, expand to include it. */
-		if (l.magic == TDB_FREE_MAGIC) {
-			/* we now merge the new record into the left record, rather than the other
-			   way around. This makes the operation O(1) instead of O(n). This change
-			   prevents traverse from being O(n^2) after a lot of deletes */
-			l.rec_len += sizeof(*rec) + rec->rec_len;
-			if (tdb_rec_write(tdb, left, &l) == -1) {
-				TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_left failed at %u\n", left));
-				goto fail;
-			}
-			if (update_tailer(tdb, left, &l) == -1) {
-				TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_tailer failed at %u\n", offset));
-				goto fail;
-			}
-			tdb_unlock(tdb, -1, F_WRLCK);
-			return 0;
-		}
+	if (read_record_on_left(tdb, offset, &left, &l) != 0) {
+		goto update;
 	}
 
+	if (l.magic != TDB_FREE_MAGIC) {
+		goto update;
+	}
+
+	/* It's free - expand to include it. */
+
+	/* we now merge the new record into the left record, rather than the other
+	   way around. This makes the operation O(1) instead of O(n). This change
+	   prevents traverse from being O(n^2) after a lot of deletes */
+	l.rec_len += sizeof(*rec) + rec->rec_len;
+	if (tdb_rec_write(tdb, left, &l) == -1) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_left failed at %u\n", left));
+		goto fail;
+	}
+	if (update_tailer(tdb, left, &l) == -1) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_tailer failed at %u\n", offset));
+		goto fail;
+	}
+	tdb_unlock(tdb, -1, F_WRLCK);
+	return 0;
+
 update:
 
 	/* Now, prepend to free list */
-- 
1.9.1


From eee236a4f77e997e44e44036b73ed1aaf3bda796 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Mon, 10 Feb 2014 01:51:39 +0100
Subject: [PATCH 04/13] tdb: fix debug message in tdb_free()

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index acc9f13..ffd4bcd 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -219,7 +219,7 @@ left:
 		goto fail;
 	}
 	if (update_tailer(tdb, left, &l) == -1) {
-		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_tailer failed at %u\n", offset));
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_tailer failed at %u\n", left));
 		goto fail;
 	}
 	tdb_unlock(tdb, -1, F_WRLCK);
-- 
1.9.1


From 738002c7182b0dccc80e27d8a98c5a1362016464 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Mon, 10 Feb 2014 02:01:38 +0100
Subject: [PATCH 05/13] tdb: factor merge_with_left_record() out of tdb_free()

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 44 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index ffd4bcd..1bea88e 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -156,6 +156,40 @@ static int read_record_on_left(struct tdb_context *tdb, tdb_off_t rec_ptr,
 	return 0;
 }
 
+/**
+ * Merge new freelist record with the direct left neighbour.
+ * This assumes that left_rec represents the record
+ * directly to the left of right_rec and that this is
+ * a freelist record.
+ */
+static int merge_with_left_record(struct tdb_context *tdb,
+				  tdb_off_t left_ptr,
+				  struct tdb_record *left_rec,
+				  struct tdb_record *right_rec)
+{
+	int ret;
+
+	left_rec->rec_len += sizeof(*right_rec) + right_rec->rec_len;
+
+	ret = tdb_rec_write(tdb, left_ptr, left_rec);
+	if (ret == -1) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL,
+			 "merge_with_left_record: update_left failed at %u\n",
+			 left_ptr));
+		return -1;
+	}
+
+	ret = update_tailer(tdb, left_ptr, left_rec);
+	if (ret == -1) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL,
+			 "merge_with_left_record: update_tailer failed at %u\n",
+			 left_ptr));
+		return -1;
+	}
+
+	return 0;
+}
+
 /* Add an element into the freelist. Merge adjacent records if
    necessary. */
 int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec)
@@ -213,15 +247,11 @@ left:
 	/* we now merge the new record into the left record, rather than the other
 	   way around. This makes the operation O(1) instead of O(n). This change
 	   prevents traverse from being O(n^2) after a lot of deletes */
-	l.rec_len += sizeof(*rec) + rec->rec_len;
-	if (tdb_rec_write(tdb, left, &l) == -1) {
-		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_left failed at %u\n", left));
-		goto fail;
-	}
-	if (update_tailer(tdb, left, &l) == -1) {
-		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_tailer failed at %u\n", left));
+
+	if (merge_with_left_record(tdb, left, &l, rec) != 0) {
 		goto fail;
 	}
+
 	tdb_unlock(tdb, -1, F_WRLCK);
 	return 0;
 
-- 
1.9.1


From 0efd2ce23054fd393cb425975bf8303e1933d182 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 01:49:44 +0200
Subject: [PATCH 06/13] tdb: improve comments for tdb_free().

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 1bea88e..41986b9 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -190,8 +190,17 @@ static int merge_with_left_record(struct tdb_context *tdb,
 	return 0;
 }
 
-/* Add an element into the freelist. Merge adjacent records if
-   necessary. */
+/**
+ * Add an element into the freelist.
+ *
+ * We merge the new record into the left record if it is also a
+ * free record, but not with the right one. This makes the
+ * operation O(1) instead of O(n): merging with the right record
+ * requires a traverse of the freelist to find the previous
+ * record in the free list.
+ *
+ * This prevents db traverses from being O(n^2) after a lot of deletes.
+ */
 int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec)
 {
 	tdb_off_t left;
@@ -243,11 +252,6 @@ left:
 	}
 
 	/* It's free - expand to include it. */
-
-	/* we now merge the new record into the left record, rather than the other
-	   way around. This makes the operation O(1) instead of O(n). This change
-	   prevents traverse from being O(n^2) after a lot of deletes */
-
 	if (merge_with_left_record(tdb, left, &l, rec) != 0) {
 		goto fail;
 	}
-- 
1.9.1


From 3c1be892bbcb7861d611aac2645bb32e2a775c8b Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Wed, 11 Jun 2014 12:00:48 +0200
Subject: [PATCH 07/13] tdb: add utility function
 check_merge_with_left_record()

Check whether the record left of a given freelist record is
also a freelist record, and if so, merge the two records.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 41986b9..69b3c66 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -191,6 +191,56 @@ static int merge_with_left_record(struct tdb_context *tdb,
 }
 
 /**
+ * Check whether the record left of a given freelist record is
+ * also a freelist record, and if so, merge the two records.
+ *
+ * Return code:
+ *  -1 upon error
+ *   0 if left was not a free record
+ *   1 if left was free and successfully merged.
+ *
+ * The currend record is handed in with pointer and fully read record.
+ *
+ * The left record pointer and struct can be retrieved as result
+ * in lp and lr;
+ */
+static int check_merge_with_left_record(struct tdb_context *tdb,
+					tdb_off_t rec_ptr,
+					struct tdb_record *rec,
+					tdb_off_t *lp,
+					struct tdb_record *lr)
+{
+	tdb_off_t left_ptr;
+	struct tdb_record left_rec;
+	int ret;
+
+	ret = read_record_on_left(tdb, rec_ptr, &left_ptr, &left_rec);
+	if (ret != 0) {
+		return 0;
+	}
+
+	if (left_rec.magic != TDB_FREE_MAGIC) {
+		return 0;
+	}
+
+	/* It's free - expand to include it. */
+	ret = merge_with_left_record(tdb, left_ptr, &left_rec, rec);
+	if (ret != 0) {
+		return -1;
+	}
+
+	if (lp != NULL) {
+		*lp = left_ptr;
+	}
+
+	if (lr != NULL) {
+		*lr = left_rec;
+	}
+
+	return 1;
+}
+
+/**
  * Add an element into the freelist.
  *
  * We merge the new record into the left record if it is also a
-- 
1.9.1


From 7c16e2e7036696f05b142260cc1864cb1f6f4583 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Wed, 11 Jun 2014 12:03:01 +0200
Subject: [PATCH 08/13] tdb: simplify tdb_free() using
 check_merge_with_left_record()

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 69b3c66..47d6301 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -253,8 +253,7 @@ static int check_merge_with_left_record(struct tdb_context *tdb,
  */
 int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec)
 {
-	tdb_off_t left;
-	struct tdb_record l;
+	int ret;
 
 	/* Allocation and tailer lock */
 	if (tdb_lock(tdb, -1, F_WRLCK) != 0)
@@ -293,25 +292,17 @@ int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec)
 left:
 #endif
 
-	if (read_record_on_left(tdb, offset, &left, &l) != 0) {
-		goto update;
-	}
-
-	if (l.magic != TDB_FREE_MAGIC) {
-		goto update;
-	}
-
-	/* It's free - expand to include it. */
-	if (merge_with_left_record(tdb, left, &l, rec) != 0) {
+	ret = check_merge_with_left_record(tdb, offset, rec, NULL, NULL);
+	if (ret == -1) {
 		goto fail;
 	}
+	if (ret == 1) {
+		/* merged */
+		goto done;
+	}
 
-	tdb_unlock(tdb, -1, F_WRLCK);
-	return 0;
-
-update:
+	/* Nothing to merge, prepend to free list */
 
-	/* Now, prepend to free list */
 	rec->magic = TDB_FREE_MAGIC;
 
 	if (tdb_ofs_read(tdb, FREELIST_TOP, &rec->next) == -1 ||
@@ -321,6 +312,7 @@ update:
 		goto fail;
 	}
 
+done:
 	/* And we're done. */
 	tdb_unlock(tdb, -1, F_WRLCK);
 	return 0;
-- 
1.9.1


From 03113af204488db55c1e83108ea7535f4d0f6d3a Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Wed, 11 Jun 2014 12:04:01 +0200
Subject: [PATCH 09/13] tdb: add utility function
 check_merge_ptr_with_left_record()

Variant of check_merge_with_left_record() that reads the record
itself if necessary.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 47d6301..9b50875 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -241,6 +241,59 @@ static int check_merge_with_left_record(struct tdb_context *tdb,
 }
 
 /**
+ * Check whether the record left of a given freelist record is
+ * also a freelist record, and if so, merge the two records.
+ *
+ * Return code:
+ *  -1 upon error
+ *   0 if left was not a free record
+ *   1 if left was free and successfully merged.
+ *
+ * In this variant, the input record is specified just as the pointer
+ * and is read from the database if needed.
+ *
+ * next_ptr will contain the original record's next pointer after
+ * successful merging (which will be lost after merging), so that
+ * the caller can update the last pointer.
+ */
+static int check_merge_ptr_with_left_record(struct tdb_context *tdb,
+					    tdb_off_t rec_ptr,
+					    tdb_off_t *next_ptr)
+{
+	tdb_off_t left_ptr;
+	struct tdb_record rec, left_rec;
+	int ret;
+
+	ret = read_record_on_left(tdb, rec_ptr, &left_ptr, &left_rec);
+	if (ret != 0) {
+		return 0;
+	}
+
+	if (left_rec.magic != TDB_FREE_MAGIC) {
+		return 0;
+	}
+
+	/* It's free - expand to include it. */
+
+	ret = tdb->methods->tdb_read(tdb, rec_ptr, &rec,
+				     sizeof(rec), DOCONV());
+	if (ret != 0) {
+		return -1;
+	}
+
+	ret = merge_with_left_record(tdb, left_ptr, &left_rec, &rec);
+	if (ret != 0) {
+		return -1;
+	}
+
+	if (next_ptr != NULL) {
+		*next_ptr = rec.next;
+	}
+
+	return 1;
+}
+
+/**
  * Add an element into the freelist.
  *
  * We merge the new record into the left record if it is also a
-- 
1.9.1


From e1386fe7d3c19df00e0d4383be02d639f146fe45 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Mon, 10 Feb 2014 02:39:09 +0100
Subject: [PATCH 10/13] tdb: add tdb_freelist_merge_adjacent()

This is intended to be called to reduce the fragmentation in the
freelist. This is to make up the deficiency of the freelist
to be not doubly linked. If the freelist were doubly linked,
we could easily avoid the creation of adjacent freelist entries.
But with the current singly linked list, it is only possible
to cheaply merge a new free record into a freelist entry on the left,
not on the right...

This can be called periodically, e.g. in the vacuuming process
of a ctdb cluster.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
 lib/tdb/include/tdb.h     |  4 ++++
 2 files changed, 63 insertions(+)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 9b50875..d8cdfa0 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -631,3 +631,62 @@ _PUBLIC_ int tdb_freelist_size(struct tdb_context *tdb)
 	tdb_unlock(tdb, -1, F_RDLCK);
 	return count;
 }
+
+/**
+ * Merge adjacent records in the freelist.
+ */
+_PUBLIC_ int tdb_freelist_merge_adjacent(struct tdb_context *tdb,
+					 int *count_records, int *count_merged)
+{
+	tdb_off_t cur, next;
+	int count = 0;
+	int merged = 0;
+	int ret;
+
+	ret = tdb_lock(tdb, -1, F_RDLCK);
+	if (ret == -1) {
+		return -1;
+	}
+
+	cur = FREELIST_TOP;
+	while (tdb_ofs_read(tdb, cur, &next) == 0 && next != 0) {
+		tdb_off_t next2;
+
+		count++;
+
+		ret = check_merge_ptr_with_left_record(tdb, next, &next2);
+		if (ret == -1) {
+			goto done;
+		}
+		if (ret == 1) {
+			/*
+			 * merged:
+			 * now let cur->next point to next2 instead of next
+			 */
+
+			ret = tdb_ofs_write(tdb, cur, &next2);
+			if (ret != 0) {
+				goto done;
+			}
+
+			next = next2;
+			merged++;
+		}
+
+		cur = next;
+	}
+
+	if (count_records != NULL) {
+		*count_records = count;
+	}
+
+	if (count_merged != NULL) {
+		*count_merged = merged;
+	}
+
+	ret = 0;
+
+done:
+	tdb_unlock(tdb, -1, F_RDLCK);
+	return ret;
+}
diff --git a/lib/tdb/include/tdb.h b/lib/tdb/include/tdb.h
index 5ea5e60..3593300 100644
--- a/lib/tdb/include/tdb.h
+++ b/lib/tdb/include/tdb.h
@@ -900,6 +900,10 @@ int tdb_validate_freelist(struct tdb_context *tdb, int *pnum_entries);
 int tdb_freelist_size(struct tdb_context *tdb);
 char *tdb_summary(struct tdb_context *tdb);
 
+
+int tdb_freelist_merge_adjacent(struct tdb_context *tdb,
+				int *count_records, int *count_merged);
+
 extern TDB_DATA tdb_null;
 
 #ifdef  __cplusplus
-- 
1.9.1


From 857abdd9fe2c469a2e962e991663f32addff0aba Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Mon, 10 Feb 2014 07:39:31 +0100
Subject: [PATCH 11/13] tdb: add "merge_freelist" command to tdbtool

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/tools/tdbtool.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/lib/tdb/tools/tdbtool.c b/lib/tdb/tools/tdbtool.c
index 2f93e33..f33c293 100644
--- a/lib/tdb/tools/tdbtool.c
+++ b/lib/tdb/tools/tdbtool.c
@@ -55,6 +55,7 @@ enum commands {
 	CMD_DELETE,
 	CMD_LIST_HASH_FREE,
 	CMD_LIST_FREE,
+	CMD_FREELIST_MERGE,
 	CMD_INFO,
 	CMD_MMAP,
 	CMD_SPEED,
@@ -89,6 +90,7 @@ COMMAND_TABLE cmd_table[] = {
 	{"delete",	CMD_DELETE},
 	{"list",	CMD_LIST_HASH_FREE},
 	{"free",	CMD_LIST_FREE},
+	{"merge_freelist",	CMD_FREELIST_MERGE},
 	{"info",	CMD_INFO},
 	{"speed",	CMD_SPEED},
 	{"mmap",	CMD_MMAP},
@@ -232,6 +234,7 @@ static void help(void)
 "  delete    key        : delete a record by key\n"
 "  list                 : print the database hash table and freelist\n"
 "  free                 : print the database freelist\n"
+"  merge_freelist       : merge adjacent records in the freelist\n"
 "  check                : check the integrity of an opened database\n"
 "  repack               : repack the database\n"
 "  speed                : perform speed tests on the database\n"
@@ -710,6 +713,18 @@ static int do_command(void)
 		case CMD_LIST_FREE:
 			tdb_printfreelist(tdb);
 			return 0;
+		case CMD_FREELIST_MERGE: {
+			int entries, merged, ret;
+
+			ret = tdb_freelist_merge_adjacent(tdb, &entries,
+							  &merged);
+			if (ret == 0) {
+				printf("merged %d freelist entries. "
+				       "(now %d entries)\n", merged, entries);
+			}
+
+			return 0;
+		}
 		case CMD_INFO:
 			info_tdb();
 			return 0;
-- 
1.9.1


From 84fbb9489a24e04bbc642c80b5972eb6807020cc Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Wed, 11 Jun 2014 12:05:57 +0200
Subject: [PATCH 12/13] tdb: defragment the freelist in
 tdb_allocate_from_freelist()

While we are traversing the freelist anyways, merge a record
with the left if it is also a free list record.

That partially makes up for the fragmentation introduced by
the lack of merging with right records in tdb_free().

Note there is a potential slight downside:
If the left record we merge the current record into was earlier
in the chain and has hence already been met in traverse,
then we can not use the enlarged record even if it might be
a new best fit.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index d8cdfa0..dfdb54e 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -474,10 +474,46 @@ static tdb_off_t tdb_allocate_from_freelist(
 	   issues when faced with a slowly increasing record size.
 	 */
 	while (rec_ptr) {
+		int ret;
+		tdb_off_t left_ptr;
+		struct tdb_record left_rec;
+
 		if (tdb_rec_free_read(tdb, rec_ptr, rec) == -1) {
 			return 0;
 		}
 
+		ret = check_merge_with_left_record(tdb, rec_ptr, rec,
+						   &left_ptr, &left_rec);
+		if (ret == -1) {
+			return 0;
+		}
+		if (ret == 1) {
+			/* merged */
+			rec_ptr = rec->next;
+			ret = tdb_ofs_write(tdb, last_ptr, &rec->next);
+			if (ret == -1) {
+				return 0;
+			}
+
+			/*
+			 * We need to check left record, because we may have
+			 * visited it earlier with smaller size.
+			 * We can only update the best fit with the left record
+			 * when it was already the best fit (in which case we
+			 * will increase the current best fit): When the
+			 * augmented left record would be a new best fit, we
+			 * can not use it because we have not got the record
+			 * preceeding the left record in the chain. If we are
+			 * lucky, we will meet the left record later in the
+			 * chain.
+			 */
+			if (bestfit.rec_ptr == left_ptr) {
+				bestfit.rec_len = left_rec.rec_len;
+			}
+
+			continue;
+		}
+
 		if (rec->rec_len >= length) {
 			if (bestfit.rec_ptr == 0 ||
 			    rec->rec_len < bestfit.rec_len) {
-- 
1.9.1


From a741d8ab302847dce06e6dad3c760b0189a6664c Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Wed, 11 Jun 2014 13:23:33 +0200
Subject: [PATCH 13/13] tdb: change version to 1.3.1

* internal code cleanup
* add function tdb_freelist_merge_adjacent() to defragment the free list

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/ABI/tdb-1.3.1.sigs | 69 ++++++++++++++++++++++++++++++++++++++++++++++
 lib/tdb/wscript            |  2 +-
 2 files changed, 70 insertions(+), 1 deletion(-)
 create mode 100644 lib/tdb/ABI/tdb-1.3.1.sigs

diff --git a/lib/tdb/ABI/tdb-1.3.1.sigs b/lib/tdb/ABI/tdb-1.3.1.sigs
new file mode 100644
index 0000000..4bb903f
--- /dev/null
+++ b/lib/tdb/ABI/tdb-1.3.1.sigs
@@ -0,0 +1,69 @@
+tdb_add_flags: void (struct tdb_context *, unsigned int)
+tdb_append: int (struct tdb_context *, TDB_DATA, TDB_DATA)
+tdb_chainlock: int (struct tdb_context *, TDB_DATA)
+tdb_chainlock_mark: int (struct tdb_context *, TDB_DATA)
+tdb_chainlock_nonblock: int (struct tdb_context *, TDB_DATA)
+tdb_chainlock_read: int (struct tdb_context *, TDB_DATA)
+tdb_chainlock_unmark: int (struct tdb_context *, TDB_DATA)
+tdb_chainunlock: int (struct tdb_context *, TDB_DATA)
+tdb_chainunlock_read: int (struct tdb_context *, TDB_DATA)
+tdb_check: int (struct tdb_context *, int (*)(TDB_DATA, TDB_DATA, void *), void *)
+tdb_close: int (struct tdb_context *)
+tdb_delete: int (struct tdb_context *, TDB_DATA)
+tdb_dump_all: void (struct tdb_context *)
+tdb_enable_seqnum: void (struct tdb_context *)
+tdb_error: enum TDB_ERROR (struct tdb_context *)
+tdb_errorstr: const char *(struct tdb_context *)
+tdb_exists: int (struct tdb_context *, TDB_DATA)
+tdb_fd: int (struct tdb_context *)
+tdb_fetch: TDB_DATA (struct tdb_context *, TDB_DATA)
+tdb_firstkey: TDB_DATA (struct tdb_context *)
+tdb_freelist_merge_adjacent: int (struct tdb_context *, int *, int *)
+tdb_freelist_size: int (struct tdb_context *)
+tdb_get_flags: int (struct tdb_context *)
+tdb_get_logging_private: void *(struct tdb_context *)
+tdb_get_seqnum: int (struct tdb_context *)
+tdb_hash_size: int (struct tdb_context *)
+tdb_increment_seqnum_nonblock: void (struct tdb_context *)
+tdb_jenkins_hash: unsigned int (TDB_DATA *)
+tdb_lock_nonblock: int (struct tdb_context *, int, int)
+tdb_lockall: int (struct tdb_context *)
+tdb_lockall_mark: int (struct tdb_context *)
+tdb_lockall_nonblock: int (struct tdb_context *)
+tdb_lockall_read: int (struct tdb_context *)
+tdb_lockall_read_nonblock: int (struct tdb_context *)
+tdb_lockall_unmark: int (struct tdb_context *)
+tdb_log_fn: tdb_log_func (struct tdb_context *)
+tdb_map_size: size_t (struct tdb_context *)
+tdb_name: const char *(struct tdb_context *)
+tdb_nextkey: TDB_DATA (struct tdb_context *, TDB_DATA)
+tdb_null: dptr = 0xXXXX, dsize = 0
+tdb_open: struct tdb_context *(const char *, int, int, int, mode_t)
+tdb_open_ex: struct tdb_context *(const char *, int, int, int, mode_t, const struct tdb_logging_context *, tdb_hash_func)
+tdb_parse_record: int (struct tdb_context *, TDB_DATA, int (*)(TDB_DATA, TDB_DATA, void *), void *)
+tdb_printfreelist: int (struct tdb_context *)
+tdb_remove_flags: void (struct tdb_context *, unsigned int)
+tdb_reopen: int (struct tdb_context *)
+tdb_reopen_all: int (int)
+tdb_repack: int (struct tdb_context *)
+tdb_rescue: int (struct tdb_context *, void (*)(TDB_DATA, TDB_DATA, void *), void *)
+tdb_runtime_check_for_robust_mutexes: bool (void)
+tdb_set_logging_function: void (struct tdb_context *, const struct tdb_logging_context *)
+tdb_set_max_dead: void (struct tdb_context *, int)
+tdb_setalarm_sigptr: void (struct tdb_context *, volatile sig_atomic_t *)
+tdb_store: int (struct tdb_context *, TDB_DATA, TDB_DATA, int)
+tdb_summary: char *(struct tdb_context *)
+tdb_transaction_cancel: int (struct tdb_context *)
+tdb_transaction_commit: int (struct tdb_context *)
+tdb_transaction_prepare_commit: int (struct tdb_context *)
+tdb_transaction_start: int (struct tdb_context *)
+tdb_transaction_start_nonblock: int (struct tdb_context *)
+tdb_transaction_write_lock_mark: int (struct tdb_context *)
+tdb_transaction_write_lock_unmark: int (struct tdb_context *)
+tdb_traverse: int (struct tdb_context *, tdb_traverse_func, void *)
+tdb_traverse_read: int (struct tdb_context *, tdb_traverse_func, void *)
+tdb_unlock: int (struct tdb_context *, int, int)
+tdb_unlockall: int (struct tdb_context *)
+tdb_unlockall_read: int (struct tdb_context *)
+tdb_validate_freelist: int (struct tdb_context *, int *)
+tdb_wipe_all: int (struct tdb_context *)
diff --git a/lib/tdb/wscript b/lib/tdb/wscript
index 885548d..ab3ff6d 100644
--- a/lib/tdb/wscript
+++ b/lib/tdb/wscript
@@ -1,7 +1,7 @@
 #!/usr/bin/env python
 
 APPNAME = 'tdb'
-VERSION = '1.3.0'
+VERSION = '1.3.1'
 
 blddir = 'bin'
 
-- 
1.9.1

-------------- next part --------------
From 51d069db954d64aa36c91c26a68d0847a8389df3 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 02:51:39 +0200
Subject: [PATCH 01/13] ctdb:vacuum: adapt debug message for repacking.

Now we usually have records to delete == 0 after the preceding
vacuum run. Anyways, deletion is not a major aspect any more
of the repack run and will vanish soon.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index 574ad87..fdca2a3 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1480,8 +1480,8 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 		return 0;
 	}
 
-	DEBUG(DEBUG_INFO,("Repacking %s with %u freelist entries and %u records to delete\n", 
-			name, freelist_size, vdata->count.delete_list.left));
+	DEBUG(DEBUG_INFO, ("Repacking %s with %u freelist entries\n",
+			   name, freelist_size));
 
 	/*
 	 * repack and implicitely get rid of the records we can delete
-- 
1.9.1


From 7de186341a504368530492c562b272f1f950beef Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 02:47:21 +0200
Subject: [PATCH 02/13] ctdb:vacuum: remove a comment in
 ctdb_vacuum_and_repack_db()

The repack operation now mainly defragments the freelist
and does not usually delete any records any more.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index fdca2a3..18ea53d 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1483,9 +1483,6 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	DEBUG(DEBUG_INFO, ("Repacking %s with %u freelist entries\n",
 			   name, freelist_size));
 
-	/*
-	 * repack and implicitely get rid of the records we can delete
-	 */
 	if (ctdb_repack_tdb(ctdb_db->ltdb->tdb, mem_ctx, vdata) != 0) {
 		DEBUG(DEBUG_ERR,(__location__ " Failed to repack '%s'\n", name));
 		talloc_free(vdata);
-- 
1.9.1


From ee04114bb0b0de857696a696ecfeb79e46d61ebd Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 02:53:29 +0200
Subject: [PATCH 03/13] ctdb:vacuum: use plain tdb_repack() instead of
 ctdb_repack_tdb()

Since we usually have 0 records left for repack-deletion,
repacking is essentially used for the purpose of defragmenting
the freelist, we can use the vanilla tdb_repack function.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index 18ea53d..ee7bb0b 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1483,7 +1483,7 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	DEBUG(DEBUG_INFO, ("Repacking %s with %u freelist entries\n",
 			   name, freelist_size));
 
-	if (ctdb_repack_tdb(ctdb_db->ltdb->tdb, mem_ctx, vdata) != 0) {
+	if (tdb_repack(ctdb_db->ltdb->tdb) != 0) {
 		DEBUG(DEBUG_ERR,(__location__ " Failed to repack '%s'\n", name));
 		talloc_free(vdata);
 		return -1;
-- 
1.9.1


From f1edb58773f7a78f58e8dfe5258bc95b809d04fa Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 02:57:00 +0200
Subject: [PATCH 04/13] ctdb:vacuum: remove now unused ctdb_repack_tdb().

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 129 ----------------------------------------------
 1 file changed, 129 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index ee7bb0b..2c78de7 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1295,135 +1295,6 @@ static int ctdb_vacuum_db(struct ctdb_db_context *ctdb_db,
 	return 0;
 }
 
-
-/*
- * traverse function for repacking
- */
-static int repack_traverse(struct tdb_context *tdb, TDB_DATA key, TDB_DATA data,
-			   void *private_data)
-{
-	struct vacuum_data *vdata = (struct vacuum_data *)private_data;
-
-	if (vdata->vacuum) {
-		uint32_t hash = ctdb_hash(&key);
-		struct delete_record_data *kd;
-		/*
-		 * check if we can ignore this record because it's in the delete_list
-		 */
-		kd = (struct delete_record_data *)trbt_lookup32(vdata->delete_list, hash);
-		/*
-		 * there might be hash collisions so we have to compare the keys here to be sure
-		 */
-		if (kd && kd->key.dsize == key.dsize && memcmp(kd->key.dptr, key.dptr, key.dsize) == 0) {
-			struct ctdb_ltdb_header *hdr = (struct ctdb_ltdb_header *)data.dptr;
-			/*
-			 * we have to check if the record hasn't changed in the meantime in order to
-			 * savely remove it from the database
-			 */
-			if (data.dsize == sizeof(struct ctdb_ltdb_header) &&
-				hdr->dmaster == kd->ctdb->pnn &&
-				ctdb_lmaster(kd->ctdb, &(kd->key)) == kd->ctdb->pnn &&
-				kd->hdr.rsn == hdr->rsn) {
-				vdata->count.repack.vacuumed++;
-				return 0;
-			}
-		}
-	}
-	if (tdb_store(vdata->dest_db, key, data, TDB_INSERT) != 0) {
-		vdata->traverse_error = true;
-		return -1;
-	}
-	vdata->count.repack.copied++;
-	return 0;
-}
-
-/*
- * repack a tdb
- */
-static int ctdb_repack_tdb(struct tdb_context *tdb, TALLOC_CTX *mem_ctx, struct vacuum_data *vdata)
-{
-	struct tdb_context *tmp_db;
-
-	if (tdb_transaction_start(tdb) != 0) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to start transaction\n"));
-		return -1;
-	}
-
-	tmp_db = tdb_open("tmpdb", tdb_hash_size(tdb),
-			  TDB_INTERNAL|TDB_DISALLOW_NESTING,
-			  O_RDWR|O_CREAT, 0);
-	if (tmp_db == NULL) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to create tmp_db\n"));
-		tdb_transaction_cancel(tdb);
-		return -1;
-	}
-
-	vdata->traverse_error = false;
-	vdata->dest_db = tmp_db;
-	vdata->vacuum = true;
-	vdata->count.repack.vacuumed = 0;
-	vdata->count.repack.copied = 0;
-
-	/*
-	 * repack and vacuum on-the-fly by not writing the records that are
-	 * no longer needed
-	 */
-	if (tdb_traverse_read(tdb, repack_traverse, vdata) == -1) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to traverse copying out\n"));
-		tdb_transaction_cancel(tdb);
-		tdb_close(tmp_db);
-		return -1;		
-	}
-
-	DEBUG(DEBUG_INFO,(__location__ " %u records vacuumed\n",
-	      vdata->count.repack.vacuumed));
-	
-	if (vdata->traverse_error) {
-		DEBUG(DEBUG_ERR,(__location__ " Error during traversal\n"));
-		tdb_transaction_cancel(tdb);
-		tdb_close(tmp_db);
-		return -1;
-	}
-
-	if (tdb_wipe_all(tdb) != 0) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to wipe database\n"));
-		tdb_transaction_cancel(tdb);
-		tdb_close(tmp_db);
-		return -1;
-	}
-
-	vdata->traverse_error = false;
-	vdata->dest_db = tdb;
-	vdata->vacuum = false;
-	vdata->count.repack.copied = 0;
-
-	if (tdb_traverse_read(tmp_db, repack_traverse, vdata) == -1) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to traverse copying back\n"));
-		tdb_transaction_cancel(tdb);
-		tdb_close(tmp_db);
-		return -1;		
-	}
-
-	if (vdata->traverse_error) {
-		DEBUG(DEBUG_ERR,(__location__ " Error during second traversal\n"));
-		tdb_transaction_cancel(tdb);
-		tdb_close(tmp_db);
-		return -1;
-	}
-
-	tdb_close(tmp_db);
-
-
-	if (tdb_transaction_commit(tdb) != 0) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to commit\n"));
-		return -1;
-	}
-	DEBUG(DEBUG_INFO,(__location__ " %u records copied\n",
-	      vdata->count.repack.copied));
-
-	return 0;
-}
-
 /*
  * repack and vaccum a db
  * called from the child context
-- 
1.9.1


From bbe0ae2cfa8579efd8dc91784b60a20523f35ea0 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 02:57:25 +0200
Subject: [PATCH 05/13] ctdb:vacuum: remove a superfluous and misleading
 comment

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index 2c78de7..942d611 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1509,9 +1509,6 @@ ctdb_vacuum_event(struct event_context *ev, struct timed_event *te,
 			_exit(1);
 		}
 
-		/* 
-		 * repack the db
-		 */
 		if ((ctdb->tunable.vacuum_fast_path_count > 0) &&
 		    (vacuum_handle->fast_path_count == 0))
 		{
-- 
1.9.1


From 3b5a8055a32790b65d5d522977c5ae7120cb0bde Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 02:59:51 +0200
Subject: [PATCH 06/13] ctdb:vacuum: untangle assignmend and check for return
 of tdb_repack()

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index 942d611..64cb135 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1307,6 +1307,7 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	const char *name = ctdb_db->db_name;
 	int freelist_size = 0;
 	struct vacuum_data *vdata;
+	int ret;
 
 	vdata = talloc_zero(mem_ctx, struct vacuum_data);
 	if (vdata == NULL) {
@@ -1354,7 +1355,8 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	DEBUG(DEBUG_INFO, ("Repacking %s with %u freelist entries\n",
 			   name, freelist_size));
 
-	if (tdb_repack(ctdb_db->ltdb->tdb) != 0) {
+	ret = tdb_repack(ctdb_db->ltdb->tdb);
+	if (ret != 0) {
 		DEBUG(DEBUG_ERR,(__location__ " Failed to repack '%s'\n", name));
 		talloc_free(vdata);
 		return -1;
-- 
1.9.1


From 93566c4825dc3b4c6b60443ea4ac260130d312cc Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 03:02:42 +0200
Subject: [PATCH 07/13] ctdb:vacuum: remove a superfluous comment.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index 64cb135..8119acc 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1326,10 +1326,7 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	}
 
 	vdata->start = timeval_current();
- 
-	/*
-	 * gather all records that can be deleted in vdata
-	 */
+
 	if (ctdb_vacuum_db(ctdb_db, vdata, full_vacuum_run) != 0) {
 		DEBUG(DEBUG_ERR,(__location__ " Failed to vacuum '%s'\n", name));
 	}
-- 
1.9.1


From 52eb74f48a4e7f4e9518fa6a4a8e58d74fe00bac Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 03:08:20 +0200
Subject: [PATCH 08/13] ctdb:vacuum: remove vacuum limit from vdata - not used

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index 8119acc..cb0d993 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -53,7 +53,6 @@ struct ctdb_vacuum_handle {
 
 /*  a list of records to possibly delete */
 struct vacuum_data {
-	uint32_t repack_limit;
 	struct ctdb_context *ctdb;
 	struct ctdb_db_context *ctdb_db;
 	struct tdb_context *dest_db;
@@ -1316,7 +1315,6 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	}
 
 	vdata->ctdb = ctdb_db->ctdb;
-	vdata->repack_limit = repack_limit;
 	vdata->delete_list = trbt_create(vdata, 0);
 	vdata->ctdb_db = ctdb_db;
 	if (vdata->delete_list == NULL) {
-- 
1.9.1


From 838d01cd4dac9181c9c8f231bc5538b40af36b91 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 03:34:05 +0200
Subject: [PATCH 09/13] ctdb:vacuum: move init of vdata into init_vdata funcion

This is a small code cleanup.
vdata is only used in ctdb_vacuum_db() and not in
ctdb_vacuum_and_repack_db() where it is currently initialized.

This patch moves creation and all previously scattered
initialization of vacuum_data into ctdb_vacuum_init_vacuum_data
which is called from ctdb_vacuum_db.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 75 +++++++++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 32 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index cb0d993..46c8c93 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1170,11 +1170,29 @@ done:
 /**
  * initialize the vacuum_data
  */
-static int ctdb_vacuum_init_vacuum_data(struct ctdb_db_context *ctdb_db,
-					struct vacuum_data *vdata)
+static struct vacuum_data *ctdb_vacuum_init_vacuum_data(
+					struct ctdb_db_context *ctdb_db,
+					TALLOC_CTX *mem_ctx)
 {
 	int i;
 	struct ctdb_context *ctdb = ctdb_db->ctdb;
+	struct vacuum_data *vdata;
+
+	vdata = talloc_zero(mem_ctx, struct vacuum_data);
+	if (vdata == NULL) {
+		DEBUG(DEBUG_ERR,(__location__ " Out of memory\n"));
+		return NULL;
+	}
+
+	vdata->ctdb = ctdb_db->ctdb;
+	vdata->ctdb_db = ctdb_db;
+	vdata->delete_list = trbt_create(vdata, 0);
+	if (vdata->delete_list == NULL) {
+		DEBUG(DEBUG_ERR,(__location__ " Out of memory\n"));
+		goto fail;
+	}
+
+	vdata->start = timeval_current();
 
 	vdata->count.delete_queue.added_to_delete_list = 0;
 	vdata->count.delete_queue.added_to_vacuum_fetch_list = 0;
@@ -1199,7 +1217,7 @@ static int ctdb_vacuum_init_vacuum_data(struct ctdb_db_context *ctdb_db,
 						ctdb->num_nodes);
 	if (vdata->vacuum_fetch_list == NULL) {
 		DEBUG(DEBUG_ERR,(__location__ " Out of memory\n"));
-		return -1;
+		goto fail;
 	}
 	for (i = 0; i < ctdb->num_nodes; i++) {
 		vdata->vacuum_fetch_list[i] = (struct ctdb_marshall_buffer *)
@@ -1207,12 +1225,17 @@ static int ctdb_vacuum_init_vacuum_data(struct ctdb_db_context *ctdb_db,
 					 offsetof(struct ctdb_marshall_buffer, data));
 		if (vdata->vacuum_fetch_list[i] == NULL) {
 			DEBUG(DEBUG_ERR,(__location__ " Out of memory\n"));
-			return -1;
+			talloc_free(vdata);
+			return NULL;
 		}
 		vdata->vacuum_fetch_list[i]->db_id = ctdb_db->db_id;
 	}
 
-	return 0;
+	return vdata;
+
+fail:
+	talloc_free(vdata);
+	return NULL;
 }
 
 /**
@@ -1248,11 +1271,12 @@ static int ctdb_vacuum_init_vacuum_data(struct ctdb_db_context *ctdb_db,
  * This executes in the child context.
  */
 static int ctdb_vacuum_db(struct ctdb_db_context *ctdb_db,
-			  struct vacuum_data *vdata,
 			  bool full_vacuum_run)
 {
 	struct ctdb_context *ctdb = ctdb_db->ctdb;
 	int ret, pnn;
+	struct vacuum_data *vdata;
+	TALLOC_CTX *tmp_ctx;
 
 	DEBUG(DEBUG_INFO, (__location__ " Entering %s vacuum run for db "
 			   "%s db_id[0x%08x]\n",
@@ -1273,9 +1297,16 @@ static int ctdb_vacuum_db(struct ctdb_db_context *ctdb_db,
 
 	ctdb->pnn = pnn;
 
-	ret = ctdb_vacuum_init_vacuum_data(ctdb_db, vdata);
-	if (ret != 0) {
-		return ret;
+	tmp_ctx = talloc_new(ctdb_db);
+	if (tmp_ctx == NULL) {
+		DEBUG(DEBUG_ERR, ("Out of memory!\n"));
+		return -1;
+	}
+
+	vdata = ctdb_vacuum_init_vacuum_data(ctdb_db, tmp_ctx);
+	if (vdata == NULL) {
+		talloc_free(tmp_ctx);
+		return -1;
 	}
 
 	if (full_vacuum_run) {
@@ -1288,6 +1319,8 @@ static int ctdb_vacuum_db(struct ctdb_db_context *ctdb_db,
 
 	ctdb_process_delete_list(ctdb_db, vdata);
 
+	talloc_free(tmp_ctx);
+
 	/* this ensures we run our event queue */
 	ctdb_ctrl_getpnn(ctdb, TIMELIMIT(), CTDB_CURRENT_NODE);
 
@@ -1305,27 +1338,9 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	uint32_t repack_limit = ctdb_db->ctdb->tunable.repack_limit;
 	const char *name = ctdb_db->db_name;
 	int freelist_size = 0;
-	struct vacuum_data *vdata;
 	int ret;
 
-	vdata = talloc_zero(mem_ctx, struct vacuum_data);
-	if (vdata == NULL) {
-		DEBUG(DEBUG_ERR,(__location__ " Out of memory\n"));
-		return -1;
-	}
-
-	vdata->ctdb = ctdb_db->ctdb;
-	vdata->delete_list = trbt_create(vdata, 0);
-	vdata->ctdb_db = ctdb_db;
-	if (vdata->delete_list == NULL) {
-		DEBUG(DEBUG_ERR,(__location__ " Out of memory\n"));
-		talloc_free(vdata);
-		return -1;
-	}
-
-	vdata->start = timeval_current();
-
-	if (ctdb_vacuum_db(ctdb_db, vdata, full_vacuum_run) != 0) {
+	if (ctdb_vacuum_db(ctdb_db, full_vacuum_run) != 0) {
 		DEBUG(DEBUG_ERR,(__location__ " Failed to vacuum '%s'\n", name));
 	}
 
@@ -1333,7 +1348,6 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 		freelist_size = tdb_freelist_size(ctdb_db->ltdb->tdb);
 		if (freelist_size == -1) {
 			DEBUG(DEBUG_ERR,(__location__ " Failed to get freelist size for '%s'\n", name));
-			talloc_free(vdata);
 			return -1;
 		}
 	}
@@ -1343,7 +1357,6 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	 */
 	if ((repack_limit == 0 || (uint32_t)freelist_size < repack_limit))
 	{
-		talloc_free(vdata);
 		return 0;
 	}
 
@@ -1353,10 +1366,8 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	ret = tdb_repack(ctdb_db->ltdb->tdb);
 	if (ret != 0) {
 		DEBUG(DEBUG_ERR,(__location__ " Failed to repack '%s'\n", name));
-		talloc_free(vdata);
 		return -1;
 	}
-	talloc_free(vdata);
 
 	return 0;
 }
-- 
1.9.1


From 1aa68e5156cf029e80ad1bbd30f9a2680f61d679 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 03:36:49 +0200
Subject: [PATCH 10/13] ctdb:vacuum: remove now unused talloc ctx argument from
 ctdb_vacuum_db()

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index 46c8c93..d1cb4de 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1332,7 +1332,6 @@ static int ctdb_vacuum_db(struct ctdb_db_context *ctdb_db,
  * called from the child context
  */
 static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
-				     TALLOC_CTX *mem_ctx,
 				     bool full_vacuum_run)
 {
 	uint32_t repack_limit = ctdb_db->ctdb->tunable.repack_limit;
@@ -1522,8 +1521,7 @@ ctdb_vacuum_event(struct event_context *ev, struct timed_event *te,
 		{
 			full_vacuum_run = true;
 		}
-		cc = ctdb_vacuum_and_repack_db(ctdb_db, child_ctx,
-					       full_vacuum_run);
+		cc = ctdb_vacuum_and_repack_db(ctdb_db, full_vacuum_run);
 
 		write(child_ctx->fd[1], &cc, 1);
 		_exit(0);
-- 
1.9.1


From d68dab3c16d3bce4757eeb1ca3cbfa1a9b43c83a Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 15 Feb 2014 01:36:06 +0100
Subject: [PATCH 11/13] ctdb:vacuum: always run freelist_size again

and not only if repack_limit != 0. This partially reverts
commit 48f2d1158820bfb063ba0a0bbfb6f496a8e7522.

This is in preparation of running a different function with
defragmentation side effect instead of freelist_size.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index d1cb4de..ca2e4d9 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1343,12 +1343,10 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 		DEBUG(DEBUG_ERR,(__location__ " Failed to vacuum '%s'\n", name));
 	}
 
-	if (repack_limit != 0) {
-		freelist_size = tdb_freelist_size(ctdb_db->ltdb->tdb);
-		if (freelist_size == -1) {
-			DEBUG(DEBUG_ERR,(__location__ " Failed to get freelist size for '%s'\n", name));
-			return -1;
-		}
+	freelist_size = tdb_freelist_size(ctdb_db->ltdb->tdb);
+	if (freelist_size == -1) {
+		DEBUG(DEBUG_ERR,(__location__ " Failed to get freelist size for '%s'\n", name));
+		return -1;
 	}
 
 	/*
-- 
1.9.1


From 77f4b993ff17d4768112045ddde50df734443c2d Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Mon, 10 Feb 2014 02:49:23 +0100
Subject: [PATCH 12/13] ctdb:vacuum: run tdb_freelist_merge_adjacent() instead
 of tdb_freelist_size()

This way, the freelist is compacted in each vacuum run,
and the need to run repack is reduced.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index ca2e4d9..d1ee7fd 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1337,15 +1337,18 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	uint32_t repack_limit = ctdb_db->ctdb->tunable.repack_limit;
 	const char *name = ctdb_db->db_name;
 	int freelist_size = 0;
+	int freelist_merged;
 	int ret;
 
 	if (ctdb_vacuum_db(ctdb_db, full_vacuum_run) != 0) {
 		DEBUG(DEBUG_ERR,(__location__ " Failed to vacuum '%s'\n", name));
 	}
 
-	freelist_size = tdb_freelist_size(ctdb_db->ltdb->tdb);
-	if (freelist_size == -1) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to get freelist size for '%s'\n", name));
+	ret = tdb_freelist_merge_adjacent(ctdb_db->ltdb->tdb, &freelist_size,
+					  &freelist_merged);
+	if (ret != 0) {
+		DEBUG(DEBUG_ERR, (__location__ " tdb_freelist_merge_adjacent "
+				  "failed for '%s'\n", name));
 		return -1;
 	}
 
-- 
1.9.1


From 769faf1f3cc7cb2352cf93e5342e6560f9bdf1b9 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Tue, 22 Apr 2014 22:09:35 +0200
Subject: [PATCH 13/13] ctdb:vacuum: add missing return to
 ctdb_vacuum_traverse_db() error path.

This got lost in commit 19948702992c94553e1a611540ad398de9f9d8b9
("ctdb-vacuum: make ctdb_vacuum_traverse_db() void.")

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index d1ee7fd..4e5e71f 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -744,6 +744,7 @@ static void ctdb_vacuum_traverse_db(struct ctdb_db_context *ctdb_db,
 	if (ret == -1 || vdata->traverse_error) {
 		DEBUG(DEBUG_ERR, (__location__ " Traverse error in vacuuming "
 				  "'%s'\n", ctdb_db->db_name));
+		return;
 	}
 
 	if (vdata->count.db_traverse.total > 0) {
-- 
1.9.1

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20140611/535448f4/attachment.pgp>


More information about the samba-technical mailing list