[PATCHES] freelist defragmentation and (dependent) ctdb repacking

Michael Adam obnox at samba.org
Mon Jun 16 10:09:07 MDT 2014


after a short discussion with Volker, here are two updated patchsets:

The tdb patchset:

- does not publish the tdb_freelist_merge_adjacend() function
- tdb_freelist_size() calls tdb_freelist_merge_adjacend() if
  the db is not read only, otherwise operates as before.
- Hence no version bump is needed
- the tdbtool gets a new subcommand "freelist_size" which
  calls and prints the freelist size, thereby defragmenting
  the freelist, but there is no subcommand "merge_adjacent" any more.
- The tdb_allocate_from_freelist() code got an extra twist
  that can help expansions of the db:
  When a new candidate was created by a merge but no candidate
  was found in the first walk, we do a second walk without
  before expanding.

The ctdb patchset:

- does obviously not call tdb_freelist_merge_adjacent any more
  and is hence an in principle unrelated cleanup, but it has
  the side effect of defragmenting the freelist when run with
  current tdb code, by calling tdb_freelist_size() even when
  repack_limit is set to 0.

Cheers - Michael

On 2014-06-16 at 13:41 +0200, Michael Adam wrote:
> Any more comments?
> 
> For Amitay's concern, there are two (gradual) options
> (if the changes are generally found to be worthwile):
> 
> 1. don't publish freelist_merge_adjacent at all and only
>    have freelist_size call it.
>    Hence no tdb version bump and no tdbtool command "merge"
>    (unless we use tdb_freelist_size)
> 2. publish freelist_merge_adjacent, and hence bump the tdb
>    version number but also have freelist_size call
>    merge_adjacent, so that we can have ctdb benefit
>    from the defragmentation without requiring a newer
>    version of tdb.
> 
> Cheers - Michael
> 
> On 2014-06-13 at 09:30 +0200, Michael Adam wrote:
> > On 2014-06-13 at 17:10 +1000, Amitay Isaacs wrote:
> > > >
> > > 
> > > Hi Michael,
> > > 
> > > The changes look good, but haven't finished reviewing them yet.
> > 
> > Thanks so far!
> > 
> > > One of the issue is that the ctdb changes are not compatible with older tdb
> > > version.  Are we making a decision to drop the compatibility with older
> > > versions of tdb?
> > 
> > Well I am not suggesting any such decision here.
> > Therefore, we can omit the final patch to ctdb
> > and simply have ctdb call tdb_freelist_size: That is
> > the first approach, where tdb_freelist_size is
> > changed to call tdb_freelist_merge_adjacend()
> > so that the API does not change, but a defragmenting
> > side-effect is added.
> > 
> > That's why I explicitly put it this way:
> > 
> > > On Thu, Jun 12, 2014 at 1:43 AM, Michael Adam <obnox at samba.org> wrote:
> > > >
> > > > - Because we might not want to bump the version to 1.3.1 for
> > > >   this (not sure..), or as a proposal, I have also changed
> > > >   tdb_freelist_size() to call tdb_freelist_merge_adjacent()
> > > >   so that it automatically defragments.
> > > >
> > > > - Secondly, there is a place when we traverse the freelist
> > > >   anyways, namely in tdb_allocate_from_freelist(). I have
> > > >   changed our loop there to merge with left freelist records,
> > > >   thereby automatically reducing the freelist fragmentation
> > > >   as the database is used. This will usually not traverse until
> > > >   the end though since the bes fit algorithm works with decreasing
> > > >   accuracy.
> > > >
> > > > - I probably owe a test to measure the effect?!
> > > >
> > > > The second patchset is for ctdb and builds upon the first one.
> > > >
> > > > - It changes the vacuum code to always call tdb_freelist_size()
> > > >   again before checking whether a repack run is needed, so that
> > > >   it automatically defragments the freelist.
> > > >   This might reduce the frequency of (blocking) repacks.
> > > >
> > > > - As a variant, on top there is a patch to explicitly call
> > > >   tdb_freelist_merge_adjacent() instead of freelist_size().
> > > >   If we choose to have freelist_size defragment, we don't
> > > >   need this change, and it is also more backward compatible.
> > 
> > Is that the more reasonable approach?
> > 
> > Cheers - Michael
> 
> 


-------------- next part --------------
From 6a6b527564f2d4c9c82ad4223eb87b5d682903fd Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Mon, 10 Feb 2014 01:31:50 +0100
Subject: [PATCH 01/13] tdb: factor read_record_on_left() out of tdb_free()

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 79 +++++++++++++++++++++++++++++------------------
 1 file changed, 49 insertions(+), 30 deletions(-)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 2aeeb1c..e6f7db5 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -97,10 +97,58 @@ static int update_tailer(struct tdb_context *tdb, tdb_off_t offset,
 			 &totalsize);
 }
 
+/**
+ * Read the record directly on the left.
+ * Fail if there is no record on the left.
+ */
+static int read_record_on_left(struct tdb_context *tdb, tdb_off_t offset,
+			       tdb_off_t *left_p,
+			       struct tdb_record *left_r)
+{
+	tdb_off_t left = offset - sizeof(tdb_off_t);
+	struct tdb_record l;
+	tdb_off_t leftsize;
+
+	if (offset - sizeof(tdb_off_t) > TDB_DATA_START(tdb->hash_size)) {
+
+		/* Read in tailer and jump back to header */
+		if (tdb_ofs_read(tdb, left, &leftsize) == -1) {
+			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: left offset read failed at %u\n", left));
+			return -1;
+		}
+
+		/* it could be uninitialised data */
+		if (leftsize == 0 || leftsize == TDB_PAD_U32) {
+			return -1;
+		}
+
+		left = offset - leftsize;
+
+		if (leftsize > offset ||
+		    left < TDB_DATA_START(tdb->hash_size)) {
+			return -1;
+		}
+
+		/* Now read in the left record */
+		if (tdb->methods->tdb_read(tdb, left, &l, sizeof(l), DOCONV()) == -1) {
+			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: left read failed at %u (%u)\n", left, leftsize));
+			return -1;
+		}
+
+		*left_p = left;
+		*left_r = l;
+	}
+
+	return -1;
+}
+
 /* Add an element into the freelist. Merge adjacent records if
    necessary. */
 int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec)
 {
+	tdb_off_t left;
+	struct tdb_record l;
+
 	/* Allocation and tailer lock */
 	if (tdb_lock(tdb, -1, F_WRLCK) != 0)
 		return -1;
@@ -138,36 +186,7 @@ int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec)
 left:
 #endif
 
-	/* Look left */
-	if (offset - sizeof(tdb_off_t) > TDB_DATA_START(tdb->hash_size)) {
-		tdb_off_t left = offset - sizeof(tdb_off_t);
-		struct tdb_record l;
-		tdb_off_t leftsize;
-
-		/* Read in tailer and jump back to header */
-		if (tdb_ofs_read(tdb, left, &leftsize) == -1) {
-			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: left offset read failed at %u\n", left));
-			goto update;
-		}
-
-		/* it could be uninitialised data */
-		if (leftsize == 0 || leftsize == TDB_PAD_U32) {
-			goto update;
-		}
-
-		left = offset - leftsize;
-
-		if (leftsize > offset ||
-		    left < TDB_DATA_START(tdb->hash_size)) {
-			goto update;
-		}
-
-		/* Now read in the left record */
-		if (tdb->methods->tdb_read(tdb, left, &l, sizeof(l), DOCONV()) == -1) {
-			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: left read failed at %u (%u)\n", left, leftsize));
-			goto update;
-		}
-
+	if (read_record_on_left(tdb, offset, &left, &l) == 0) {
 		/* If it's free, expand to include it. */
 		if (l.magic == TDB_FREE_MAGIC) {
 			/* we now merge the new record into the left record, rather than the other
-- 
1.9.1


From fd5a979553c21a41b76bd96b45785b8d3283cca7 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Mon, 10 Feb 2014 01:39:15 +0100
Subject: [PATCH 02/13] tdb: increase readability of read_record_on_left()

by using early returns and better variable names,
and reducing indentation.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 68 ++++++++++++++++++++++++++++-------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index e6f7db5..4a48c3d 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -101,45 +101,59 @@ static int update_tailer(struct tdb_context *tdb, tdb_off_t offset,
  * Read the record directly on the left.
  * Fail if there is no record on the left.
  */
-static int read_record_on_left(struct tdb_context *tdb, tdb_off_t offset,
+static int read_record_on_left(struct tdb_context *tdb, tdb_off_t rec_ptr,
 			       tdb_off_t *left_p,
 			       struct tdb_record *left_r)
 {
-	tdb_off_t left = offset - sizeof(tdb_off_t);
-	struct tdb_record l;
-	tdb_off_t leftsize;
+	tdb_off_t left_ptr;
+	tdb_off_t left_size;
+	struct tdb_record left_rec;
+	int ret;
 
-	if (offset - sizeof(tdb_off_t) > TDB_DATA_START(tdb->hash_size)) {
+	left_ptr = rec_ptr - sizeof(tdb_off_t);
 
-		/* Read in tailer and jump back to header */
-		if (tdb_ofs_read(tdb, left, &leftsize) == -1) {
-			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: left offset read failed at %u\n", left));
-			return -1;
-		}
+	if (left_ptr <= TDB_DATA_START(tdb->hash_size)) {
+		/* no record on the left */
+		return -1;
+	}
 
-		/* it could be uninitialised data */
-		if (leftsize == 0 || leftsize == TDB_PAD_U32) {
-			return -1;
-		}
+	/* Read in tailer and jump back to header */
+	ret = tdb_ofs_read(tdb, left_ptr, &left_size);
+	if (ret == -1) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL,
+			"tdb_free: left offset read failed at %u\n", left_ptr));
+		return -1;
+	}
 
-		left = offset - leftsize;
+	/* it could be uninitialised data */
+	if (left_size == 0 || left_size == TDB_PAD_U32) {
+		return -1;
+	}
 
-		if (leftsize > offset ||
-		    left < TDB_DATA_START(tdb->hash_size)) {
-			return -1;
-		}
+	if (left_size > rec_ptr) {
+		return -1;
+	}
 
-		/* Now read in the left record */
-		if (tdb->methods->tdb_read(tdb, left, &l, sizeof(l), DOCONV()) == -1) {
-			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: left read failed at %u (%u)\n", left, leftsize));
-			return -1;
-		}
+	left_ptr = rec_ptr - left_size;
+
+	if (left_ptr < TDB_DATA_START(tdb->hash_size)) {
+		return -1;
+	}
 
-		*left_p = left;
-		*left_r = l;
+	/* Now read in the left record */
+	ret = tdb->methods->tdb_read(tdb, left_ptr, &left_rec,
+				     sizeof(left_rec), DOCONV());
+	if (ret == -1) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL,
+			 "tdb_free: left read failed at %u (%u)\n",
+			 left_ptr, left_size));
+		return -1;
 	}
 
-	return -1;
+	*left_p = left_ptr;
+	*left_r = left_rec;
+
+	return 0;
 }
 
 /* Add an element into the freelist. Merge adjacent records if
-- 
1.9.1


From fc322d82e1b551f2ec746e6ce7e9056e34974e7c Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Mon, 10 Feb 2014 01:46:42 +0100
Subject: [PATCH 03/13] tdb: reduce indentation in tdb_free() for merging left

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 4a48c3d..acc9f13 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -200,26 +200,31 @@ int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec)
 left:
 #endif
 
-	if (read_record_on_left(tdb, offset, &left, &l) == 0) {
-		/* If it's free, expand to include it. */
-		if (l.magic == TDB_FREE_MAGIC) {
-			/* we now merge the new record into the left record, rather than the other
-			   way around. This makes the operation O(1) instead of O(n). This change
-			   prevents traverse from being O(n^2) after a lot of deletes */
-			l.rec_len += sizeof(*rec) + rec->rec_len;
-			if (tdb_rec_write(tdb, left, &l) == -1) {
-				TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_left failed at %u\n", left));
-				goto fail;
-			}
-			if (update_tailer(tdb, left, &l) == -1) {
-				TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_tailer failed at %u\n", offset));
-				goto fail;
-			}
-			tdb_unlock(tdb, -1, F_WRLCK);
-			return 0;
-		}
+	if (read_record_on_left(tdb, offset, &left, &l) != 0) {
+		goto update;
 	}
 
+	if (l.magic != TDB_FREE_MAGIC) {
+		goto update;
+	}
+
+	/* It's free - expand to include it. */
+
+	/* we now merge the new record into the left record, rather than the other
+	   way around. This makes the operation O(1) instead of O(n). This change
+	   prevents traverse from being O(n^2) after a lot of deletes */
+	l.rec_len += sizeof(*rec) + rec->rec_len;
+	if (tdb_rec_write(tdb, left, &l) == -1) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_left failed at %u\n", left));
+		goto fail;
+	}
+	if (update_tailer(tdb, left, &l) == -1) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_tailer failed at %u\n", offset));
+		goto fail;
+	}
+	tdb_unlock(tdb, -1, F_WRLCK);
+	return 0;
+
 update:
 
 	/* Now, prepend to free list */
-- 
1.9.1


From 0c78a329429c0afce536d611a6e61fbe13c7ca6b Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Mon, 10 Feb 2014 01:51:39 +0100
Subject: [PATCH 04/13] tdb: fix debug message in tdb_free()

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index acc9f13..ffd4bcd 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -219,7 +219,7 @@ left:
 		goto fail;
 	}
 	if (update_tailer(tdb, left, &l) == -1) {
-		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_tailer failed at %u\n", offset));
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_tailer failed at %u\n", left));
 		goto fail;
 	}
 	tdb_unlock(tdb, -1, F_WRLCK);
-- 
1.9.1


From 8eae18386fc56f1ae4c690efe0aca52c218aac40 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Mon, 10 Feb 2014 02:01:38 +0100
Subject: [PATCH 05/13] tdb: factor merge_with_left_record() out of tdb_free()

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 44 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index ffd4bcd..1bea88e 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -156,6 +156,40 @@ static int read_record_on_left(struct tdb_context *tdb, tdb_off_t rec_ptr,
 	return 0;
 }
 
+/**
+ * Merge new freelist record with the direct left neighbour.
+ * This assumes that left_rec represents the record
+ * directly to the left of right_rec and that this is
+ * a freelist record.
+ */
+static int merge_with_left_record(struct tdb_context *tdb,
+				  tdb_off_t left_ptr,
+				  struct tdb_record *left_rec,
+				  struct tdb_record *right_rec)
+{
+	int ret;
+
+	left_rec->rec_len += sizeof(*right_rec) + right_rec->rec_len;
+
+	ret = tdb_rec_write(tdb, left_ptr, left_rec);
+	if (ret == -1) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL,
+			 "merge_with_left_record: update_left failed at %u\n",
+			 left_ptr));
+		return -1;
+	}
+
+	ret = update_tailer(tdb, left_ptr, left_rec);
+	if (ret == -1) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL,
+			 "merge_with_left_record: update_tailer failed at %u\n",
+			 left_ptr));
+		return -1;
+	}
+
+	return 0;
+}
+
 /* Add an element into the freelist. Merge adjacent records if
    necessary. */
 int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec)
@@ -213,15 +247,11 @@ left:
 	/* we now merge the new record into the left record, rather than the other
 	   way around. This makes the operation O(1) instead of O(n). This change
 	   prevents traverse from being O(n^2) after a lot of deletes */
-	l.rec_len += sizeof(*rec) + rec->rec_len;
-	if (tdb_rec_write(tdb, left, &l) == -1) {
-		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_left failed at %u\n", left));
-		goto fail;
-	}
-	if (update_tailer(tdb, left, &l) == -1) {
-		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_free: update_tailer failed at %u\n", left));
+
+	if (merge_with_left_record(tdb, left, &l, rec) != 0) {
 		goto fail;
 	}
+
 	tdb_unlock(tdb, -1, F_WRLCK);
 	return 0;
 
-- 
1.9.1


From 9873a3f193dfd630e1b5bcb20e64cb9cae206853 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 01:49:44 +0200
Subject: [PATCH 06/13] tdb: improve comments for tdb_free().

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 1bea88e..41986b9 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -190,8 +190,17 @@ static int merge_with_left_record(struct tdb_context *tdb,
 	return 0;
 }
 
-/* Add an element into the freelist. Merge adjacent records if
-   necessary. */
+/**
+ * Add an element into the freelist.
+ *
+ * We merge the new record into the left record if it is also a
+ * free record, but not with the right one. This makes the
+ * operation O(1) instead of O(n): merging with the right record
+ * requires a traverse of the freelist to find the previous
+ * record in the free list.
+ *
+ * This prevents db traverses from being O(n^2) after a lot of deletes.
+ */
 int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec)
 {
 	tdb_off_t left;
@@ -243,11 +252,6 @@ left:
 	}
 
 	/* It's free - expand to include it. */
-
-	/* we now merge the new record into the left record, rather than the other
-	   way around. This makes the operation O(1) instead of O(n). This change
-	   prevents traverse from being O(n^2) after a lot of deletes */
-
 	if (merge_with_left_record(tdb, left, &l, rec) != 0) {
 		goto fail;
 	}
-- 
1.9.1


From 9cc3e5582add2fc1fdb26731db2cc9bd82f01f12 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Wed, 11 Jun 2014 12:00:48 +0200
Subject: [PATCH 07/13] tdb: add utility function
 check_merge_with_left_record()

Check whether the record left of a given freelist record is
also a freelist record, and if so, merge the two records.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 41986b9..69b3c66 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -191,6 +191,56 @@ static int merge_with_left_record(struct tdb_context *tdb,
 }
 
 /**
+ * Check whether the record left of a given freelist record is
+ * also a freelist record, and if so, merge the two records.
+ *
+ * Return code:
+ *  -1 upon error
+ *   0 if left was not a free record
+ *   1 if left was free and successfully merged.
+ *
+ * The currend record is handed in with pointer and fully read record.
+ *
+ * The left record pointer and struct can be retrieved as result
+ * in lp and lr;
+ */
+static int check_merge_with_left_record(struct tdb_context *tdb,
+					tdb_off_t rec_ptr,
+					struct tdb_record *rec,
+					tdb_off_t *lp,
+					struct tdb_record *lr)
+{
+	tdb_off_t left_ptr;
+	struct tdb_record left_rec;
+	int ret;
+
+	ret = read_record_on_left(tdb, rec_ptr, &left_ptr, &left_rec);
+	if (ret != 0) {
+		return 0;
+	}
+
+	if (left_rec.magic != TDB_FREE_MAGIC) {
+		return 0;
+	}
+
+	/* It's free - expand to include it. */
+	ret = merge_with_left_record(tdb, left_ptr, &left_rec, rec);
+	if (ret != 0) {
+		return -1;
+	}
+
+	if (lp != NULL) {
+		*lp = left_ptr;
+	}
+
+	if (lr != NULL) {
+		*lr = left_rec;
+	}
+
+	return 1;
+}
+
+/**
  * Add an element into the freelist.
  *
  * We merge the new record into the left record if it is also a
-- 
1.9.1


From 91a723622f0fa4fece74d595a4103e6894ffbcfa Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Wed, 11 Jun 2014 12:03:01 +0200
Subject: [PATCH 08/13] tdb: simplify tdb_free() using
 check_merge_with_left_record()

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 69b3c66..47d6301 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -253,8 +253,7 @@ static int check_merge_with_left_record(struct tdb_context *tdb,
  */
 int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec)
 {
-	tdb_off_t left;
-	struct tdb_record l;
+	int ret;
 
 	/* Allocation and tailer lock */
 	if (tdb_lock(tdb, -1, F_WRLCK) != 0)
@@ -293,25 +292,17 @@ int tdb_free(struct tdb_context *tdb, tdb_off_t offset, struct tdb_record *rec)
 left:
 #endif
 
-	if (read_record_on_left(tdb, offset, &left, &l) != 0) {
-		goto update;
-	}
-
-	if (l.magic != TDB_FREE_MAGIC) {
-		goto update;
-	}
-
-	/* It's free - expand to include it. */
-	if (merge_with_left_record(tdb, left, &l, rec) != 0) {
+	ret = check_merge_with_left_record(tdb, offset, rec, NULL, NULL);
+	if (ret == -1) {
 		goto fail;
 	}
+	if (ret == 1) {
+		/* merged */
+		goto done;
+	}
 
-	tdb_unlock(tdb, -1, F_WRLCK);
-	return 0;
-
-update:
+	/* Nothing to merge, prepend to free list */
 
-	/* Now, prepend to free list */
 	rec->magic = TDB_FREE_MAGIC;
 
 	if (tdb_ofs_read(tdb, FREELIST_TOP, &rec->next) == -1 ||
@@ -321,6 +312,7 @@ update:
 		goto fail;
 	}
 
+done:
 	/* And we're done. */
 	tdb_unlock(tdb, -1, F_WRLCK);
 	return 0;
-- 
1.9.1


From e5e00a3775850ec3e293b20a55d3115921c99873 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Wed, 11 Jun 2014 12:04:01 +0200
Subject: [PATCH 09/13] tdb: add utility function
 check_merge_ptr_with_left_record()

Variant of check_merge_with_left_record() that reads the record
itself if necessary.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 47d6301..9b50875 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -241,6 +241,59 @@ static int check_merge_with_left_record(struct tdb_context *tdb,
 }
 
 /**
+ * Check whether the record left of a given freelist record is
+ * also a freelist record, and if so, merge the two records.
+ *
+ * Return code:
+ *  -1 upon error
+ *   0 if left was not a free record
+ *   1 if left was free and successfully merged.
+ *
+ * In this variant, the input record is specified just as the pointer
+ * and is read from the database if needed.
+ *
+ * next_ptr will contain the original record's next pointer after
+ * successful merging (which will be lost after merging), so that
+ * the caller can update the last pointer.
+ */
+static int check_merge_ptr_with_left_record(struct tdb_context *tdb,
+					    tdb_off_t rec_ptr,
+					    tdb_off_t *next_ptr)
+{
+	tdb_off_t left_ptr;
+	struct tdb_record rec, left_rec;
+	int ret;
+
+	ret = read_record_on_left(tdb, rec_ptr, &left_ptr, &left_rec);
+	if (ret != 0) {
+		return 0;
+	}
+
+	if (left_rec.magic != TDB_FREE_MAGIC) {
+		return 0;
+	}
+
+	/* It's free - expand to include it. */
+
+	ret = tdb->methods->tdb_read(tdb, rec_ptr, &rec,
+				     sizeof(rec), DOCONV());
+	if (ret != 0) {
+		return -1;
+	}
+
+	ret = merge_with_left_record(tdb, left_ptr, &left_rec, &rec);
+	if (ret != 0) {
+		return -1;
+	}
+
+	if (next_ptr != NULL) {
+		*next_ptr = rec.next;
+	}
+
+	return 1;
+}
+
+/**
  * Add an element into the freelist.
  *
  * We merge the new record into the left record if it is also a
-- 
1.9.1


From d6c25522d02b17b40143bb15deb88b1bd54c47be Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Mon, 10 Feb 2014 02:39:09 +0100
Subject: [PATCH 10/13] tdb: add tdb_freelist_merge_adjacent()

This is intended to be called to reduce the fragmentation in the
freelist. This is to make up the deficiency of the freelist
to be not doubly linked. If the freelist were doubly linked,
we could easily avoid the creation of adjacent freelist entries.
But with the current singly linked list, it is only possible
to cheaply merge a new free record into a freelist entry on the left,
not on the right...

This can be called periodically, e.g. in the vacuuming process
of a ctdb cluster.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 9b50875..89ea371 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -611,6 +611,65 @@ blocking_freelist_allocate:
 	return ret;
 }
 
+/**
+ * Merge adjacent records in the freelist.
+ */
+static int tdb_freelist_merge_adjacent(struct tdb_context *tdb,
+				       int *count_records, int *count_merged)
+{
+	tdb_off_t cur, next;
+	int count = 0;
+	int merged = 0;
+	int ret;
+
+	ret = tdb_lock(tdb, -1, F_RDLCK);
+	if (ret == -1) {
+		return -1;
+	}
+
+	cur = FREELIST_TOP;
+	while (tdb_ofs_read(tdb, cur, &next) == 0 && next != 0) {
+		tdb_off_t next2;
+
+		count++;
+
+		ret = check_merge_ptr_with_left_record(tdb, next, &next2);
+		if (ret == -1) {
+			goto done;
+		}
+		if (ret == 1) {
+			/*
+			 * merged:
+			 * now let cur->next point to next2 instead of next
+			 */
+
+			ret = tdb_ofs_write(tdb, cur, &next2);
+			if (ret != 0) {
+				goto done;
+			}
+
+			next = next2;
+			merged++;
+		}
+
+		cur = next;
+	}
+
+	if (count_records != NULL) {
+		*count_records = count;
+	}
+
+	if (count_merged != NULL) {
+		*count_merged = merged;
+	}
+
+	ret = 0;
+
+done:
+	tdb_unlock(tdb, -1, F_RDLCK);
+	return ret;
+}
+
 /*
    return the size of the freelist - used to decide if we should repack
 */
-- 
1.9.1


From 5e97c3917d9adc37b6e3175c3f47fba129e678d3 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Wed, 11 Jun 2014 17:26:51 +0200
Subject: [PATCH 11/13] tdb: use tdb_freelist_merge_adjacet in
 tdb_freelist_size()

So that we automatically defragment the free list when freelist_size is called
(unless the database is read only).

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 89ea371..18b5bf1 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -670,10 +670,10 @@ done:
 	return ret;
 }
 
-/*
-   return the size of the freelist - used to decide if we should repack
-*/
-_PUBLIC_ int tdb_freelist_size(struct tdb_context *tdb)
+/**
+ * return the size of the freelist - no merging done
+ */
+static int tdb_freelist_size_no_merge(struct tdb_context *tdb)
 {
 	tdb_off_t ptr;
 	int count=0;
@@ -690,3 +690,28 @@ _PUBLIC_ int tdb_freelist_size(struct tdb_context *tdb)
 	tdb_unlock(tdb, -1, F_RDLCK);
 	return count;
 }
+
+/**
+ * return the size of the freelist - used to decide if we should repack
+ *
+ * As a side effect, adjacent records are merged unless the
+ * database is read-only, in order to reduce the fragmentation
+ * without repacking.
+ */
+_PUBLIC_ int tdb_freelist_size(struct tdb_context *tdb)
+{
+
+	int count = 0;
+
+	if (tdb->read_only) {
+		count = tdb_freelist_size_no_merge(tdb);
+	} else {
+		int ret;
+		ret = tdb_freelist_merge_adjacent(tdb, &count, NULL);
+		if (ret != 0) {
+			return -1;
+		}
+	}
+
+	return count;
+}
-- 
1.9.1


From d0d65e698e5e905b6af561c6539f190e96b6baab Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Mon, 10 Feb 2014 07:39:31 +0100
Subject: [PATCH 12/13] tdb: add "freelist_size" sub-command to tdbtool

With the new code, this has the side effect of
merging adjacent records in the freelist if the
database is not read-only.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/tools/tdbtool.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/lib/tdb/tools/tdbtool.c b/lib/tdb/tools/tdbtool.c
index 2f93e33..780782b 100644
--- a/lib/tdb/tools/tdbtool.c
+++ b/lib/tdb/tools/tdbtool.c
@@ -55,6 +55,7 @@ enum commands {
 	CMD_DELETE,
 	CMD_LIST_HASH_FREE,
 	CMD_LIST_FREE,
+	CMD_FREELIST_SIZE,
 	CMD_INFO,
 	CMD_MMAP,
 	CMD_SPEED,
@@ -89,6 +90,7 @@ COMMAND_TABLE cmd_table[] = {
 	{"delete",	CMD_DELETE},
 	{"list",	CMD_LIST_HASH_FREE},
 	{"free",	CMD_LIST_FREE},
+	{"freelist_size",	CMD_FREELIST_SIZE},
 	{"info",	CMD_INFO},
 	{"speed",	CMD_SPEED},
 	{"mmap",	CMD_MMAP},
@@ -232,6 +234,7 @@ static void help(void)
 "  delete    key        : delete a record by key\n"
 "  list                 : print the database hash table and freelist\n"
 "  free                 : print the database freelist\n"
+"  freelist_size        : print the number of records in the freelist\n"
 "  check                : check the integrity of an opened database\n"
 "  repack               : repack the database\n"
 "  speed                : perform speed tests on the database\n"
@@ -710,6 +713,18 @@ static int do_command(void)
 		case CMD_LIST_FREE:
 			tdb_printfreelist(tdb);
 			return 0;
+		case CMD_FREELIST_SIZE: {
+			int count;
+
+			count = tdb_freelist_size(tdb);
+			if (count < 0) {
+				printf("Error getting freelist size.\n");
+			} else {
+				printf("freelist size: %d\n", count);
+			}
+
+			return 0;
+		}
 		case CMD_INFO:
 			info_tdb();
 			return 0;
-- 
1.9.1


From de465aad38ffdf6dffbaecc010ecadd68d4442de Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Wed, 11 Jun 2014 12:05:57 +0200
Subject: [PATCH 13/13] tdb: defragment the freelist in
 tdb_allocate_from_freelist()

While we are traversing the freelist anyways, merge a record
with the left if it is also a free list record.

That partially makes up for the fragmentation introduced by
the lack of merging with right records in tdb_free().

Note there is a potential slight downside:
If the left record we merge the current record into was earlier
in the chain and has hence already been met in traverse,
then we can not use the enlarged record even if it might be
a new best fit.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 lib/tdb/common/freelist.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 18b5bf1..3edb98a 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -449,6 +449,7 @@ static tdb_off_t tdb_allocate_from_freelist(
 		tdb_len_t rec_len;
 	} bestfit;
 	float multiplier = 1.0;
+	bool merge_created_candidate = false;
 
 	/* over-allocate to reduce fragmentation */
 	length *= 1.25;
@@ -458,6 +459,7 @@ static tdb_off_t tdb_allocate_from_freelist(
 	length = TDB_ALIGN(length, TDB_ALIGNMENT);
 
  again:
+	merge_created_candidate = false;
 	last_ptr = FREELIST_TOP;
 
 	/* read in the freelist top */
@@ -474,10 +476,59 @@ static tdb_off_t tdb_allocate_from_freelist(
 	   issues when faced with a slowly increasing record size.
 	 */
 	while (rec_ptr) {
+		int ret;
+		tdb_off_t left_ptr;
+		struct tdb_record left_rec;
+
 		if (tdb_rec_free_read(tdb, rec_ptr, rec) == -1) {
 			return 0;
 		}
 
+		ret = check_merge_with_left_record(tdb, rec_ptr, rec,
+						   &left_ptr, &left_rec);
+		if (ret == -1) {
+			return 0;
+		}
+		if (ret == 1) {
+			/* merged */
+			rec_ptr = rec->next;
+			ret = tdb_ofs_write(tdb, last_ptr, &rec->next);
+			if (ret == -1) {
+				return 0;
+			}
+
+			/*
+			 * We have merged the current record into the left
+			 * neighbour. So our traverse of the freelist will
+			 * skip it and consider the next record in the chain.
+			 *
+			 * But the enlarged left neighbour may be a candidate.
+			 * If it is, we can not directly use it, though.
+			 * The only thing we can do and have to do here is to
+			 * update the current best fit size in the chain if the
+			 * current best fit is the left record. (By that we may
+			 * worsen the best fit we already had, bit this is not a
+			 * problem.)
+			 *
+			 * If the current best fit is not the left record,
+			 * all we can do is remember the fact that a merge
+			 * created a new candidate so that we can trigger
+			 * a second walk of the freelist if at the end of
+			 * the first walk we have not found any fit.
+			 * This way we can avoid expanding the database.
+			 */
+
+			if (bestfit.rec_ptr == left_ptr) {
+				bestfit.rec_len = left_rec.rec_len;
+			}
+
+			if (left_rec.rec_len > length) {
+				merge_created_candidate = true;
+			}
+
+			continue;
+		}
+
 		if (rec->rec_len >= length) {
 			if (bestfit.rec_ptr == 0 ||
 			    rec->rec_len < bestfit.rec_len) {
@@ -517,6 +568,10 @@ static tdb_off_t tdb_allocate_from_freelist(
 		return newrec_ptr;
 	}
 
+	if (merge_created_candidate) {
+		goto again;
+	}
+
 	/* we didn't find enough space. See if we can expand the
 	   database and if we can then try again */
 	if (tdb_expand(tdb, length + sizeof(*rec)) == 0)
-- 
1.9.1

-------------- next part --------------
From 012409fbbe2ea3e74c19c548ca4aede9ebde583b Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 02:51:39 +0200
Subject: [PATCH 01/12] ctdb:vacuum: adapt debug message for repacking.

Now we usually have records to delete == 0 after the preceding
vacuum run. Anyways, deletion is not a major aspect any more
of the repack run and will vanish soon.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index 574ad87..fdca2a3 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1480,8 +1480,8 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 		return 0;
 	}
 
-	DEBUG(DEBUG_INFO,("Repacking %s with %u freelist entries and %u records to delete\n", 
-			name, freelist_size, vdata->count.delete_list.left));
+	DEBUG(DEBUG_INFO, ("Repacking %s with %u freelist entries\n",
+			   name, freelist_size));
 
 	/*
 	 * repack and implicitely get rid of the records we can delete
-- 
1.9.1


From 72c157fdb49845463fc9411bb7411978415f12d6 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 02:47:21 +0200
Subject: [PATCH 02/12] ctdb:vacuum: remove a comment in
 ctdb_vacuum_and_repack_db()

The repack operation now mainly defragments the freelist
and does not usually delete any records any more.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index fdca2a3..18ea53d 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1483,9 +1483,6 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	DEBUG(DEBUG_INFO, ("Repacking %s with %u freelist entries\n",
 			   name, freelist_size));
 
-	/*
-	 * repack and implicitely get rid of the records we can delete
-	 */
 	if (ctdb_repack_tdb(ctdb_db->ltdb->tdb, mem_ctx, vdata) != 0) {
 		DEBUG(DEBUG_ERR,(__location__ " Failed to repack '%s'\n", name));
 		talloc_free(vdata);
-- 
1.9.1


From 6212ea216a5dd42d299547a7a8838e68b7180613 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 02:53:29 +0200
Subject: [PATCH 03/12] ctdb:vacuum: use plain tdb_repack() instead of
 ctdb_repack_tdb()

Since we usually have 0 records left for repack-deletion,
repacking is essentially used for the purpose of defragmenting
the freelist, we can use the vanilla tdb_repack function.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index 18ea53d..ee7bb0b 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1483,7 +1483,7 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	DEBUG(DEBUG_INFO, ("Repacking %s with %u freelist entries\n",
 			   name, freelist_size));
 
-	if (ctdb_repack_tdb(ctdb_db->ltdb->tdb, mem_ctx, vdata) != 0) {
+	if (tdb_repack(ctdb_db->ltdb->tdb) != 0) {
 		DEBUG(DEBUG_ERR,(__location__ " Failed to repack '%s'\n", name));
 		talloc_free(vdata);
 		return -1;
-- 
1.9.1


From bae49a2d07e0f9008bcea9156ed9461d139aab3b Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 02:57:00 +0200
Subject: [PATCH 04/12] ctdb:vacuum: remove now unused ctdb_repack_tdb().

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 129 ----------------------------------------------
 1 file changed, 129 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index ee7bb0b..2c78de7 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1295,135 +1295,6 @@ static int ctdb_vacuum_db(struct ctdb_db_context *ctdb_db,
 	return 0;
 }
 
-
-/*
- * traverse function for repacking
- */
-static int repack_traverse(struct tdb_context *tdb, TDB_DATA key, TDB_DATA data,
-			   void *private_data)
-{
-	struct vacuum_data *vdata = (struct vacuum_data *)private_data;
-
-	if (vdata->vacuum) {
-		uint32_t hash = ctdb_hash(&key);
-		struct delete_record_data *kd;
-		/*
-		 * check if we can ignore this record because it's in the delete_list
-		 */
-		kd = (struct delete_record_data *)trbt_lookup32(vdata->delete_list, hash);
-		/*
-		 * there might be hash collisions so we have to compare the keys here to be sure
-		 */
-		if (kd && kd->key.dsize == key.dsize && memcmp(kd->key.dptr, key.dptr, key.dsize) == 0) {
-			struct ctdb_ltdb_header *hdr = (struct ctdb_ltdb_header *)data.dptr;
-			/*
-			 * we have to check if the record hasn't changed in the meantime in order to
-			 * savely remove it from the database
-			 */
-			if (data.dsize == sizeof(struct ctdb_ltdb_header) &&
-				hdr->dmaster == kd->ctdb->pnn &&
-				ctdb_lmaster(kd->ctdb, &(kd->key)) == kd->ctdb->pnn &&
-				kd->hdr.rsn == hdr->rsn) {
-				vdata->count.repack.vacuumed++;
-				return 0;
-			}
-		}
-	}
-	if (tdb_store(vdata->dest_db, key, data, TDB_INSERT) != 0) {
-		vdata->traverse_error = true;
-		return -1;
-	}
-	vdata->count.repack.copied++;
-	return 0;
-}
-
-/*
- * repack a tdb
- */
-static int ctdb_repack_tdb(struct tdb_context *tdb, TALLOC_CTX *mem_ctx, struct vacuum_data *vdata)
-{
-	struct tdb_context *tmp_db;
-
-	if (tdb_transaction_start(tdb) != 0) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to start transaction\n"));
-		return -1;
-	}
-
-	tmp_db = tdb_open("tmpdb", tdb_hash_size(tdb),
-			  TDB_INTERNAL|TDB_DISALLOW_NESTING,
-			  O_RDWR|O_CREAT, 0);
-	if (tmp_db == NULL) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to create tmp_db\n"));
-		tdb_transaction_cancel(tdb);
-		return -1;
-	}
-
-	vdata->traverse_error = false;
-	vdata->dest_db = tmp_db;
-	vdata->vacuum = true;
-	vdata->count.repack.vacuumed = 0;
-	vdata->count.repack.copied = 0;
-
-	/*
-	 * repack and vacuum on-the-fly by not writing the records that are
-	 * no longer needed
-	 */
-	if (tdb_traverse_read(tdb, repack_traverse, vdata) == -1) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to traverse copying out\n"));
-		tdb_transaction_cancel(tdb);
-		tdb_close(tmp_db);
-		return -1;		
-	}
-
-	DEBUG(DEBUG_INFO,(__location__ " %u records vacuumed\n",
-	      vdata->count.repack.vacuumed));
-	
-	if (vdata->traverse_error) {
-		DEBUG(DEBUG_ERR,(__location__ " Error during traversal\n"));
-		tdb_transaction_cancel(tdb);
-		tdb_close(tmp_db);
-		return -1;
-	}
-
-	if (tdb_wipe_all(tdb) != 0) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to wipe database\n"));
-		tdb_transaction_cancel(tdb);
-		tdb_close(tmp_db);
-		return -1;
-	}
-
-	vdata->traverse_error = false;
-	vdata->dest_db = tdb;
-	vdata->vacuum = false;
-	vdata->count.repack.copied = 0;
-
-	if (tdb_traverse_read(tmp_db, repack_traverse, vdata) == -1) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to traverse copying back\n"));
-		tdb_transaction_cancel(tdb);
-		tdb_close(tmp_db);
-		return -1;		
-	}
-
-	if (vdata->traverse_error) {
-		DEBUG(DEBUG_ERR,(__location__ " Error during second traversal\n"));
-		tdb_transaction_cancel(tdb);
-		tdb_close(tmp_db);
-		return -1;
-	}
-
-	tdb_close(tmp_db);
-
-
-	if (tdb_transaction_commit(tdb) != 0) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to commit\n"));
-		return -1;
-	}
-	DEBUG(DEBUG_INFO,(__location__ " %u records copied\n",
-	      vdata->count.repack.copied));
-
-	return 0;
-}
-
 /*
  * repack and vaccum a db
  * called from the child context
-- 
1.9.1


From 0f701d396613789e266164131cf93f929f7252a3 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 02:57:25 +0200
Subject: [PATCH 05/12] ctdb:vacuum: remove a superfluous and misleading
 comment

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index 2c78de7..942d611 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1509,9 +1509,6 @@ ctdb_vacuum_event(struct event_context *ev, struct timed_event *te,
 			_exit(1);
 		}
 
-		/* 
-		 * repack the db
-		 */
 		if ((ctdb->tunable.vacuum_fast_path_count > 0) &&
 		    (vacuum_handle->fast_path_count == 0))
 		{
-- 
1.9.1


From afaa06822cb4f319687115270a242d12db580c45 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 02:59:51 +0200
Subject: [PATCH 06/12] ctdb:vacuum: untangle assignmend and check for return
 of tdb_repack()

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index 942d611..64cb135 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1307,6 +1307,7 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	const char *name = ctdb_db->db_name;
 	int freelist_size = 0;
 	struct vacuum_data *vdata;
+	int ret;
 
 	vdata = talloc_zero(mem_ctx, struct vacuum_data);
 	if (vdata == NULL) {
@@ -1354,7 +1355,8 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	DEBUG(DEBUG_INFO, ("Repacking %s with %u freelist entries\n",
 			   name, freelist_size));
 
-	if (tdb_repack(ctdb_db->ltdb->tdb) != 0) {
+	ret = tdb_repack(ctdb_db->ltdb->tdb);
+	if (ret != 0) {
 		DEBUG(DEBUG_ERR,(__location__ " Failed to repack '%s'\n", name));
 		talloc_free(vdata);
 		return -1;
-- 
1.9.1


From aef530a7712eb710237d382c33d25c3f6367133e Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 03:02:42 +0200
Subject: [PATCH 07/12] ctdb:vacuum: remove a superfluous comment.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index 64cb135..8119acc 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1326,10 +1326,7 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	}
 
 	vdata->start = timeval_current();
- 
-	/*
-	 * gather all records that can be deleted in vdata
-	 */
+
 	if (ctdb_vacuum_db(ctdb_db, vdata, full_vacuum_run) != 0) {
 		DEBUG(DEBUG_ERR,(__location__ " Failed to vacuum '%s'\n", name));
 	}
-- 
1.9.1


From aefb0c482e6edee15fe05c65bb2adb24d5dfe5a2 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 03:08:20 +0200
Subject: [PATCH 08/12] ctdb:vacuum: remove vacuum limit from vdata - not used

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index 8119acc..cb0d993 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -53,7 +53,6 @@ struct ctdb_vacuum_handle {
 
 /*  a list of records to possibly delete */
 struct vacuum_data {
-	uint32_t repack_limit;
 	struct ctdb_context *ctdb;
 	struct ctdb_db_context *ctdb_db;
 	struct tdb_context *dest_db;
@@ -1316,7 +1315,6 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	}
 
 	vdata->ctdb = ctdb_db->ctdb;
-	vdata->repack_limit = repack_limit;
 	vdata->delete_list = trbt_create(vdata, 0);
 	vdata->ctdb_db = ctdb_db;
 	if (vdata->delete_list == NULL) {
-- 
1.9.1


From bfbf55c731adf8dc3055c6f1ea14036af2c2e20a Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 03:34:05 +0200
Subject: [PATCH 09/12] ctdb:vacuum: move init of vdata into init_vdata funcion

This is a small code cleanup.
vdata is only used in ctdb_vacuum_db() and not in
ctdb_vacuum_and_repack_db() where it is currently initialized.

This patch moves creation and all previously scattered
initialization of vacuum_data into ctdb_vacuum_init_vacuum_data
which is called from ctdb_vacuum_db.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 75 +++++++++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 32 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index cb0d993..46c8c93 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1170,11 +1170,29 @@ done:
 /**
  * initialize the vacuum_data
  */
-static int ctdb_vacuum_init_vacuum_data(struct ctdb_db_context *ctdb_db,
-					struct vacuum_data *vdata)
+static struct vacuum_data *ctdb_vacuum_init_vacuum_data(
+					struct ctdb_db_context *ctdb_db,
+					TALLOC_CTX *mem_ctx)
 {
 	int i;
 	struct ctdb_context *ctdb = ctdb_db->ctdb;
+	struct vacuum_data *vdata;
+
+	vdata = talloc_zero(mem_ctx, struct vacuum_data);
+	if (vdata == NULL) {
+		DEBUG(DEBUG_ERR,(__location__ " Out of memory\n"));
+		return NULL;
+	}
+
+	vdata->ctdb = ctdb_db->ctdb;
+	vdata->ctdb_db = ctdb_db;
+	vdata->delete_list = trbt_create(vdata, 0);
+	if (vdata->delete_list == NULL) {
+		DEBUG(DEBUG_ERR,(__location__ " Out of memory\n"));
+		goto fail;
+	}
+
+	vdata->start = timeval_current();
 
 	vdata->count.delete_queue.added_to_delete_list = 0;
 	vdata->count.delete_queue.added_to_vacuum_fetch_list = 0;
@@ -1199,7 +1217,7 @@ static int ctdb_vacuum_init_vacuum_data(struct ctdb_db_context *ctdb_db,
 						ctdb->num_nodes);
 	if (vdata->vacuum_fetch_list == NULL) {
 		DEBUG(DEBUG_ERR,(__location__ " Out of memory\n"));
-		return -1;
+		goto fail;
 	}
 	for (i = 0; i < ctdb->num_nodes; i++) {
 		vdata->vacuum_fetch_list[i] = (struct ctdb_marshall_buffer *)
@@ -1207,12 +1225,17 @@ static int ctdb_vacuum_init_vacuum_data(struct ctdb_db_context *ctdb_db,
 					 offsetof(struct ctdb_marshall_buffer, data));
 		if (vdata->vacuum_fetch_list[i] == NULL) {
 			DEBUG(DEBUG_ERR,(__location__ " Out of memory\n"));
-			return -1;
+			talloc_free(vdata);
+			return NULL;
 		}
 		vdata->vacuum_fetch_list[i]->db_id = ctdb_db->db_id;
 	}
 
-	return 0;
+	return vdata;
+
+fail:
+	talloc_free(vdata);
+	return NULL;
 }
 
 /**
@@ -1248,11 +1271,12 @@ static int ctdb_vacuum_init_vacuum_data(struct ctdb_db_context *ctdb_db,
  * This executes in the child context.
  */
 static int ctdb_vacuum_db(struct ctdb_db_context *ctdb_db,
-			  struct vacuum_data *vdata,
 			  bool full_vacuum_run)
 {
 	struct ctdb_context *ctdb = ctdb_db->ctdb;
 	int ret, pnn;
+	struct vacuum_data *vdata;
+	TALLOC_CTX *tmp_ctx;
 
 	DEBUG(DEBUG_INFO, (__location__ " Entering %s vacuum run for db "
 			   "%s db_id[0x%08x]\n",
@@ -1273,9 +1297,16 @@ static int ctdb_vacuum_db(struct ctdb_db_context *ctdb_db,
 
 	ctdb->pnn = pnn;
 
-	ret = ctdb_vacuum_init_vacuum_data(ctdb_db, vdata);
-	if (ret != 0) {
-		return ret;
+	tmp_ctx = talloc_new(ctdb_db);
+	if (tmp_ctx == NULL) {
+		DEBUG(DEBUG_ERR, ("Out of memory!\n"));
+		return -1;
+	}
+
+	vdata = ctdb_vacuum_init_vacuum_data(ctdb_db, tmp_ctx);
+	if (vdata == NULL) {
+		talloc_free(tmp_ctx);
+		return -1;
 	}
 
 	if (full_vacuum_run) {
@@ -1288,6 +1319,8 @@ static int ctdb_vacuum_db(struct ctdb_db_context *ctdb_db,
 
 	ctdb_process_delete_list(ctdb_db, vdata);
 
+	talloc_free(tmp_ctx);
+
 	/* this ensures we run our event queue */
 	ctdb_ctrl_getpnn(ctdb, TIMELIMIT(), CTDB_CURRENT_NODE);
 
@@ -1305,27 +1338,9 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	uint32_t repack_limit = ctdb_db->ctdb->tunable.repack_limit;
 	const char *name = ctdb_db->db_name;
 	int freelist_size = 0;
-	struct vacuum_data *vdata;
 	int ret;
 
-	vdata = talloc_zero(mem_ctx, struct vacuum_data);
-	if (vdata == NULL) {
-		DEBUG(DEBUG_ERR,(__location__ " Out of memory\n"));
-		return -1;
-	}
-
-	vdata->ctdb = ctdb_db->ctdb;
-	vdata->delete_list = trbt_create(vdata, 0);
-	vdata->ctdb_db = ctdb_db;
-	if (vdata->delete_list == NULL) {
-		DEBUG(DEBUG_ERR,(__location__ " Out of memory\n"));
-		talloc_free(vdata);
-		return -1;
-	}
-
-	vdata->start = timeval_current();
-
-	if (ctdb_vacuum_db(ctdb_db, vdata, full_vacuum_run) != 0) {
+	if (ctdb_vacuum_db(ctdb_db, full_vacuum_run) != 0) {
 		DEBUG(DEBUG_ERR,(__location__ " Failed to vacuum '%s'\n", name));
 	}
 
@@ -1333,7 +1348,6 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 		freelist_size = tdb_freelist_size(ctdb_db->ltdb->tdb);
 		if (freelist_size == -1) {
 			DEBUG(DEBUG_ERR,(__location__ " Failed to get freelist size for '%s'\n", name));
-			talloc_free(vdata);
 			return -1;
 		}
 	}
@@ -1343,7 +1357,6 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	 */
 	if ((repack_limit == 0 || (uint32_t)freelist_size < repack_limit))
 	{
-		talloc_free(vdata);
 		return 0;
 	}
 
@@ -1353,10 +1366,8 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 	ret = tdb_repack(ctdb_db->ltdb->tdb);
 	if (ret != 0) {
 		DEBUG(DEBUG_ERR,(__location__ " Failed to repack '%s'\n", name));
-		talloc_free(vdata);
 		return -1;
 	}
-	talloc_free(vdata);
 
 	return 0;
 }
-- 
1.9.1


From 7284e6c091f52cbe37f4c5795efaf3d63be2855d Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 19 Apr 2014 03:36:49 +0200
Subject: [PATCH 10/12] ctdb:vacuum: remove now unused talloc ctx argument from
 ctdb_vacuum_db()

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index 46c8c93..d1cb4de 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1332,7 +1332,6 @@ static int ctdb_vacuum_db(struct ctdb_db_context *ctdb_db,
  * called from the child context
  */
 static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
-				     TALLOC_CTX *mem_ctx,
 				     bool full_vacuum_run)
 {
 	uint32_t repack_limit = ctdb_db->ctdb->tunable.repack_limit;
@@ -1522,8 +1521,7 @@ ctdb_vacuum_event(struct event_context *ev, struct timed_event *te,
 		{
 			full_vacuum_run = true;
 		}
-		cc = ctdb_vacuum_and_repack_db(ctdb_db, child_ctx,
-					       full_vacuum_run);
+		cc = ctdb_vacuum_and_repack_db(ctdb_db, full_vacuum_run);
 
 		write(child_ctx->fd[1], &cc, 1);
 		_exit(0);
-- 
1.9.1


From 753513c0258edc37376973adbcd5214acf5da43c Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Tue, 22 Apr 2014 22:09:35 +0200
Subject: [PATCH 11/12] ctdb:vacuum: add missing return to
 ctdb_vacuum_traverse_db() error path.

This got lost in commit 19948702992c94553e1a611540ad398de9f9d8b9
("ctdb-vacuum: make ctdb_vacuum_traverse_db() void.")

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index d1cb4de..7270331 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -744,6 +744,7 @@ static void ctdb_vacuum_traverse_db(struct ctdb_db_context *ctdb_db,
 	if (ret == -1 || vdata->traverse_error) {
 		DEBUG(DEBUG_ERR, (__location__ " Traverse error in vacuuming "
 				  "'%s'\n", ctdb_db->db_name));
+		return;
 	}
 
 	if (vdata->count.db_traverse.total > 0) {
-- 
1.9.1


From 69941b5812500ff628064698813c94060e707722 Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox at samba.org>
Date: Sat, 15 Feb 2014 01:36:06 +0100
Subject: [PATCH 12/12] ctdb:vacuum: always run freelist_size again

and not only if repack_limit != 0. This partially reverts
commit 48f2d1158820bfb063ba0a0bbfb6f496a8e7522.

With the new tdb code this defragments the
free list by merging adjacent records.

Signed-off-by: Michael Adam <obnox at samba.org>
---
 ctdb/server/ctdb_vacuum.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/ctdb/server/ctdb_vacuum.c b/ctdb/server/ctdb_vacuum.c
index 7270331..ce3c600 100644
--- a/ctdb/server/ctdb_vacuum.c
+++ b/ctdb/server/ctdb_vacuum.c
@@ -1344,12 +1344,10 @@ static int ctdb_vacuum_and_repack_db(struct ctdb_db_context *ctdb_db,
 		DEBUG(DEBUG_ERR,(__location__ " Failed to vacuum '%s'\n", name));
 	}
 
-	if (repack_limit != 0) {
-		freelist_size = tdb_freelist_size(ctdb_db->ltdb->tdb);
-		if (freelist_size == -1) {
-			DEBUG(DEBUG_ERR,(__location__ " Failed to get freelist size for '%s'\n", name));
-			return -1;
-		}
+	freelist_size = tdb_freelist_size(ctdb_db->ltdb->tdb);
+	if (freelist_size == -1) {
+		DEBUG(DEBUG_ERR,(__location__ " Failed to get freelist size for '%s'\n", name));
+		return -1;
 	}
 
 	/*
-- 
1.9.1

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20140616/67e1c9bf/attachment.pgp>


More information about the samba-technical mailing list