[SCM] CTDB repository - branch master updated - ctdb-1.0.87-36-g8f48e37

Ronnie Sahlberg sahlberg at samba.org
Tue Aug 4 01:32:51 MDT 2009


The branch, master has been updated
       via  8f48e37c254e0852d4e2dea54b905ce5ef2b925d (commit)
       via  8d0d432ab7766d9c0f9868fd77e48b9b5cc5d9f9 (commit)
       via  8b6a5bba93843cd83b7b386b82949ad88f29884a (commit)
       via  6de2823f5f7976d4efa20761e518d6b67753f054 (commit)
       via  ce19658ba13272238058e9b9bc03e62f48b737c0 (commit)
       via  e72974e5cefabc7035399d16633f727f868caa61 (commit)
       via  233c52bfb087f636ad61e95c12616c02901f4f83 (commit)
       via  fe3ceb101a5a9c336973c2c6c31406bd8181c2fe (commit)
       via  e03980add02a28609a7a0a0c87ebc85419b98144 (commit)
       via  5253a0ba3a34fbf5810f363ecc094203d49e835f (commit)
       via  aa22d1875b1997664af983c0baeabe34e40dd253 (commit)
       via  4c3dac215a088947f645f727343997f5d47e3260 (commit)
      from  32a69b0efa078b069802470be6488a4efe32961d (commit)

http://gitweb.samba.org/?p=sahlberg/ctdb.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 8f48e37c254e0852d4e2dea54b905ce5ef2b925d
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Thu Jul 30 11:52:39 2009 +0930

    tdb: don't alter tdb->flags in tdb_reopen_all()
    
    The flags are user-visible, via tdb_get_flags/add_flags/remove_flags.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
    Signed-off-by: Stefan Metzmacher <metze at samba.org>

commit 8d0d432ab7766d9c0f9868fd77e48b9b5cc5d9f9
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Thu Jul 30 11:52:08 2009 +0930

    tdb: Reimplementation of Metze's "lib/tdb: if we know pwrite and pread are thread/fork safe tdb_reopen_all() should be a noop".
    
    This version just wraps the reopen code, so we still re-grab the lock and do
    the normal sanity checks.
    
    The reason we do this at all is to avoid global fd limits, see:
    http://forums.fedoraforum.org/showthread.php?t=210393
    
    Note also that this whole reopen concept is fundamentally racy: if the parent
    goes away before the child calls tdb_reopen_all, the database can be left
    without an active lock and another TDB_CLEAR_IF_FIRST opener will clear it.
    A fork_with_tdbs() wrapper could use a pipe to solve this, but it's hardly
    elegant (what if there are other independent things which have similar needs?).
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
    Signed-off-by: Stefan Metzmacher <metze at samba.org>
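
The pipe idea mentioned above could look roughly like the following. This is only a sketch under stated assumptions: fork_with_sync(), demo_setup() and demo_main() are hypothetical names, not part of tdb, and a real caller would invoke tdb_reopen_all() from the setup callback. The parent blocks on the pipe until the child has finished its setup, so the parent cannot go away before the child has re-taken its locks.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of tdb. Forks, runs child_setup() in the
 * child (where a real caller would call tdb_reopen_all()), and makes the
 * parent wait on a pipe until that setup has finished.
 * Returns the child's pid in the parent, or -1 on failure.
 * The child never returns: it exits with child_main()'s return value.
 */
pid_t fork_with_sync(int (*child_setup)(void), int (*child_main)(void))
{
	int fds[2];
	char byte = 0;
	ssize_t n;
	pid_t pid;

	if (pipe(fds) != 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: finish setup, then tell the parent it may continue. */
		close(fds[0]);
		if (child_setup() != 0 || write(fds[1], &byte, 1) != 1)
			_exit(127);
		close(fds[1]);
		_exit(child_main());
	}

	/* Parent: block until the child reports that its setup is done. */
	close(fds[1]);
	do {
		n = read(fds[0], &byte, 1);
	} while (n < 0 && errno == EINTR);
	close(fds[0]);

	if (n != 1) {
		/* The child exited before finishing setup; reap it. */
		waitpid(pid, NULL, 0);
		return -1;
	}
	return pid;
}

/* Stand-ins used to exercise the helper. */
int demo_setup(void) { return 0; }
int demo_main(void) { return 7; }
```

As the commit message notes, this only covers code that goes through the wrapper; anything else that forks independently would still see the race.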

commit 8b6a5bba93843cd83b7b386b82949ad88f29884a
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Thu Jul 30 13:10:33 2009 -0700

    realloc() has that horrible overloaded free semantic when size is 0: the current code frees the old record in this case, then fails.
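
A minimal sketch of the guard, assuming a hypothetical helper named resize_nonzero() (the actual change is made inline in tdb_append(), as the diff below shows). By never passing size 0 to realloc(), a NULL return can only mean an allocation failure.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of tdb: resize a buffer without ever
 * passing size 0 to realloc(), so that a NULL return always means the
 * allocation failed. On failure the old buffer is freed, mirroring the
 * error path in tdb_append().
 */
unsigned char *resize_nonzero(unsigned char *old, size_t new_len)
{
	unsigned char *p;

	/* realloc(ptr, 0) may free ptr and return NULL: avoid it. */
	if (new_len == 0)
		new_len = 1;

	p = (unsigned char *)realloc(old, new_len);
	if (p == NULL)
		free(old);
	return p;
}
```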

commit 6de2823f5f7976d4efa20761e518d6b67753f054
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Thu Jul 30 13:09:33 2009 -0700

    If the record is at the end of the database, pretending it has length 1 might take us out-of-bounds. Only pretend to be length 1 for the malloc.
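
To illustrate why the padding must apply to the malloc only, here is a sketch using an in-memory byte array in place of the database file. alloc_copy() and its parameters are assumptions made for this example, not tdb functions. The bounds check uses the real length, so a zero-length record that ends exactly at the end of the data is still accepted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for tdb_alloc_read(): copy len bytes starting at
 * off out of a db_size-byte buffer into freshly allocated memory.
 * Returns NULL if the range is out of bounds or the allocation fails.
 */
unsigned char *alloc_copy(const unsigned char *db, size_t db_size,
			  size_t off, size_t len)
{
	unsigned char *buf;

	/* Bounds check against the true length, not a padded one. */
	if (off > db_size || len > db_size - off)
		return NULL;

	/* Some systems don't like zero length malloc: pad here only. */
	buf = (unsigned char *)malloc(len ? len : 1);
	if (buf == NULL)
		return NULL;

	memcpy(buf, db + off, len);
	return buf;
}
```

Had len been bumped to 1 before the bounds check, the zero-length read at offset db_size would have been rejected as out of bounds.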

commit ce19658ba13272238058e9b9bc03e62f48b737c0
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Wed Jul 29 14:53:03 2009 +0930

    Port from SAMBA tdb:
    
    commit 54a51839ea65aa788b18fce8de0ae4f9ba63e4e7
    Author: Rusty Russell <rusty at rustcorp.com.au>
    Date:   Sat Jul 18 15:28:58 2009 +0930
    
    Make tdb transaction lock recursive (samba version)
    
        This patch replaces 6ed27edbcd3ba1893636a8072c8d7a621437daf7 and
        1a416ff13ca7786f2e8d24c66addf00883e9cb12, which fixed the bug where traversals
        inside transactions would release the transaction lock early.
    
        This solution is more general, and solves the more minor symptom that nested
        traversals would also release the transaction lock early.  (It was also suggested in
        Volker's comment in 6ed27ed).
    
        This patch also applies to ctdb, if the traverse.c part is removed (ctdb's tdb
        code never received the previous two fixes).
    
        Tested using the testsuite from ccan (adapted to the samba code).  Thanks to
        Michael Adam for feedback.
    
        Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
        Signed-off-by: Michael Adam <obnox at samba.org>
    commit 760104188d0d2ed96ec4a70138e6d0bf86d797ed
    Author: Rusty Russell <rusty at rustcorp.com.au>
    Date:   Tue Jul 21 16:23:35 2009 +0930
    
        tdb: fix locking error
    
        54a51839ea65aa788b18fce8de0ae4f9ba63e4e7 "Make tdb transaction lock
        recursive (samba version)" was broken: I "cleaned it up" and prevented
        it from ever unlocking.
    
        To see the problem:
            $ bin/tdbtorture -s 1248142523
            tdb_brlock failed (fd=3) at offset 8 rw_type=1 lck_type=14 len=1
            tdb_transaction_lock: failed to get transaction lock
            tdb_transaction_start failed: Resource deadlock avoided
    
        My testcase relied on the *count* being correct, which it was.  Fixing that
        now.
    
        Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
        Signed-off-by: Michael Adam <obnox at samba.org>
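
The counting scheme can be sketched in isolation as follows. This is an illustration only: struct counted_lock and its two functions are hypothetical, and the boolean "held" stands in for the fcntl byte-range lock that tdb takes through tdb_brlock(). Unlike the real tdb_transaction_unlock(), this sketch reports an error on an unbalanced release.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a recursive (counted) lock. */
struct counted_lock {
	int count;  /* nesting depth held by this context */
	bool held;  /* stand-in for the underlying file lock */
};

int counted_lock_acquire(struct counted_lock *l)
{
	if (l->count > 0) {
		/* Already held: just record one more level of nesting. */
		l->count++;
		return 0;
	}
	l->held = true;  /* real code would take the file lock here */
	l->count = 1;
	return 0;
}

int counted_lock_release(struct counted_lock *l)
{
	if (l->count == 0)
		return -1;  /* unbalanced release */
	if (l->count > 1) {
		/* An outer holder still needs the lock: keep it. */
		l->count--;
		return 0;
	}
	l->held = false;  /* real code would drop the file lock here */
	l->count = 0;
	return 0;
}
```

The underlying lock is dropped only when the outermost holder releases it, which is the property the nested traversal case needs.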

commit e72974e5cefabc7035399d16633f727f868caa61
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Wed Jul 29 14:51:34 2009 +0930

    Port from SAMBA tdb:
    
    commit a6cc04a20089e8fbcce138c271961c37ddcd6c34
    Author: Andrew Tridgell <tridge at samba.org>
    Date:   Mon Jun 1 13:13:07 2009 +1000
    
    overallocate all records by 25%
    
        This greatly reduces the fragmentation of databases where records
        tend to grow slowly by a small amount each time. The case where this
        is most seen is the ldb index records. Adding this overallocation
        reduced the size of the resulting database by more than 20x when
        running a test that adds 10k users.
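
As a worked example of the sizing arithmetic, the sketch below pads a requested length by 25%, adds room for a tailer, and rounds up to an alignment boundary, in the same order as tdb_allocate() in the diff below. The function names are hypothetical, the 4-byte tailer and 4-byte alignment in the example are assumptions, and the padding uses integer arithmetic (length + length / 4) where the patch itself writes length *= 1.25.

```c
#include <assert.h>
#include <stddef.h>

/* Pad a length by 25% using integer arithmetic only. */
size_t overallocate(size_t length)
{
	return length + length / 4;
}

/* Round length up to a multiple of alignment (a power of two). */
size_t align_up(size_t length, size_t alignment)
{
	return (length + alignment - 1) & ~(alignment - 1);
}

/* Pad, add the tailer, then align, in that order. */
size_t padded_record_length(size_t length, size_t tailer, size_t alignment)
{
	return align_up(overallocate(length) + tailer, alignment);
}
```

With these assumed sizes, a 100-byte request becomes 125 bytes after padding, 129 with the tailer, and 132 once aligned.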

commit 233c52bfb087f636ad61e95c12616c02901f4f83
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Wed Jul 29 14:51:12 2009 +0930

    Port from SAMBA tdb:
    
    commit a386173fa1c7c5bcc11ea9260d84b6c52c154b3d
    Author: Andrew Tridgell <tridge at samba.org>
    Date:   Mon Jun 1 13:11:39 2009 +1000
    
    auto-repack in transactions that expand the tdb
    
        The idea behind this is to recover from badly fragmented free
        lists. Choosing the point where the file expands is fairly arbitrary,
        but seems to work well.

commit fe3ceb101a5a9c336973c2c6c31406bd8181c2fe
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Wed Jul 29 16:02:51 2009 +0930

    Port from SAMBA ctdb:
    
    commit 936d76802f98d04d9743b2ca8eeeaadd4362db51
    Author: Andrew Tridgell <tridge at samba.org>
    Date:   Tue Dec 16 14:38:17 2008 +1100
    
    imported the tdb_repack() code from CTDB
    
        The tdb_repack() function repacks a TDB so that it has a single
        freelist entry. The file doesn't shrink, but it does remove all
        freelist fragmentation. This code originated in the CTDB vacuuming
        code, but will now be used in ldb to cope with fragmentation from
        re-indexing

commit e03980add02a28609a7a0a0c87ebc85419b98144
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Wed Jul 29 14:50:39 2009 +0930

    Port from SAMBA tdb:
    
    commit 4b4fec65db4e202afa13b2d15867f4d8a54d154e
    Author: Andrew Tridgell <tridge at samba.org>
    Date:   Thu May 28 16:08:28 2009 +1000
    
    make TDB_NOSYNC affect all the fsync/msync calls in transactions
    
        During a transaction commit tdb normally uses fsync/msync calls to
        make it crash safe. This can be disabled using the TDB_NOSYNC flag,
        but it wasn't disabling all the code paths that caused a fsync/msync.

commit 5253a0ba3a34fbf5810f363ecc094203d49e835f
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Wed Jul 29 14:49:57 2009 +0930

    Port from SAMBA tdb:
    
    commit a91bcbccf8a2243dac57cacec6fdfc9907580f69
    Author: Jim McDonough <jmcd at samba.org>
    Date:   Thu May 21 16:26:26 2009 -0400
    
    Detect tight loop in tdb_find()
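
The check can be illustrated on an in-memory chain. In this sketch, records live in an array, index 0 plays the role of the end-of-chain marker, and struct rec, chain_find() and the two error codes are all hypothetical. As in the patch, only a record whose next pointer refers back to itself is detected; longer cycles are not.

```c
#include <assert.h>
#include <stddef.h>

#define FIND_NOT_FOUND (-1)
#define FIND_CORRUPT   (-2)

/* Hypothetical in-memory record: next == 0 marks the end of the chain. */
struct rec {
	unsigned next;
	int key;
};

/* Walk the chain from head; return the matching index or an error code. */
int chain_find(const struct rec *recs, size_t nrecs, unsigned head, int key)
{
	unsigned cur = head;

	while (cur != 0) {
		if (cur >= nrecs)
			return FIND_CORRUPT;  /* pointer outside the array */
		if (recs[cur].key == key)
			return (int)cur;
		/* Detect a tight loop: the record points at itself. */
		if (recs[cur].next == cur)
			return FIND_CORRUPT;
		cur = recs[cur].next;
	}
	return FIND_NOT_FOUND;
}
```

Without the self-reference check, a lookup for a missing key on such a chain would never terminate.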

commit aa22d1875b1997664af983c0baeabe34e40dd253
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Wed Jul 29 14:48:42 2009 +0930

    Port from SAMBA tdb:
    
    commit 42c0931441ef53a3f977e1334355fa83f05ac184
    Author: Tim Prouty <tprouty at samba.org>
    Date:   Tue Mar 31 16:24:07 2009 -0700
    
    tdb: Remove unused variable

commit 4c3dac215a088947f645f727343997f5d47e3260
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Wed Jul 29 14:47:29 2009 +0930

    Port from SAMBA tdb:
    
    commit b90863c0b7b860b006ac49c9396711ff351f777f
    Author: Howard Chu <hyc at highlandsun.com>
    Date:   Tue Mar 31 13:15:54 2009 +1100
    
        Add tdb_transaction_prepare_commit()
    
        Using tdb_transaction_prepare_commit() gives us 2-phase commits. This
        allows us to safely commit across multiple tdb databases at once, with
        reasonable transaction semantics
    
        Signed-off-by: tridge at samba.org
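
A rough sketch of how a caller might drive a two-phase commit across several databases. It is written against a hypothetical struct participant with callbacks rather than against tdb itself; with tdb the three callbacks would presumably wrap tdb_transaction_prepare_commit(), tdb_transaction_commit() and tdb_transaction_cancel(). The mock_* types and functions exist only to exercise the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* One database taking part in a multi-database commit (hypothetical). */
struct participant {
	int (*prepare)(void *ctx);  /* phase 1: do the work that can fail */
	int (*commit)(void *ctx);   /* phase 2: finish the commit */
	void (*cancel)(void *ctx);  /* abandon the transaction */
	void *ctx;
};

/* Commit every participant, or cancel all of them if any prepare fails. */
int commit_all(struct participant *p, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		if (p[i].prepare(p[i].ctx) != 0) {
			for (j = 0; j < n; j++)
				p[j].cancel(p[j].ctx);
			return -1;
		}
	}

	/* Everything is prepared: the second phase is expected to succeed. */
	for (i = 0; i < n; i++) {
		if (p[i].commit(p[i].ctx) != 0)
			return -1;
	}
	return 0;
}

/* Mock database used to exercise commit_all(). */
enum mock_state { MOCK_OPEN, MOCK_PREPARED, MOCK_COMMITTED, MOCK_CANCELLED };

struct mock_db {
	int fail_prepare;
	enum mock_state state;
};

int mock_prepare(void *ctx)
{
	struct mock_db *db = ctx;

	if (db->fail_prepare)
		return -1;
	db->state = MOCK_PREPARED;
	return 0;
}

int mock_commit(void *ctx)
{
	struct mock_db *db = ctx;

	if (db->state != MOCK_PREPARED)
		return -1;
	db->state = MOCK_COMMITTED;
	return 0;
}

void mock_cancel(void *ctx)
{
	struct mock_db *db = ctx;

	db->state = MOCK_CANCELLED;
}

struct participant mock_participant(struct mock_db *db)
{
	struct participant p = { mock_prepare, mock_commit, mock_cancel, db };

	return p;
}
```

A failure during the second phase is not recoverable in this sketch; the point of the prepare step is to move the operations that are likely to fail ahead of the point of no return.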

-----------------------------------------------------------------------

Summary of changes:
 lib/tdb/common/freelist.c    |    3 +
 lib/tdb/common/io.c          |    5 +-
 lib/tdb/common/lock.c        |   17 +++-
 lib/tdb/common/open.c        |   32 +++++--
 lib/tdb/common/tdb.c         |  103 +++++++++++++++++++++-
 lib/tdb/common/tdb_private.h |    2 +-
 lib/tdb/common/transaction.c |  201 ++++++++++++++++++++++++++++++------------
 lib/tdb/docs/README          |    8 ++
 lib/tdb/include/tdb.h        |    6 +-
 9 files changed, 297 insertions(+), 80 deletions(-)


Changeset truncated at 500 lines:

diff --git a/lib/tdb/common/freelist.c b/lib/tdb/common/freelist.c
index 2f2a4c3..3bc3965 100644
--- a/lib/tdb/common/freelist.c
+++ b/lib/tdb/common/freelist.c
@@ -284,6 +284,9 @@ tdb_off_t tdb_allocate(struct tdb_context *tdb, tdb_len_t length, struct list_st
 	if (tdb_lock(tdb, -1, F_WRLCK) == -1)
 		return 0;
 
+	/* over-allocate to reduce fragmentation */
+	length *= 1.25;
+
 	/* Extra bytes required for tailer */
 	length += sizeof(tdb_off_t);
 	length = TDB_ALIGN(length, TDB_ALIGNMENT);
diff --git a/lib/tdb/common/io.c b/lib/tdb/common/io.c
index 172ab69..7c5f8a2 100644
--- a/lib/tdb/common/io.c
+++ b/lib/tdb/common/io.c
@@ -381,11 +381,8 @@ unsigned char *tdb_alloc_read(struct tdb_context *tdb, tdb_off_t offset, tdb_len
 	unsigned char *buf;
 
 	/* some systems don't like zero length malloc */
-	if (len == 0) {
-		len = 1;
-	}
 
-	if (!(buf = (unsigned char *)malloc(len))) {
+	if (!(buf = (unsigned char *)malloc(len ? len : 1))) {
 		/* Ensure ecode is set for log fn. */
 		tdb->ecode = TDB_ERR_OOM;
 		TDB_LOG((tdb, TDB_DEBUG_ERROR,"tdb_alloc_read malloc failed len=%d (%s)\n",
diff --git a/lib/tdb/common/lock.c b/lib/tdb/common/lock.c
index f156c0f..2c72ae1 100644
--- a/lib/tdb/common/lock.c
+++ b/lib/tdb/common/lock.c
@@ -301,16 +301,21 @@ int tdb_unlock(struct tdb_context *tdb, int list, int ltype)
  */
 int tdb_transaction_lock(struct tdb_context *tdb, int ltype)
 {
-	if (tdb->have_transaction_lock || tdb->global_lock.count) {
+	if (tdb->global_lock.count) {
+		return 0;
+	}
+	if (tdb->transaction_lock_count > 0) {
+		tdb->transaction_lock_count++;
 		return 0;
 	}
+
 	if (tdb->methods->tdb_brlock(tdb, TRANSACTION_LOCK, ltype, 
 				     F_SETLKW, 0, 1) == -1) {
 		TDB_LOG((tdb, TDB_DEBUG_ERROR, "tdb_transaction_lock: failed to get transaction lock\n"));
 		tdb->ecode = TDB_ERR_LOCK;
 		return -1;
 	}
-	tdb->have_transaction_lock = 1;
+	tdb->transaction_lock_count++;
 	return 0;
 }
 
@@ -320,12 +325,16 @@ int tdb_transaction_lock(struct tdb_context *tdb, int ltype)
 int tdb_transaction_unlock(struct tdb_context *tdb)
 {
 	int ret;
-	if (!tdb->have_transaction_lock) {
+	if (tdb->global_lock.count) {
+		return 0;
+	}
+	if (tdb->transaction_lock_count > 1) {
+		tdb->transaction_lock_count--;
 		return 0;
 	}
 	ret = tdb->methods->tdb_brlock(tdb, TRANSACTION_LOCK, F_UNLCK, F_SETLKW, 0, 1);
 	if (ret == 0) {
-		tdb->have_transaction_lock = 0;
+		tdb->transaction_lock_count = 0;
 	}
 	return ret;
 }
diff --git a/lib/tdb/common/open.c b/lib/tdb/common/open.c
index b19e4ce..2dcdd4b 100644
--- a/lib/tdb/common/open.c
+++ b/lib/tdb/common/open.c
@@ -405,9 +405,7 @@ void *tdb_get_logging_private(struct tdb_context *tdb)
 	return tdb->log.log_private;
 }
 
-/* reopen a tdb - this can be used after a fork to ensure that we have an independent
-   seek pointer from our parent and to re-establish locks */
-int tdb_reopen(struct tdb_context *tdb)
+static int tdb_reopen_internal(struct tdb_context *tdb, bool active_lock)
 {
 	struct stat st;
 
@@ -425,6 +423,9 @@ int tdb_reopen(struct tdb_context *tdb)
 		goto fail;
 	}
 
+/* If we have real pread & pwrite, we can skip reopen. */
+#if !defined(LIBREPLACE_PREAD_NOT_REPLACED) || \
+	!defined(LIBREPLACE_PWRITE_NOT_REPLACED)
 	if (tdb_munmap(tdb) != 0) {
 		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_reopen: munmap failed (%s)\n", strerror(errno)));
 		goto fail;
@@ -436,11 +437,6 @@ int tdb_reopen(struct tdb_context *tdb)
 		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_reopen: open failed (%s)\n", strerror(errno)));
 		goto fail;
 	}
-	if ((tdb->flags & TDB_CLEAR_IF_FIRST) && 
-	    (tdb->methods->tdb_brlock(tdb, ACTIVE_LOCK, F_RDLCK, F_SETLKW, 0, 1) == -1)) {
-		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_reopen: failed to obtain active lock\n"));
-		goto fail;
-	}
 	if (fstat(tdb->fd, &st) != 0) {
 		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_reopen: fstat failed (%s)\n", strerror(errno)));
 		goto fail;
@@ -450,6 +446,13 @@ int tdb_reopen(struct tdb_context *tdb)
 		goto fail;
 	}
 	tdb_mmap(tdb);
+#endif /* fake pread or pwrite */
+
+	if (active_lock &&
+	    (tdb->methods->tdb_brlock(tdb, ACTIVE_LOCK, F_RDLCK, F_SETLKW, 0, 1) == -1)) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_reopen: failed to obtain active lock\n"));
+		goto fail;
+	}
 
 	return 0;
 
@@ -458,12 +461,21 @@ fail:
 	return -1;
 }
 
+/* reopen a tdb - this can be used after a fork to ensure that we have an independent
+   seek pointer from our parent and to re-establish locks */
+int tdb_reopen(struct tdb_context *tdb)
+{
+	return tdb_reopen_internal(tdb, tdb->flags & TDB_CLEAR_IF_FIRST);
+}
+
 /* reopen all tdb's */
 int tdb_reopen_all(int parent_longlived)
 {
 	struct tdb_context *tdb;
 
 	for (tdb=tdbs; tdb; tdb = tdb->next) {
+		bool active_lock = (tdb->flags & TDB_CLEAR_IF_FIRST);
+
 		/*
 		 * If the parent is longlived (ie. a
 		 * parent daemon architecture), we know
@@ -477,10 +489,10 @@ int tdb_reopen_all(int parent_longlived)
 		 */
 		if (parent_longlived) {
 			/* Ensure no clear-if-first. */
-			tdb->flags &= ~TDB_CLEAR_IF_FIRST;
+			active_lock = false;
 		}
 
-		if (tdb_reopen(tdb) != 0)
+		if (tdb_reopen_internal(tdb, active_lock) != 0)
 			return -1;
 	}
 
diff --git a/lib/tdb/common/tdb.c b/lib/tdb/common/tdb.c
index 767452c..7217003 100644
--- a/lib/tdb/common/tdb.c
+++ b/lib/tdb/common/tdb.c
@@ -96,6 +96,11 @@ static tdb_off_t tdb_find(struct tdb_context *tdb, TDB_DATA key, uint32_t hash,
 				      NULL) == 0) {
 			return rec_ptr;
 		}
+		/* detect tight infinite loop */
+		if (rec_ptr == r->next) {
+			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_find: loop detected.\n"));
+			return TDB_ERRCODE(TDB_ERR_CORRUPT, 0);
+		}
 		rec_ptr = r->next;
 	}
 	return TDB_ERRCODE(TDB_ERR_NOEXIST, 0);
@@ -579,8 +584,13 @@ int tdb_append(struct tdb_context *tdb, TDB_DATA key, TDB_DATA new_dbuf)
 	if (dbuf.dptr == NULL) {
 		dbuf.dptr = (unsigned char *)malloc(new_dbuf.dsize);
 	} else {
-		unsigned char *new_dptr = (unsigned char *)realloc(dbuf.dptr,
-						     dbuf.dsize + new_dbuf.dsize);
+		unsigned int new_len = dbuf.dsize + new_dbuf.dsize;
+		unsigned char *new_dptr;
+
+		/* realloc '0' is special: don't do that. */
+		if (new_len == 0)
+			new_len = 1;
+		new_dptr = (unsigned char *)realloc(dbuf.dptr, new_len);
 		if (new_dptr == NULL) {
 			free(dbuf.dptr);
 		}
@@ -800,3 +810,92 @@ failed:
 	tdb_unlockall(tdb);
 	return -1;
 }
+
+struct traverse_state {
+	bool error;
+	struct tdb_context *dest_db;
+};
+
+/*
+  traverse function for repacking
+ */
+static int repack_traverse(struct tdb_context *tdb, TDB_DATA key, TDB_DATA data, void *private)
+{
+	struct traverse_state *state = (struct traverse_state *)private;
+	if (tdb_store(state->dest_db, key, data, TDB_INSERT) != 0) {
+		state->error = true;
+		return -1;
+	}
+	return 0;
+}
+
+/*
+  repack a tdb
+ */
+int tdb_repack(struct tdb_context *tdb)
+{
+	struct tdb_context *tmp_db;
+	struct traverse_state state;
+
+	if (tdb_transaction_start(tdb) != 0) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, __location__ " Failed to start transaction\n"));
+		return -1;
+	}
+
+	tmp_db = tdb_open("tmpdb", tdb_hash_size(tdb), TDB_INTERNAL, O_RDWR|O_CREAT, 0);
+	if (tmp_db == NULL) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, __location__ " Failed to create tmp_db\n"));
+		tdb_transaction_cancel(tdb);
+		return -1;
+	}
+
+	state.error = false;
+	state.dest_db = tmp_db;
+
+	if (tdb_traverse_read(tdb, repack_traverse, &state) == -1) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, __location__ " Failed to traverse copying out\n"));
+		tdb_transaction_cancel(tdb);
+		tdb_close(tmp_db);
+		return -1;		
+	}
+
+	if (state.error) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, __location__ " Error during traversal\n"));
+		tdb_transaction_cancel(tdb);
+		tdb_close(tmp_db);
+		return -1;
+	}
+
+	if (tdb_wipe_all(tdb) != 0) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, __location__ " Failed to wipe database\n"));
+		tdb_transaction_cancel(tdb);
+		tdb_close(tmp_db);
+		return -1;
+	}
+
+	state.error = false;
+	state.dest_db = tdb;
+
+	if (tdb_traverse_read(tmp_db, repack_traverse, &state) == -1) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, __location__ " Failed to traverse copying back\n"));
+		tdb_transaction_cancel(tdb);
+		tdb_close(tmp_db);
+		return -1;		
+	}
+
+	if (state.error) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, __location__ " Error during second traversal\n"));
+		tdb_transaction_cancel(tdb);
+		tdb_close(tmp_db);
+		return -1;
+	}
+
+	tdb_close(tmp_db);
+
+	if (tdb_transaction_commit(tdb) != 0) {
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, __location__ " Failed to commit\n"));
+		return -1;
+	}
+
+	return 0;
+}
diff --git a/lib/tdb/common/tdb_private.h b/lib/tdb/common/tdb_private.h
index ffac89f..45b85f4 100644
--- a/lib/tdb/common/tdb_private.h
+++ b/lib/tdb/common/tdb_private.h
@@ -166,7 +166,7 @@ struct tdb_context {
 	struct tdb_transaction *transaction;
 	int page_size;
 	int max_dead_records;
-	bool have_transaction_lock;
+	int transaction_lock_count;
 	volatile sig_atomic_t *interrupt_sig_ptr;
 };
 
diff --git a/lib/tdb/common/transaction.c b/lib/tdb/common/transaction.c
index 98e8eff..ee6541f 100644
--- a/lib/tdb/common/transaction.c
+++ b/lib/tdb/common/transaction.c
@@ -123,8 +123,15 @@ struct tdb_transaction {
 	   but don't create a new transaction */
 	int nesting;
 
+	/* set when a prepare has already occurred */
+	bool prepared;
+	tdb_off_t magic_offset;
+
 	/* old file size before transaction */
 	tdb_len_t old_map_size;
+
+	/* we should re-pack on commit */
+	bool need_repack;
 };
 
 
@@ -137,6 +144,14 @@ static int transaction_read(struct tdb_context *tdb, tdb_off_t off, void *buf,
 {
 	uint32_t blk;
 
+	/* Only a commit is allowed on a prepared transaction */
+	if (tdb->transaction->prepared) {
+		tdb->ecode = TDB_ERR_EINVAL;
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, "transaction_read: transaction already prepared, read not allowed\n"));
+		tdb->transaction->transaction_error = 1;
+		return -1;
+	}
+
 	/* break it down into block sized ops */
 	while (len + (off % tdb->transaction->block_size) > tdb->transaction->block_size) {
 		tdb_len_t len2 = tdb->transaction->block_size - (off % tdb->transaction->block_size);
@@ -194,6 +209,14 @@ static int transaction_write(struct tdb_context *tdb, tdb_off_t off,
 {
 	uint32_t blk;
 
+	/* Only a commit is allowed on a prepared transaction */
+	if (tdb->transaction->prepared) {
+		tdb->ecode = TDB_ERR_EINVAL;
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, "transaction_write: transaction already prepared, write not allowed\n"));
+		tdb->transaction->transaction_error = 1;
+		return -1;
+	}
+
 	/* if the write is to a hash head, then update the transaction
 	   hash heads */
 	if (len == sizeof(tdb_off_t) && off >= FREELIST_TOP &&
@@ -379,6 +402,8 @@ static int transaction_expand_file(struct tdb_context *tdb, tdb_off_t size,
 		return -1;
 	}
 
+	tdb->transaction->need_repack = true;
+
 	return 0;
 }
 
@@ -509,11 +534,41 @@ fail:
 
 
 /*
+  sync to disk
+*/
+static int transaction_sync(struct tdb_context *tdb, tdb_off_t offset, tdb_len_t length)
+{	
+	if (tdb->flags & TDB_NOSYNC) {
+		return 0;
+	}
+
+	if (fsync(tdb->fd) != 0) {
+		tdb->ecode = TDB_ERR_IO;
+		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_transaction: fsync failed\n"));
+		return -1;
+	}
+#ifdef MS_SYNC
+	if (tdb->map_ptr) {
+		tdb_off_t moffset = offset & ~(tdb->page_size-1);
+		if (msync(moffset + (char *)tdb->map_ptr, 
+			  length + (offset - moffset), MS_SYNC) != 0) {
+			tdb->ecode = TDB_ERR_IO;
+			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_transaction: msync failed - %s\n",
+				 strerror(errno)));
+			return -1;
+		}
+	}
+#endif
+	return 0;
+}
+
+
+/*
   cancel the current transaction
 */
 int tdb_transaction_cancel(struct tdb_context *tdb)
 {	
-	int i;
+	int i, ret = 0;
 
 	if (tdb->transaction == NULL) {
 		TDB_LOG((tdb, TDB_DEBUG_ERROR, "tdb_transaction_cancel: no transaction\n"));
@@ -536,6 +591,18 @@ int tdb_transaction_cancel(struct tdb_context *tdb)
 	}
 	SAFE_FREE(tdb->transaction->blocks);
 
+	if (tdb->transaction->magic_offset) {
+		const struct tdb_methods *methods = tdb->transaction->io_methods;
+		uint32_t zero = 0;
+
+		/* remove the recovery marker */
+		if (methods->tdb_write(tdb, tdb->transaction->magic_offset, &zero, 4) == -1 ||
+		transaction_sync(tdb, tdb->transaction->magic_offset, 4) == -1) {
+			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_transaction_cancel: failed to remove recovery magic\n"));
+			ret = -1;
+		}
+	}
+
 	/* remove any global lock created during the transaction */
 	if (tdb->global_lock.count != 0) {
 		tdb_brlock(tdb, FREELIST_TOP, F_UNLCK, F_SETLKW, 0, 4*tdb->header.hash_size);
@@ -561,32 +628,7 @@ int tdb_transaction_cancel(struct tdb_context *tdb)
 	SAFE_FREE(tdb->transaction->hash_heads);
 	SAFE_FREE(tdb->transaction);
 	
-	return 0;
-}
-
-/*
-  sync to disk
-*/
-static int transaction_sync(struct tdb_context *tdb, tdb_off_t offset, tdb_len_t length)
-{	
-	if (fsync(tdb->fd) != 0) {
-		tdb->ecode = TDB_ERR_IO;
-		TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_transaction: fsync failed\n"));
-		return -1;
-	}
-#ifdef MS_SYNC
-	if (tdb->map_ptr) {
-		tdb_off_t moffset = offset & ~(tdb->page_size-1);
-		if (msync(moffset + (char *)tdb->map_ptr, 
-			  length + (offset - moffset), MS_SYNC) != 0) {
-			tdb->ecode = TDB_ERR_IO;
-			TDB_LOG((tdb, TDB_DEBUG_FATAL, "tdb_transaction: msync failed - %s\n",
-				 strerror(errno)));
-			return -1;
-		}
-	}
-#endif
-	return 0;
+	return ret;
 }
 
 
@@ -837,36 +879,38 @@ static int transaction_setup_recovery(struct tdb_context *tdb,
 }
 
 /*
-  commit the current transaction
+  prepare to commit the current transaction
 */
-int tdb_transaction_commit(struct tdb_context *tdb)
+int tdb_transaction_prepare_commit(struct tdb_context *tdb)
 {	
 	const struct tdb_methods *methods;
-	tdb_off_t magic_offset = 0;
-	uint32_t zero = 0;
-	int i;
 
 	if (tdb->transaction == NULL) {
-		TDB_LOG((tdb, TDB_DEBUG_ERROR, "tdb_transaction_commit: no transaction\n"));
+		TDB_LOG((tdb, TDB_DEBUG_ERROR, "tdb_transaction_prepare_commit: no transaction\n"));
+		return -1;
+	}
+
+	if (tdb->transaction->prepared) {
+		tdb->ecode = TDB_ERR_EINVAL;
+		tdb_transaction_cancel(tdb);
+		TDB_LOG((tdb, TDB_DEBUG_ERROR, "tdb_transaction_prepare_commit: transaction already prepared\n"));
 		return -1;
 	}
 
 	if (tdb->transaction->transaction_error) {
 		tdb->ecode = TDB_ERR_IO;
 		tdb_transaction_cancel(tdb);
-		TDB_LOG((tdb, TDB_DEBUG_ERROR, "tdb_transaction_commit: transaction error pending\n"));
+		TDB_LOG((tdb, TDB_DEBUG_ERROR, "tdb_transaction_prepare_commit: transaction error pending\n"));
 		return -1;
 	}
 
 
 	if (tdb->transaction->nesting != 0) {
-		tdb->transaction->nesting--;
 		return 0;
 	}		
 
 	/* check for a null transaction */


-- 
CTDB repository