[SCM] CTDB repository - branch 1.2 updated - ctdb-1.0.114-367-g68b3a83

Thu Oct 7 20:07:14 MDT 2010

The branch, 1.2 has been updated
       via  68b3a836d4b39453987acc4a69556f6c4a767bda (commit)
       via  9285e21be3a26d1afb5545c5966d1897bbddf6f8 (commit)
       via  a39000f0155789aa690298ddbc4691b9501475cc (commit)
       via  ce27a237bd2e60298ee7187d50e4c041e2a22c38 (commit)
       via  bbecd9237bff1d1ce54c9902131ba2e09713f507 (commit)
       via  a18954cfbe2579b998f01143f60fc57401ef9457 (commit)
       via  fc61340ca7f7da3b680e2eeeb0739bed046e3159 (commit)
       via  79ed337963ec658bfe7e7784f9b143e8ac84c9a0 (commit)
       via  88b490ef9f750e4a10e4e296ef8e904820a43ba5 (commit)
       via  114127afb33b74621562131cffb4af342bec91fa (commit)
       via  0040303a9251e5e2b1243caedc96a0510c08dd55 (commit)
       via  00f7043f7a0d572fff9412a8c66b5c883ef1edba (commit)
       via  56b1b7618c01ce6eb293237c4284aafbc1ac2d6c (commit)
       via  bacc82ae0671ba5ade5e8677359c4dfecfed3c7e (commit)
       via  8036d0c23c1c4bf725d1293591d0f5353cb6f686 (commit)
       via  a2168bbbebdf4bda602c4dcb5c0f7e9707bcbc95 (commit)
       via  c00334fc2ea8250005a48644006a831e85e03d90 (commit)
       via  c53392d5cbb8b0aaf40d6da231e986bc49f66ed2 (commit)
       via  9aff8e281d0c914dce460fc6b1046edc3611e95b (commit)
       via  69f8047019f5b63bef521202980a172649311589 (commit)
       via  61cf1320a25d8549dc5fc9a71087f9276ff7a2aa (commit)
       via  5fe0e7fcee8f27cad2fe321ef503112b093570cb (commit)
       via  30300b3a4b26ad2c8c5cf89ef7645f32878773e0 (commit)
       via  b765f97c81dac6895a14fe21aab12666786cfd5d (commit)
       via  21a4916c1a54e4cdc212e834ca85ae9d37222e73 (commit)
      from  edc4298859e6c00433ab09e0795a470199b15a37 (commit)

http://gitweb.samba.org/?p=sahlberg/ctdb.git;a=shortlog;h=1.2


- Log -----------------------------------------------------------------
commit 68b3a836d4b39453987acc4a69556f6c4a767bda
Author: Jelmer Vernooij <jelmer at samba.org>
Date:   Mon Oct 4 13:17:25 2010 +0200

    pytdb: Add __version__ attribute.

commit 9285e21be3a26d1afb5545c5966d1897bbddf6f8
Author: Jelmer Vernooij <jelmer at samba.org>
Date:   Sat Oct 2 23:40:19 2010 +0200

    pytdb: Include Python.h first to prevent warning.

commit a39000f0155789aa690298ddbc4691b9501475cc
Author: Kirill Smelkov <kirr at landau.phys.spbu.ru>
Date:   Sat Oct 2 17:43:50 2010 +0400

    pytdb: Check errors after PyObject_New() calls
    
    The call could fail with e.g. MemoryError, and we would then dereference
    a NULL pointer without checking.
    
    Signed-off-by: Kirill Smelkov <kirr at landau.phys.spbu.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit ce27a237bd2e60298ee7187d50e4c041e2a22c38
Author: Kirill Smelkov <kirr at mns.spb.ru>
Date:   Sat Oct 2 17:43:46 2010 +0400

    pytdb: Add support for tdb_repack()
    
    Cc: 597386 at bugs.debian.org
    Signed-off-by: Kirill Smelkov <kirr at landau.phys.spbu.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit bbecd9237bff1d1ce54c9902131ba2e09713f507
Author: Kirill Smelkov <kirr at mns.spb.ru>
Date:   Sat Oct 2 17:43:40 2010 +0400

    pytdb: Add TDB_INCOMPATIBLE_HASH open flag
    
    In 2dcf76 Rusty added TDB_INCOMPATIBLE_HASH open flag which selects
    Jenkins lookup3 hash for new databases.
    
    Expose this flag to python users too.
    
    Cc: Rusty Russell <rusty at rustcorp.com.au>
    Signed-off-by: Kirill Smelkov <kirr at mns.spb.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit a18954cfbe2579b998f01143f60fc57401ef9457
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Mon Sep 27 11:06:51 2010 +0930

    tdb: fix non-WAF build, commit 1.2.6 ABI file.
    
    Sorry Jeremy.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>

commit fc61340ca7f7da3b680e2eeeb0739bed046e3159
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Fri Sep 24 15:45:11 2010 +0930

    tdb: TDB_INCOMPATIBLE_HASH, to allow safe changing of default hash.
    
    This flag to tdb_open/tdb_open_ex affects creation of a new database:
    1) Uses the Jenkins lookup3 hash instead of the old gdbm hash if none is
       specified,
    2) Places a non-zero field in header->rwlocks, so older versions of TDB will
       refuse to open it.
    
    This means that the caller (i.e. Samba) can set this flag to safely
    change the hash function.  Versions of TDB from this one on will either
    use the correct hash or refuse to open (if a different hash is specified).
    Older TDB versions will see the nonzero rwlocks field and refuse to open
    it under any conditions.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
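
The open-time effect described in point 2 can be sketched as two header tests: the old one rejects any non-zero rwlocks field, the new one also accepts the marker. This is a minimal sketch, not tdb's code: the struct and helper names are hypothetical, and while TDB_HASH_RWLOCK_MAGIC is the name used in the check.c hunk below, the numeric value here is an assumed placeholder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Name taken from check.c; the value is an assumed placeholder. */
#define TDB_HASH_RWLOCK_MAGIC 0xbad1a51U

/* Hypothetical subset of the on-disk header. */
struct hdr_sketch {
	uint32_t rwlocks;
};

/* Older TDB: any non-zero rwlocks field means "refuse to open". */
static bool old_tdb_can_open(const struct hdr_sketch *h)
{
	return h->rwlocks == 0;
}

/* Newer TDB: zero (legacy database) or the marker are both accepted. */
static bool new_tdb_can_open(const struct hdr_sketch *h)
{
	return h->rwlocks == 0 || h->rwlocks == TDB_HASH_RWLOCK_MAGIC;
}

/* Value written into rwlocks when a new database is created. */
static uint32_t rwlocks_for_new_db(bool incompatible_hash)
{
	return incompatible_hash ? TDB_HASH_RWLOCK_MAGIC : 0;
}
```

A database created with the flag is thus invisible to older readers, while databases created without it remain readable by both.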

commit 79ed337963ec658bfe7e7784f9b143e8ac84c9a0
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Fri Sep 24 15:39:43 2010 +0930

    tdb: automatically identify Jenkins hash tdbs
    
    If the caller to tdb_open_ex() doesn't specify a hash, and tdb_old_hash
    doesn't match, try tdb_jenkins_hash.
    
    This was Metze's idea: it makes life simpler, especially with the upcoming
    TDB_INCOMPATIBLE_HASH flag.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
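
The fallback order can be sketched as follows, assuming the header records the hash of a fixed example key (as the example-hashes commit further down introduces). Both hash functions are stand-ins: old_hash repeats the arithmetic of tdb_old_hash() from hash.c below, while alt_hash is FNV-1a, used only as a short placeholder for Jenkins lookup3. pick_hash, EXAMPLE_KEY and the "stored == 0 means legacy header" rule are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*hash_fn)(const unsigned char *p, size_t n);

/* Same arithmetic as tdb_old_hash() (gdbm-derived), taking ptr + length. */
static uint32_t old_hash(const unsigned char *p, size_t n)
{
	uint32_t value = 0x238F13AFu * (uint32_t)n;

	for (uint32_t i = 0; i < n; i++)
		value = value + ((uint32_t)p[i] << (i * 5 % 24));
	return 1103515243u * value + 12345u;
}

/* Placeholder second hash: FNV-1a, NOT Jenkins lookup3. */
static uint32_t alt_hash(const unsigned char *p, size_t n)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < n; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Hypothetical key whose hash the header records. */
static const unsigned char EXAMPLE_KEY[] = "TDB file\n";

/* Choose the hash for an existing database; NULL means "refuse to open".
 * stored == 0 models a header written before example hashes existed. */
static hash_fn pick_hash(hash_fn requested, uint32_t stored)
{
	if (stored == 0)
		return requested != NULL ? requested : old_hash;
	if (requested != NULL)  /* explicit choice: use it or refuse */
		return requested(EXAMPLE_KEY, sizeof EXAMPLE_KEY) == stored
		       ? requested : NULL;
	if (old_hash(EXAMPLE_KEY, sizeof EXAMPLE_KEY) == stored)
		return old_hash;    /* default hash matches */
	if (alt_hash(EXAMPLE_KEY, sizeof EXAMPLE_KEY) == stored)
		return alt_hash;    /* fallback when the old hash doesn't match */
	return NULL;            /* unknown hash */
}
```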

commit 88b490ef9f750e4a10e4e296ef8e904820a43ba5
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Fri Sep 24 15:34:06 2010 +0930

    tdb: add Bob Jenkins lookup3 hash as helper hash.
    
    This is a better hash than the default: shipping it with tdb makes it easy
    for callers to use it as the hash by passing it to tdb_open_ex().
    
    This version was taken from CCAN and modified; CCAN in turn took it from
    http://www.burtleburtle.net/bob/c/lookup3.c.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>

commit 114127afb33b74621562131cffb4af342bec91fa
Author: Volker Lendecke <vl at samba.org>
Date:   Sat Sep 18 10:56:10 2010 +0400

    tdb: add restore
    
    Based on an idea by Simon McVittie, largely rewritten

commit 0040303a9251e5e2b1243caedc96a0510c08dd55
Author: Günther Deschner <gd at samba.org>
Date:   Mon Sep 20 16:01:51 2010 -0700

    lib/tdb: fix c++ build warning in tdb_header_hash().
    
    Guenther

commit 00f7043f7a0d572fff9412a8c66b5c883ef1edba
Author: Jelmer Vernooij <jelmer at samba.org>
Date:   Sun Sep 19 10:42:29 2010 -0700

    pytdb: Make filename argument optional.

commit 56b1b7618c01ce6eb293237c4284aafbc1ac2d6c
Author: Kirill Smelkov <kirr at landau.phys.spbu.ru>
Date:   Sun Sep 19 13:53:29 2010 +0400

    pytdb: Add support for tdb_freelist_size()
    
    Cc: 597386 at bugs.debian.org
    Signed-off-by: Kirill Smelkov <kirr at landau.phys.spbu.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit bacc82ae0671ba5ade5e8677359c4dfecfed3c7e
Author: Kirill Smelkov <kirr at landau.phys.spbu.ru>
Date:   Sun Sep 19 13:53:32 2010 +0400

    pytdb: Add support for tdb_transaction_prepare_commit()
    
    Cc: 597386 at bugs.debian.org
    Signed-off-by: Kirill Smelkov <kirr at landau.phys.spbu.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit 8036d0c23c1c4bf725d1293591d0f5353cb6f686
Author: Kirill Smelkov <kirr at landau.phys.spbu.ru>
Date:   Sun Sep 19 09:34:33 2010 -0700

    pytdb: Add support for tdb_enable_seqnum, tdb_get_seqnum and tdb_increment_seqnum_nonblock
    
    Cc: 597386 at bugs.debian.org
    Signed-off-by: Kirill Smelkov <kirr at landau.phys.spbu.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit a2168bbbebdf4bda602c4dcb5c0f7e9707bcbc95
Author: Kirill Smelkov <kirr at mns.spb.ru>
Date:   Sun Sep 19 13:53:19 2010 +0400

    pytdb: Update open flags to match those for tdb_open() in tdb.h
    
    Namely TDB_NOSYNC, TDB_SEQNUM, TDB_VOLATILE, TDB_ALLOW_NESTING and
    TDB_DISALLOW_NESTING were missing.
    
    Cc: 597386 at bugs.debian.org
    Signed-off-by: Kirill Smelkov <kirr at mns.spb.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit c00334fc2ea8250005a48644006a831e85e03d90
Author: Kirill Smelkov <kirr at mns.spb.ru>
Date:   Sun Sep 19 13:53:21 2010 +0400

    pytdb: Fix repr segfault for internal db
    
    The problem was tdb->name is NULL for TDB_INTERNAL databases, and
    so it was crashing ...
    
        #0  0xb76944f3 in strlen () from /lib/i686/cmov/libc.so.6
        #1  0x0809862b in PyString_FromFormatV (format=0xb72b6a26 "Tdb('%s')", vargs=0xbfc26a94 "")
            at ../Objects/stringobject.c:211
        #2  0x08098888 in PyString_FromFormat (format=0xb72b6a26 "Tdb('%s')") at ../Objects/stringobject.c:358
        #3  0xb72b65f2 in tdb_object_repr (self=0xb759e060) at ./pytdb.c:439
    
    Cc: 597089 at bugs.debian.org
    Signed-off-by: Kirill Smelkov <kirr at mns.spb.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>
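
The shape of the fix is to special-case the missing name before it reaches a "%s" conversion, since passing NULL for "%s" is undefined behaviour and here ended in strlen(). A minimal plain-C sketch: the helper name is hypothetical and the "Tdb(<internal>)" placeholder text is an assumption.

```c
#include <stdio.h>

/* Hypothetical helper: build the repr text for a database handle.
 * TDB_INTERNAL databases have no file name, so name may be NULL. */
static int format_tdb_repr(char *buf, size_t len, const char *name)
{
	if (name == NULL)
		return snprintf(buf, len, "Tdb(<internal>)");
	return snprintf(buf, len, "Tdb('%s')", name);
}
```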

commit c53392d5cbb8b0aaf40d6da231e986bc49f66ed2
Author: Kirill Smelkov <kirr at mns.spb.ru>
Date:   Sun Sep 19 13:53:20 2010 +0400

    pytdb: Add support for tdb_add_flags() & tdb_remove_flags()
    
    Note, unlike tdb_open where flags is `int', tdb_{add,remove}_flags want
    flags as `unsigned', so instead of "i" I used "I" in PyArg_ParseTuple.
    
    Cc: 597386 at bugs.debian.org
    Signed-off-by: Kirill Smelkov <kirr at mns.spb.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit 9aff8e281d0c914dce460fc6b1046edc3611e95b
Author: Andrew Tridgell <tridge at samba.org>
Date:   Thu Sep 16 20:06:44 2010 +1000

    tdb: added TDB_NO_FSYNC env variable
    
    This might help reduce test times and load on test machines.
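
One way such a switch can work is to read the variable once and fold it into the open flags, so the sync paths only test a flag bit. This is a sketch under assumptions: the commit only names the variable, and SKETCH_NOSYNC, its value, and both helpers are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical "skip fsync" flag bit for this sketch. */
#define SKETCH_NOSYNC 0x40

/* Any value of the variable (env_value != NULL) turns syncing off. */
static int flags_with_env_override(int flags, const char *env_value)
{
	if (env_value != NULL)
		flags |= SKETCH_NOSYNC;
	return flags;
}

/* Call-site form: read TDB_NO_FSYNC from the environment at open time. */
static int open_flags_from_env(int flags)
{
	return flags_with_env_override(flags, getenv("TDB_NO_FSYNC"));
}
```

Passing the variable's value in, rather than calling getenv() inside the decision, keeps the decision testable without touching the process environment.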

commit 69f8047019f5b63bef521202980a172649311589
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Thu Oct 7 15:07:22 2010 +1030

    tdb: increment version to 1.2.4

commit 61cf1320a25d8549dc5fc9a71087f9276ff7a2aa
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Mon Sep 13 20:05:59 2010 +0930

    tdb: put example hashes into header, so we notice incorrect hash_fn.
    
    This is Stefan Metzmacher <metze at samba.org>'s patch with minor changes:
    1) Use the TDB_MAGIC constant so both hashes aren't of strings.
    2) Check the hash in tdb_check (paranoia, really).
    3) Additional check in the (unlikely!) case where both examples hash to 0.
    4) Cosmetic changes to var names and complaint message.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
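
The header logic can be sketched as: hash two fixed example keys with the database's hash function, force a non-zero result if both come out as zero, and skip the comparison when a stored value is zero (as the tdb_check_header() hunk below does). The example keys, their values, the helper names and the two toy hash functions are all assumptions for illustration; the toy hashes are not tdb's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t (*hash_fn)(const unsigned char *p, size_t n);

/* Placeholder example keys: a magic string and a 32-bit magic number. */
static const char MAGIC_STRING[] = "TDB file\n";
static const uint32_t MAGIC_NUMBER = 0x26011999U;

/* Hash both example keys; never report two zeroes. */
static void header_hash(hash_fn fn, uint32_t *h1, uint32_t *h2)
{
	unsigned char buf[sizeof MAGIC_NUMBER];

	memcpy(buf, &MAGIC_NUMBER, sizeof buf);  /* native byte order */
	*h1 = fn((const unsigned char *)MAGIC_STRING, sizeof MAGIC_STRING);
	*h2 = fn(buf, sizeof buf);
	if (*h1 == 0 && *h2 == 0)
		*h1 = 1;
}

/* True if the stored header values are compatible with fn. */
static bool header_hash_ok(hash_fn fn, uint32_t stored1, uint32_t stored2)
{
	uint32_t h1, h2;

	if (stored1 == 0 || stored2 == 0)
		return true;  /* not checked, as in the check.c hunk */
	header_hash(fn, &h1, &h2);
	return stored1 == h1 && stored2 == h2;
}

/* Toy hash functions for the usage example only. */
static uint32_t toy_sum(const unsigned char *p, size_t n)
{
	uint32_t h = 0;

	while (n--)
		h = h * 31 + *p++;
	return h;
}

static uint32_t toy_zero(const unsigned char *p, size_t n)
{
	(void)p;
	(void)n;
	return 0;
}
```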

commit 5fe0e7fcee8f27cad2fe321ef503112b093570cb
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Mon Sep 13 19:59:18 2010 +0930

    tdb: fix tdb_check() on other-endian tdbs.
    
    We must not endian-convert the magic string, just the rest.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
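
Why the magic string must be left alone can be shown with a toy header and a conversion that byte-swaps every 32-bit word of the buffer. That whole-buffer behaviour is an assumption about how the DOCONV() read converts data, and the struct and helpers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header: text magic followed by 32-bit fields. */
struct toy_header {
	char magic[12];
	uint32_t version;
	uint32_t hash_size;
};

static uint32_t bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0xff00u) |
	       ((v << 8) & 0xff0000u) | (v << 24);
}

/* Swap one 32-bit word in place; memcpy avoids alignment/aliasing issues. */
static void swap_word(unsigned char *p)
{
	uint32_t w;

	memcpy(&w, p, sizeof w);
	w = bswap32(w);
	memcpy(p, &w, sizeof w);
}

/* Wrong for headers: swaps every word, including the magic text. */
static void convert_all(struct toy_header *h)
{
	unsigned char *p = (unsigned char *)h;

	for (size_t i = 0; i + 4 <= sizeof *h; i += 4)
		swap_word(p + i);
}

/* Right: leave the magic bytes alone, swap only the integer fields. */
static void convert_fields(struct toy_header *h)
{
	h->version = bswap32(h->version);
	h->hash_size = bswap32(h->hash_size);
}
```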

commit 30300b3a4b26ad2c8c5cf89ef7645f32878773e0
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Mon Sep 13 19:58:23 2010 +0930

    tdb: fix tdb_check() on read-only TDBs to actually work.
    
    Commit bc1c82ea137 "Fix tdb_check() to work with read-only tdb databases."
    claimed to do this, but tdb_lockall_read() fails on read-only databases.
    
    Also make sure we can still do tdb_check() inside a transaction (weird,
    but we previously allowed it so don't break the API).
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>

commit b765f97c81dac6895a14fe21aab12666786cfd5d
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Mon Sep 13 19:55:26 2010 +0930

    tdb: make check more robust against recovery failures.
    
    We can end up with dead areas when we die during transaction commit;
    tdb_check() fails on such a (valid) database.
    
    This is particularly noticeable now that we no longer truncate on recovery;
    if the recovery area was at the end of the file, we used to remove it
    that way.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
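
The dead-area scan performed by dead_space() in the check.c hunk below can be sketched over an in-memory buffer. This version has a hypothetical name and no tdb I/O. It examines the byte at off + len on each step, so every byte of the candidate area is tested against the two fill values (0x00 and 0x42).

```c
#include <stddef.h>

/* Count consecutive fill bytes (0x00 or 0x42) starting at off, stopping
 * at the first other byte or at the end of the mapped buffer. */
static size_t count_dead_space(const unsigned char *map, size_t map_size,
			       size_t off)
{
	size_t len = 0;

	while (off + len < map_size &&
	       (map[off + len] == 0x00 || map[off + len] == 0x42))
		len++;
	return len;
}
```

As in the hunk, a caller would still treat an area shorter than one record header as corruption rather than dead space.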

commit 21a4916c1a54e4cdc212e834ca85ae9d37222e73
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Tue Oct 5 13:06:19 2010 +1030

    idtree: fix right shift of signed ints, crash on large ids on AIX
    
    Right-shifting signed integers is undefined; indeed it seems that on
    AIX with their compiler, doing a 30-bit shift on (INT_MAX-200) gives
    0, not 1 as we might expect.
    
    The obvious fix is to make id and oid unsigned: l (level count) is also
    logically unsigned.
    
    (Note: Samba doesn't generally get to ids > 1 billion, but ctdb does)
    
    Reported-by: Chris Cowan <cc at us.ibm.com>
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
    
    Autobuild-User: Rusty Russell <rusty at samba.org>
    Autobuild-Date: Wed Oct  6 08:31:09 UTC 2010 on sn-devel-104
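
With the id held as unsigned, the per-level shift from the commit's example is well-defined and zero-filling. A minimal sketch, assuming 5 bits of id per tree level and a 32-bit unsigned int; the constant names and the helper are hypothetical, not idtree.c's own.

```c
#include <limits.h>

/* Assumed tree geometry for illustration: 5 bits of the id per level. */
#define SKETCH_IDR_BITS 5
#define SKETCH_IDR_MASK ((1u << SKETCH_IDR_BITS) - 1)

/* The commit's example id: INT_MAX - 200, held as unsigned. */
static const unsigned int example_id = (unsigned int)(INT_MAX - 200);

/* Slot index of id at a given tree level (unsigned shift). */
static unsigned int idr_slot(unsigned int id, unsigned int level)
{
	return (id >> (SKETCH_IDR_BITS * level)) & SKETCH_IDR_MASK;
}
```

At level 6 the shift is 30 bits, and for example_id (0x7FFFFF37 with a 32-bit int) the result is 1, the value the commit expected.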

-----------------------------------------------------------------------

Summary of changes:
 lib/tdb/ABI/tdb-1.2.6.sigs     |   61 +++++++
 lib/tdb/common/check.c         |   73 +++++++--
 lib/tdb/common/hash.c          |  380 ++++++++++++++++++++++++++++++++++++++++
 lib/tdb/common/open.c          |   95 +++++++++--
 lib/tdb/common/tdb_private.h   |    8 +-
 lib/tdb/config.mk              |    9 +
 lib/tdb/configure.ac           |    2 +-
 lib/tdb/include/tdb.h          |    2 +
 lib/tdb/libtdb.m4              |    2 +-
 lib/tdb/pytdb.c                |  103 +++++++++++-
 lib/tdb/python/tests/simple.py |   46 +++++-
 lib/tdb/tdb.mk                 |    5 +-
 lib/tdb/tools/tdbrestore.c     |  226 ++++++++++++++++++++++++
 lib/util/idtree.c              |    2 +-
 14 files changed, 977 insertions(+), 37 deletions(-)
 create mode 100644 lib/tdb/ABI/tdb-1.2.6.sigs
 create mode 100644 lib/tdb/common/hash.c
 create mode 100644 lib/tdb/tools/tdbrestore.c


Changeset truncated at 500 lines:

diff --git a/lib/tdb/ABI/tdb-1.2.6.sigs b/lib/tdb/ABI/tdb-1.2.6.sigs
new file mode 100644
index 0000000..1e01f3b
--- /dev/null
+++ b/lib/tdb/ABI/tdb-1.2.6.sigs
@@ -0,0 +1,61 @@
+tdb_add_flags: void (struct tdb_context *, unsigned int)
+tdb_append: int (struct tdb_context *, TDB_DATA, TDB_DATA)
+tdb_chainlock: int (struct tdb_context *, TDB_DATA)
+tdb_chainlock_mark: int (struct tdb_context *, TDB_DATA)
+tdb_chainlock_nonblock: int (struct tdb_context *, TDB_DATA)
+tdb_chainlock_read: int (struct tdb_context *, TDB_DATA)
+tdb_chainlock_unmark: int (struct tdb_context *, TDB_DATA)
+tdb_chainunlock: int (struct tdb_context *, TDB_DATA)
+tdb_chainunlock_read: int (struct tdb_context *, TDB_DATA)
+tdb_check: int (struct tdb_context *, int (*)(TDB_DATA, TDB_DATA, void *), void *)
+tdb_close: int (struct tdb_context *)
+tdb_delete: int (struct tdb_context *, TDB_DATA)
+tdb_dump_all: void (struct tdb_context *)
+tdb_enable_seqnum: void (struct tdb_context *)
+tdb_error: enum TDB_ERROR (struct tdb_context *)
+tdb_errorstr: const char *(struct tdb_context *)
+tdb_exists: int (struct tdb_context *, TDB_DATA)
+tdb_fd: int (struct tdb_context *)
+tdb_fetch: TDB_DATA (struct tdb_context *, TDB_DATA)
+tdb_firstkey: TDB_DATA (struct tdb_context *)
+tdb_freelist_size: int (struct tdb_context *)
+tdb_get_flags: int (struct tdb_context *)
+tdb_get_logging_private: void *(struct tdb_context *)
+tdb_get_seqnum: int (struct tdb_context *)
+tdb_hash_size: int (struct tdb_context *)
+tdb_increment_seqnum_nonblock: void (struct tdb_context *)
+tdb_jenkins_hash: unsigned int (TDB_DATA *)
+tdb_lockall: int (struct tdb_context *)
+tdb_lockall_mark: int (struct tdb_context *)
+tdb_lockall_nonblock: int (struct tdb_context *)
+tdb_lockall_read: int (struct tdb_context *)
+tdb_lockall_read_nonblock: int (struct tdb_context *)
+tdb_lockall_unmark: int (struct tdb_context *)
+tdb_log_fn: tdb_log_func (struct tdb_context *)
+tdb_map_size: size_t (struct tdb_context *)
+tdb_name: const char *(struct tdb_context *)
+tdb_nextkey: TDB_DATA (struct tdb_context *, TDB_DATA)
+tdb_null: dptr = 0xXXXX, dsize = 0
+tdb_open: struct tdb_context *(const char *, int, int, int, mode_t)
+tdb_open_ex: struct tdb_context *(const char *, int, int, int, mode_t, const struct tdb_logging_context *, tdb_hash_func)
+tdb_parse_record: int (struct tdb_context *, TDB_DATA, int (*)(TDB_DATA, TDB_DATA, void *), void *)
+tdb_printfreelist: int (struct tdb_context *)
+tdb_remove_flags: void (struct tdb_context *, unsigned int)
+tdb_reopen: int (struct tdb_context *)
+tdb_reopen_all: int (int)
+tdb_repack: int (struct tdb_context *)
+tdb_set_logging_function: void (struct tdb_context *, const struct tdb_logging_context *)
+tdb_set_max_dead: void (struct tdb_context *, int)
+tdb_setalarm_sigptr: void (struct tdb_context *, volatile sig_atomic_t *)
+tdb_store: int (struct tdb_context *, TDB_DATA, TDB_DATA, int)
+tdb_transaction_cancel: int (struct tdb_context *)
+tdb_transaction_commit: int (struct tdb_context *)
+tdb_transaction_prepare_commit: int (struct tdb_context *)
+tdb_transaction_start: int (struct tdb_context *)
+tdb_transaction_start_nonblock: int (struct tdb_context *)
+tdb_traverse: int (struct tdb_context *, tdb_traverse_func, void *)
+tdb_traverse_read: int (struct tdb_context *, tdb_traverse_func, void *)
+tdb_unlockall: int (struct tdb_context *)
+tdb_unlockall_read: int (struct tdb_context *)
+tdb_validate_freelist: int (struct tdb_context *, int *)
+tdb_wipe_all: int (struct tdb_context *)
diff --git a/lib/tdb/common/check.c b/lib/tdb/common/check.c
index 2c64043..58c9c26 100644
--- a/lib/tdb/common/check.c
+++ b/lib/tdb/common/check.c
@@ -28,8 +28,9 @@
 static bool tdb_check_header(struct tdb_context *tdb, tdb_off_t *recovery)
 {
 	struct tdb_header hdr;
+	uint32_t h1, h2;
 
-	if (tdb->methods->tdb_read(tdb, 0, &hdr, sizeof(hdr), DOCONV()) == -1)
+	if (tdb->methods->tdb_read(tdb, 0, &hdr, sizeof(hdr), 0) == -1)
 		return false;
 	if (strcmp(hdr.magic_food, TDB_MAGIC_FOOD) != 0)
 		goto corrupt;
@@ -38,7 +39,12 @@ static bool tdb_check_header(struct tdb_context *tdb, tdb_off_t *recovery)
 	if (hdr.version != TDB_VERSION)
 		goto corrupt;
 
-	if (hdr.rwlocks != 0)
+	if (hdr.rwlocks != 0 && hdr.rwlocks != TDB_HASH_RWLOCK_MAGIC)
+		goto corrupt;
+
+	tdb_header_hash(tdb, &h1, &h2);
+	if (hdr.magic1_hash && hdr.magic2_hash &&
+	    (hdr.magic1_hash != h1 || hdr.magic2_hash != h2))
 		goto corrupt;
 
 	if (hdr.hash_size == 0)
@@ -301,6 +307,21 @@ static bool tdb_check_free_record(struct tdb_context *tdb,
 	return true;
 }
 
+/* Slow, but should be very rare. */
+static size_t dead_space(struct tdb_context *tdb, tdb_off_t off)
+{
+	size_t len;
+
+	for (len = 0; off + len < tdb->map_size; len++) {
+		char c;
+		if (tdb->methods->tdb_read(tdb, off, &c, 1, 0))
+			return 0;
+		if (c != 0 && c != 0x42)
+			break;
+	}
+	return len;
+}
+
 int tdb_check(struct tdb_context *tdb,
 	      int (*check)(TDB_DATA key, TDB_DATA data, void *private_data),
 	      void *private_data)
@@ -310,9 +331,18 @@ int tdb_check(struct tdb_context *tdb,
 	tdb_off_t off, recovery_start;
 	struct tdb_record rec;
 	bool found_recovery = false;
-
-	if (tdb_lockall_read(tdb) == -1)
-		return -1;
+	tdb_len_t dead;
+	bool locked;
+
+	/* Read-only databases use no locking at all: it's best-effort.
+	 * We may have a write lock already, so skip that case too. */
+	if (tdb->read_only || tdb->allrecord_lock.count != 0) {
+		locked = false;
+	} else {
+		if (tdb_lockall_read(tdb) == -1)
+			return -1;
+		locked = true;
+	}
 
 	/* Make sure we know true size of the underlying file. */
 	tdb->methods->tdb_oob(tdb, tdb->map_size + 1, 1);
@@ -369,8 +399,23 @@ int tdb_check(struct tdb_context *tdb,
 			if (!tdb_check_free_record(tdb, off, &rec, hashes))
 				goto free;
 			break;
-		case TDB_RECOVERY_MAGIC:
+		/* If we crash after ftruncate, we can get zeroes or fill. */
 		case TDB_RECOVERY_INVALID_MAGIC:
+		case 0x42424242:
+			if (recovery_start == off) {
+				found_recovery = true;
+				break;
+			}
+			dead = dead_space(tdb, off);
+			if (dead < sizeof(rec))
+				goto corrupt;
+
+			TDB_LOG((tdb, TDB_DEBUG_ERROR,
+				 "Dead space at %d-%d (of %u)\n",
+				 off, off + dead, tdb->map_size));
+			rec.rec_len = dead - sizeof(rec);
+			break;
+		case TDB_RECOVERY_MAGIC:
 			if (recovery_start != off) {
 				TDB_LOG((tdb, TDB_DEBUG_ERROR,
 					 "Unexpected recovery record at offset %d\n",
@@ -379,7 +424,8 @@ int tdb_check(struct tdb_context *tdb,
 			}
 			found_recovery = true;
 			break;
-		default:
+		default: ;
+		corrupt:
 			tdb->ecode = TDB_ERR_CORRUPT;
 			TDB_LOG((tdb, TDB_DEBUG_ERROR,
 				 "Bad magic 0x%x at offset %d\n",
@@ -405,19 +451,22 @@ int tdb_check(struct tdb_context *tdb,
 	/* We must have found recovery area if there was one. */
 	if (recovery_start != 0 && !found_recovery) {
 		TDB_LOG((tdb, TDB_DEBUG_ERROR,
-			 "Expected %s recovery area, got %s\n",
-			 recovery_start ? "a" : "no",
-			 found_recovery ? "one" : "none"));
+			 "Expected a recovery area at %u\n",
+			 recovery_start));
 		goto free;
 	}
 
 	free(hashes);
-	tdb_unlockall_read(tdb);
+	if (locked) {
+		tdb_unlockall_read(tdb);
+	}
 	return 0;
 
 free:
 	free(hashes);
 unlock:
-	tdb_unlockall_read(tdb);
+	if (locked) {
+		tdb_unlockall_read(tdb);
+	}
 	return -1;
 }
diff --git a/lib/tdb/common/hash.c b/lib/tdb/common/hash.c
new file mode 100644
index 0000000..c07297e
--- /dev/null
+++ b/lib/tdb/common/hash.c
@@ -0,0 +1,380 @@
+ /*
+   Unix SMB/CIFS implementation.
+
+   trivial database library
+
+   Copyright (C) Rusty Russell		   2010
+
+     ** NOTE! The following LGPL license applies to the tdb
+     ** library. This does NOT imply that all of Samba is released
+     ** under the LGPL
+
+   This library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 3 of the License, or (at your option) any later version.
+
+   This library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with this library; if not, see <http://www.gnu.org/licenses/>.
+*/
+#include "tdb_private.h"
+
+/* This is based on the hash algorithm from gdbm */
+unsigned int tdb_old_hash(TDB_DATA *key)
+{
+	uint32_t value;	/* Used to compute the hash value.  */
+	uint32_t   i;	/* Used to cycle through random values. */
+
+	/* Set the initial value from the key size. */
+	for (value = 0x238F13AF * key->dsize, i=0; i < key->dsize; i++)
+		value = (value + (key->dptr[i] << (i*5 % 24)));
+
+	return (1103515243 * value + 12345);
+}
+
+#ifndef WORDS_BIGENDIAN
+# define HASH_LITTLE_ENDIAN 1
+# define HASH_BIG_ENDIAN 0
+#else
+# define HASH_LITTLE_ENDIAN 0
+# define HASH_BIG_ENDIAN 1
+#endif
+
+/*
+-------------------------------------------------------------------------------
+lookup3.c, by Bob Jenkins, May 2006, Public Domain.
+
+These are functions for producing 32-bit hashes for hash table lookup.
+hash_word(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
+are externally useful functions.  Routines to test the hash are included
+if SELF_TEST is defined.  You can use this free for any purpose.  It's in
+the public domain.  It has no warranty.
+
+You probably want to use hashlittle().  hashlittle() and hashbig()
+hash byte arrays.  hashlittle() is is faster than hashbig() on
+little-endian machines.  Intel and AMD are little-endian machines.
+On second thought, you probably want hashlittle2(), which is identical to
+hashlittle() except it returns two 32-bit hashes for the price of one.
+You could implement hashbig2() if you wanted but I haven't bothered here.
+
+If you want to find a hash of, say, exactly 7 integers, do
+  a = i1;  b = i2;  c = i3;
+  mix(a,b,c);
+  a += i4; b += i5; c += i6;
+  mix(a,b,c);
+  a += i7;
+  final(a,b,c);
+then use c as the hash value.  If you have a variable length array of
+4-byte integers to hash, use hash_word().  If you have a byte array (like
+a character string), use hashlittle().  If you have several byte arrays, or
+a mix of things, see the comments above hashlittle().
+
+Why is this so big?  I read 12 bytes at a time into 3 4-byte integers,
+then mix those integers.  This is fast (you can do a lot more thorough
+mixing with 12*3 instructions on 3 integers than you can with 3 instructions
+on 1 byte), but shoehorning those bytes into integers efficiently is messy.
+*/
+
+#define hashsize(n) ((uint32_t)1<<(n))
+#define hashmask(n) (hashsize(n)-1)
+#define rot(x,k) (((x)<<(k)) | ((x)>>(32-(k))))
+
+/*
+-------------------------------------------------------------------------------
+mix -- mix 3 32-bit values reversibly.
+
+This is reversible, so any information in (a,b,c) before mix() is
+still in (a,b,c) after mix().
+
+If four pairs of (a,b,c) inputs are run through mix(), or through
+mix() in reverse, there are at least 32 bits of the output that
+are sometimes the same for one pair and different for another pair.
+This was tested for:
+* pairs that differed by one bit, by two bits, in any combination
+  of top bits of (a,b,c), or in any combination of bottom bits of
+  (a,b,c).
+* "differ" is defined as +, -, ^, or ~^.  For + and -, I transformed
+  the output delta to a Gray code (a^(a>>1)) so a string of 1's (as
+  is commonly produced by subtraction) look like a single 1-bit
+  difference.
+* the base values were pseudorandom, all zero but one bit set, or
+  all zero plus a counter that starts at zero.
+
+Some k values for my "a-=c; a^=rot(c,k); c+=b;" arrangement that
+satisfy this are
+    4  6  8 16 19  4
+    9 15  3 18 27 15
+   14  9  3  7 17  3
+Well, "9 15 3 18 27 15" didn't quite get 32 bits diffing
+for "differ" defined as + with a one-bit base and a two-bit delta.  I
+used http://burtleburtle.net/bob/hash/avalanche.html to choose
+the operations, constants, and arrangements of the variables.
+
+This does not achieve avalanche.  There are input bits of (a,b,c)
+that fail to affect some output bits of (a,b,c), especially of a.  The
+most thoroughly mixed value is c, but it doesn't really even achieve
+avalanche in c.
+
+This allows some parallelism.  Read-after-writes are good at doubling
+the number of bits affected, so the goal of mixing pulls in the opposite
+direction as the goal of parallelism.  I did what I could.  Rotates
+seem to cost as much as shifts on every machine I could lay my hands
+on, and rotates are much kinder to the top and bottom bits, so I used
+rotates.
+-------------------------------------------------------------------------------
+*/
+#define mix(a,b,c) \
+{ \
+  a -= c;  a ^= rot(c, 4);  c += b; \
+  b -= a;  b ^= rot(a, 6);  a += c; \
+  c -= b;  c ^= rot(b, 8);  b += a; \
+  a -= c;  a ^= rot(c,16);  c += b; \
+  b -= a;  b ^= rot(a,19);  a += c; \
+  c -= b;  c ^= rot(b, 4);  b += a; \
+}
+
+/*
+-------------------------------------------------------------------------------
+final -- final mixing of 3 32-bit values (a,b,c) into c
+
+Pairs of (a,b,c) values differing in only a few bits will usually
+produce values of c that look totally different.  This was tested for
+* pairs that differed by one bit, by two bits, in any combination
+  of top bits of (a,b,c), or in any combination of bottom bits of
+  (a,b,c).
+* "differ" is defined as +, -, ^, or ~^.  For + and -, I transformed
+  the output delta to a Gray code (a^(a>>1)) so a string of 1's (as
+  is commonly produced by subtraction) look like a single 1-bit
+  difference.
+* the base values were pseudorandom, all zero but one bit set, or
+  all zero plus a counter that starts at zero.
+
+These constants passed:
+ 14 11 25 16 4 14 24
+ 12 14 25 16 4 14 24
+and these came close:
+  4  8 15 26 3 22 24
+ 10  8 15 26 3 22 24
+ 11  8 15 26 3 22 24
+-------------------------------------------------------------------------------
+*/
+#define final(a,b,c) \
+{ \
+  c ^= b; c -= rot(b,14); \
+  a ^= c; a -= rot(c,11); \
+  b ^= a; b -= rot(a,25); \
+  c ^= b; c -= rot(b,16); \
+  a ^= c; a -= rot(c,4);  \
+  b ^= a; b -= rot(a,14); \
+  c ^= b; c -= rot(b,24); \
+}
+
+
+/*
+-------------------------------------------------------------------------------
+hashlittle() -- hash a variable-length key into a 32-bit value
+  k       : the key (the unaligned variable-length array of bytes)
+  length  : the length of the key, counting by bytes
+  val2    : IN: can be any 4-byte value OUT: second 32 bit hash.
+Returns a 32-bit value.  Every bit of the key affects every bit of
+the return value.  Two keys differing by one or two bits will have
+totally different hash values.  Note that the return value is better
+mixed than val2, so use that first.
+
+The best hash table sizes are powers of 2.  There is no need to do
+mod a prime (mod is sooo slow!).  If you need less than 32 bits,
+use a bitmask.  For example, if you need only 10 bits, do
+  h = (h & hashmask(10));
+In which case, the hash table should have hashsize(10) elements.
+
+If you are hashing n strings (uint8_t **)k, do it like this:
+  for (i=0, h=0; i<n; ++i) h = hashlittle( k[i], len[i], h);
+
+By Bob Jenkins, 2006.  bob_jenkins at burtleburtle.net.  You may use this
+code any way you wish, private, educational, or commercial.  It's free.
+
+Use for hash table lookup, or anything where one collision in 2^^32 is
+acceptable.  Do NOT use for cryptographic purposes.
+-------------------------------------------------------------------------------
+*/
+
+static uint32_t hashlittle( const void *key, size_t length )
+{
+  uint32_t a,b,c;                                          /* internal state */
+  union { const void *ptr; size_t i; } u;     /* needed for Mac Powerbook G4 */
+
+  /* Set up the internal state */
+  a = b = c = 0xdeadbeef + ((uint32_t)length);
+
+  u.ptr = key;
+  if (HASH_LITTLE_ENDIAN && ((u.i & 0x3) == 0)) {
+    const uint32_t *k = (const uint32_t *)key;         /* read 32-bit chunks */
+#ifdef VALGRIND
+    const uint8_t  *k8;
+#endif
+
+    /*------ all but last block: aligned reads and affect 32 bits of (a,b,c) */
+    while (length > 12)
+    {
+      a += k[0];
+      b += k[1];
+      c += k[2];
+      mix(a,b,c);
+      length -= 12;
+      k += 3;
+    }
+
+    /*----------------------------- handle the last (probably partial) block */
+    /*
+     * "k[2]&0xffffff" actually reads beyond the end of the string, but
+     * then masks off the part it's not allowed to read.  Because the
+     * string is aligned, the masked-off tail is in the same word as the
+     * rest of the string.  Every machine with memory protection I've seen
+     * does it on word boundaries, so is OK with this.  But VALGRIND will
+     * still catch it and complain.  The masking trick does make the hash
+     * noticably faster for short strings (like English words).
+     */
+#ifndef VALGRIND
+
+    switch(length)
+    {
+    case 12: c+=k[2]; b+=k[1]; a+=k[0]; break;
+    case 11: c+=k[2]&0xffffff; b+=k[1]; a+=k[0]; break;
+    case 10: c+=k[2]&0xffff; b+=k[1]; a+=k[0]; break;
+    case 9 : c+=k[2]&0xff; b+=k[1]; a+=k[0]; break;
+    case 8 : b+=k[1]; a+=k[0]; break;
+    case 7 : b+=k[1]&0xffffff; a+=k[0]; break;
+    case 6 : b+=k[1]&0xffff; a+=k[0]; break;
+    case 5 : b+=k[1]&0xff; a+=k[0]; break;
+    case 4 : a+=k[0]; break;
+    case 3 : a+=k[0]&0xffffff; break;
+    case 2 : a+=k[0]&0xffff; break;
+    case 1 : a+=k[0]&0xff; break;
+    case 0 : return c;              /* zero length strings require no mixing */
+    }
+
+#else /* make valgrind happy */
+
+    k8 = (const uint8_t *)k;
+    switch(length)
+    {
+    case 12: c+=k[2]; b+=k[1]; a+=k[0]; break;
+    case 11: c+=((uint32_t)k8[10])<<16;  /* fall through */
+    case 10: c+=((uint32_t)k8[9])<<8;    /* fall through */
+    case 9 : c+=k8[8];                   /* fall through */
+    case 8 : b+=k[1]; a+=k[0]; break;
+    case 7 : b+=((uint32_t)k8[6])<<16;   /* fall through */
+    case 6 : b+=((uint32_t)k8[5])<<8;    /* fall through */
+    case 5 : b+=k8[4];                   /* fall through */
+    case 4 : a+=k[0]; break;
+    case 3 : a+=((uint32_t)k8[2])<<16;   /* fall through */
+    case 2 : a+=((uint32_t)k8[1])<<8;    /* fall through */
+    case 1 : a+=k8[0]; break;
+    case 0 : return c;
+    }
+
+#endif /* !valgrind */
+
+  } else if (HASH_LITTLE_ENDIAN && ((u.i & 0x1) == 0)) {
+    const uint16_t *k = (const uint16_t *)key;         /* read 16-bit chunks */
+    const uint8_t  *k8;
+
+    /*--------------- all but last block: aligned reads and different mixing */
+    while (length > 12)
+    {
+      a += k[0] + (((uint32_t)k[1])<<16);
+      b += k[2] + (((uint32_t)k[3])<<16);


-- 
CTDB repository