[SCM] CTDB repository - branch master updated - ctdb-1.0.114-354-g23510bf

Thu Oct 7 20:07:16 MDT 2010

The branch, master has been updated
       via  23510bf858c06a3710d1cc741d32bad3675fd97e (commit)
       via  90b7e79446f06990f82c0128b2581e15497094fe (commit)
       via  53d49df2d4519c35b270c30660e2504af2a5ed5c (commit)
       via  2b81314eb94d31f4efadd2a3dcf2f6e176338d3f (commit)
       via  247dacde0d0de1358cc2c27d08914be605272023 (commit)
       via  09369aa86e233a58ed131fa5b7584b6c86527d40 (commit)
       via  ce84abcc6be31554da73920280e6bfc5b63b1464 (commit)
       via  07880810941850e81442b888cd70d810d3f80fc3 (commit)
       via  7db9838cb5af0d334efbbcb96bfa51d19b35941a (commit)
       via  dd86b24ae5307fe09d4ae22b7070d747013a2b07 (commit)
       via  3f7ed2b46cb304d553d3f7bd34554d695b8ccc52 (commit)
       via  58c9d90c758aa7c062d84ab97f62947190526356 (commit)
       via  7cda5507f90d7598d745a1acfc66c2afa73cd4b5 (commit)
       via  e34e639c214b010ff18140b769a8c9245c92006f (commit)
       via  3cc73c51caff51e0cba688aefd6f37e632c0e8d4 (commit)
       via  dcdd83e6d6786f0857acdf9aa04bca74a7ccf14d (commit)
       via  fd16bcc1434841d84fdf78f80163c97c0b52b3fe (commit)
       via  1778fd02eec6e64737167c46173c0c76c85cc4d9 (commit)
       via  d0c28ff1fedd27a99a7550fcc74e18cb1f536986 (commit)
       via  3ff413baf04ce28eb54a80141250ae1284b2a521 (commit)
       via  7389f8a8a634c2fe0f068831326d92e6bfa0d046 (commit)
       via  5c4240c364c52073ca64fddf2aa2c1593db0093b (commit)
       via  f1c06608245ec34493c330d891e04c250ad64b20 (commit)
       via  63c582c99128c3623e270e8425966cab7744fb2f (commit)
       via  525390863ad39acea08ceb88531dc59d118fcad4 (commit)
       via  2558eb250011893d09dbeaedaffeefa0e397142f (commit)
       via  b4162a95ff9ae28cda8d9c76c51c9480104517a7 (commit)
      from  25f96db966230e90291eee57841c9faaae33713b (commit)

http://gitweb.samba.org/?p=sahlberg/ctdb.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 23510bf858c06a3710d1cc741d32bad3675fd97e
Merge: 90b7e79446f06990f82c0128b2581e15497094fe 53d49df2d4519c35b270c30660e2504af2a5ed5c
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date:   Fri Oct 8 12:49:08 2010 +1100

    Merge commit 'rusty/tdb-update'

commit 90b7e79446f06990f82c0128b2581e15497094fe
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Tue Oct 5 13:06:19 2010 +1030

    idtree: fix right shift of signed ints, crash on large ids on AIX
    
    Right-shifting signed integers is undefined; indeed it seems that on
    AIX with their compiler, doing a 30-bit shift on (INT_MAX-200) gives
    0, not 1 as we might expect.
    
    The obvious fix is to make id and oid unsigned: l (level count) is also
    logically unsigned.
    
    (Note: Samba doesn't generally get to ids > 1 billion, but ctdb does)
    
    Reported-by: Chris Cowan <cc at us.ibm.com>
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
    
    Autobuild-User: Rusty Russell <rusty at samba.org>
    Autobuild-Date: Wed Oct  6 08:31:09 UTC 2010 on sn-devel-104
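
The arithmetic behind this fix can be sketched as below. This is an illustration only, not the idtree code: the helper name sketch_shift_id is hypothetical, and the expected value assumes a 32-bit int. For unsigned operands C defines a right shift as division by a power of two, so the result should not depend on the compiler.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (not part of idtree): shift an id right using
 * unsigned arithmetic. Shift counts at or beyond the width of the type
 * are undefined in C, so they are handled explicitly. */
static unsigned int sketch_shift_id(unsigned int id, unsigned int bits)
{
	if (bits >= sizeof(id) * CHAR_BIT)
		return 0;
	return id >> bits;
}
```

With a 32-bit int, sketch_shift_id((unsigned int)INT_MAX - 200u, 30) yields 1, which is the value the commit message says was expected.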

commit 53d49df2d4519c35b270c30660e2504af2a5ed5c
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Tue Oct 5 13:06:19 2010 +1030

    idtree: fix right shift of signed ints, crash on large ids on AIX
    
    Right-shifting signed integers is undefined; indeed it seems that on
    AIX with their compiler, doing a 30-bit shift on (INT_MAX-200) gives
    0, not 1 as we might expect.
    
    The obvious fix is to make id and oid unsigned: l (level count) is also
    logically unsigned.
    
    (Note: Samba doesn't generally get to ids > 1 billion, but ctdb does)
    
    Reported-by: Chris Cowan <cc at us.ibm.com>
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
    
    Autobuild-User: Rusty Russell <rusty at samba.org>
    Autobuild-Date: Wed Oct  6 08:31:09 UTC 2010 on sn-devel-104

commit 2b81314eb94d31f4efadd2a3dcf2f6e176338d3f
Author: Jelmer Vernooij <jelmer at samba.org>
Date:   Mon Oct 4 13:17:25 2010 +0200

    pytdb: Add __version__ attribute.

commit 247dacde0d0de1358cc2c27d08914be605272023
Author: Jelmer Vernooij <jelmer at samba.org>
Date:   Sat Oct 2 23:40:19 2010 +0200

    pytdb: Include Python.h first to prevent warning.

commit 09369aa86e233a58ed131fa5b7584b6c86527d40
Author: Kirill Smelkov <kirr at landau.phys.spbu.ru>
Date:   Sat Oct 2 17:43:50 2010 +0400

    pytdb: Check errors after PyObject_New() calls
    
    The call could fail with e.g. MemoryError, and we'll dereference NULL
    pointer without checking.
    
    Signed-off-by: Kirill Smelkov <kirr at landau.phys.spbu.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit ce84abcc6be31554da73920280e6bfc5b63b1464
Author: Kirill Smelkov <kirr at mns.spb.ru>
Date:   Sat Oct 2 17:43:46 2010 +0400

    pytdb: Add support for tdb_repack()
    
    Cc: 597386 at bugs.debian.org
    Signed-off-by: Kirill Smelkov <kirr at landau.phys.spbu.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit 07880810941850e81442b888cd70d810d3f80fc3
Author: Kirill Smelkov <kirr at mns.spb.ru>
Date:   Sat Oct 2 17:43:40 2010 +0400

    pytdb: Add TDB_INCOMPATIBLE_HASH open flag
    
    In 2dcf76 Rusty added TDB_INCOMPATIBLE_HASH open flag which selects
    Jenkins lookup3 hash for new databases.
    
    Expose this flag to python users too.
    
    Cc: Rusty Russell <rusty at rustcorp.com.au>
    Signed-off-by: Kirill Smelkov <kirr at mns.spb.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit 7db9838cb5af0d334efbbcb96bfa51d19b35941a
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Mon Sep 27 11:06:51 2010 +0930

    tdb: fix non-WAF build, commit 1.2.6 ABI file.
    
    Sorry Jeremy.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>

commit dd86b24ae5307fe09d4ae22b7070d747013a2b07
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Fri Sep 24 15:45:11 2010 +0930

    tdb: TDB_INCOMPATIBLE_HASH, to allow safe changing of default hash.
    
    This flag to tdb_open/tdb_open_ex affects creation of a new database:
    1) Uses the Jenkins lookup3 hash instead of the old gdbm hash if none is
       specified,
    2) Places a non-zero field in header->rwlocks, so older versions of TDB will
       refuse to open it.
    
    This means that the caller (i.e. Samba) can set this flag to safely
    change the hash function.  Versions of TDB from this one on will either
    use the correct hash or refuse to open (if a different hash is specified).
    Older TDB versions will see the nonzero rwlocks field and refuse to open
    it under any conditions.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
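
The two creation-time effects listed above can be sketched as follows. Every name here (the SKETCH_ constants, struct sketch_header, the two toy hashes) is a hypothetical stand-in rather than a real tdb definition, and the marker value is arbitrary. The sketch only illustrates the decision described in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the real tdb flag or magic values. */
#define SKETCH_INCOMPATIBLE_HASH 0x1u
#define SKETCH_RWLOCK_MARKER     0x1u  /* any nonzero value will do here */

typedef uint32_t (*sketch_hash_fn)(const unsigned char *key, size_t len);

struct sketch_header {
	uint32_t rwlocks;     /* nonzero: older readers refuse the file */
	sketch_hash_fn hash;  /* hash chosen for the new database */
};

/* Toy hashes: only their identity matters in this sketch. */
static uint32_t sketch_old_hash(const unsigned char *key, size_t len)
{
	return len ? key[0] : 0;
}

static uint32_t sketch_new_hash(const unsigned char *key, size_t len)
{
	return len ? key[len - 1] + 1u : 1u;
}

/* Fill in the header of a newly created database. */
static void sketch_create(struct sketch_header *hdr, unsigned int flags,
			  sketch_hash_fn caller_hash)
{
	assert(hdr != NULL);
	hdr->rwlocks = 0;
	hdr->hash = caller_hash ? caller_hash : sketch_old_hash;

	if (flags & SKETCH_INCOMPATIBLE_HASH) {
		/* 1) Default to the newer hash if the caller gave none. */
		if (caller_hash == NULL)
			hdr->hash = sketch_new_hash;
		/* 2) Mark the header so older versions refuse to open it. */
		hdr->rwlocks = SKETCH_RWLOCK_MARKER;
	}
}
```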

commit 3f7ed2b46cb304d553d3f7bd34554d695b8ccc52
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Fri Sep 24 15:39:43 2010 +0930

    tdb: automatically identify Jenkins hash tdbs
    
    If the caller to tdb_open_ex() doesn't specify a hash, and tdb_old_hash
    doesn't match, try tdb_jenkins_hash.
    
    This was Metze's idea: it makes life simpler, especially with the upcoming
    TDB_INCOMPATIBLE_HASH flag.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
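
A rough sketch of this fallback, assuming that "match" means agreeing with an example hash value stored in the header (as the "put example hashes into header" commit in this log describes). All names are hypothetical, the probe bytes are made up, and the two hashes are toys: a byte sum stands in for the old hash and 32-bit FNV-1a stands in for the Jenkins hash.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*sketch_hash_fn)(const unsigned char *key, size_t len);

/* Toy stand-in for the old default hash: a plain byte sum. */
static uint32_t sketch_old_hash(const unsigned char *key, size_t len)
{
	uint32_t h = 0;
	size_t i;

	for (i = 0; i < len; i++)
		h += key[i];
	return h;
}

/* Toy stand-in for the Jenkins hash: 32-bit FNV-1a. */
static uint32_t sketch_new_hash(const unsigned char *key, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= key[i];
		h *= 16777619u;
	}
	return h;
}

/* Fixed input whose hash is recorded when a database is created. */
static const unsigned char sketch_probe[] = { 'p', 'r', 'o', 'b', 'e' };

static uint32_t sketch_example(sketch_hash_fn fn)
{
	return fn(sketch_probe, sizeof(sketch_probe));
}

/* Pick the hash to use on open, or NULL if nothing matches.
 * A caller-supplied hash is never silently replaced. */
static sketch_hash_fn sketch_pick_hash(uint32_t stored,
				       sketch_hash_fn caller_hash)
{
	if (caller_hash != NULL)
		return sketch_example(caller_hash) == stored ? caller_hash : NULL;
	if (sketch_example(sketch_old_hash) == stored)
		return sketch_old_hash;
	if (sketch_example(sketch_new_hash) == stored)
		return sketch_new_hash;
	return NULL;
}
```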

commit 58c9d90c758aa7c062d84ab97f62947190526356
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Fri Sep 24 15:34:06 2010 +0930

    tdb: add Bob Jenkins lookup3 hash as helper hash.
    
    This is a better hash than the default: shipping it with tdb makes it easy
    for callers to use it as the hash by passing it to tdb_open_ex().
    
    This version taken from CCAN and modified, which took it from
    http://www.burtleburtle.net/bob/c/lookup3.c.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>

commit 7cda5507f90d7598d745a1acfc66c2afa73cd4b5
Author: Volker Lendecke <vl at samba.org>
Date:   Sat Sep 18 10:56:10 2010 +0400

    tdb: add restore
    
    Based on an idea by Simon McVittie, largely rewritten

commit e34e639c214b010ff18140b769a8c9245c92006f
Author: Günther Deschner <gd at samba.org>
Date:   Mon Sep 20 16:01:51 2010 -0700

    lib/tdb: fix c++ build warning in tdb_header_hash().
    
    Guenther

commit 3cc73c51caff51e0cba688aefd6f37e632c0e8d4
Author: Jelmer Vernooij <jelmer at samba.org>
Date:   Sun Sep 19 10:42:29 2010 -0700

    pytdb: Make filename argument optional.

commit dcdd83e6d6786f0857acdf9aa04bca74a7ccf14d
Author: Kirill Smelkov <kirr at landau.phys.spbu.ru>
Date:   Sun Sep 19 13:53:29 2010 +0400

    pytdb: Add support for tdb_freelist_size()
    
    Cc: 597386 at bugs.debian.org
    Signed-off-by: Kirill Smelkov <kirr at landau.phys.spbu.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit fd16bcc1434841d84fdf78f80163c97c0b52b3fe
Author: Kirill Smelkov <kirr at landau.phys.spbu.ru>
Date:   Sun Sep 19 13:53:32 2010 +0400

    pytdb: Add support for tdb_transaction_prepare_commit()
    
    Cc: 597386 at bugs.debian.org
    Signed-off-by: Kirill Smelkov <kirr at landau.phys.spbu.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit 1778fd02eec6e64737167c46173c0c76c85cc4d9
Author: Kirill Smelkov <kirr at landau.phys.spbu.ru>
Date:   Sun Sep 19 09:34:33 2010 -0700

    pytdb: Add support for tdb_enable_seqnum, tdb_get_seqnum and tdb_increment_seqnum_nonblock
    
    Cc: 597386 at bugs.debian.org
    Signed-off-by: Kirill Smelkov <kirr at landau.phys.spbu.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit d0c28ff1fedd27a99a7550fcc74e18cb1f536986
Author: Kirill Smelkov <kirr at mns.spb.ru>
Date:   Sun Sep 19 13:53:19 2010 +0400

    pytdb: Update open flags to match those for tdb_open() in tdb.h
    
    Namely TDB_NOSYNC, TDB_SEQNUM, TDB_VOLATILE, TDB_ALLOW_NESTING and
    TDB_DISALLOW_NESTING were missing.
    
    Cc: 597386 at bugs.debian.org
    Signed-off-by: Kirill Smelkov <kirr at mns.spb.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit 3ff413baf04ce28eb54a80141250ae1284b2a521
Author: Kirill Smelkov <kirr at mns.spb.ru>
Date:   Sun Sep 19 13:53:21 2010 +0400

    pytdb: Fix repr segfault for internal db
    
    The problem was tdb->name is NULL for TDB_INTERNAL databases, and
    so it was crashing ...
    
        #0  0xb76944f3 in strlen () from /lib/i686/cmov/libc.so.6
        #1  0x0809862b in PyString_FromFormatV (format=0xb72b6a26 "Tdb('%s')", vargs=0xbfc26a94 "")
            at ../Objects/stringobject.c:211
        #2  0x08098888 in PyString_FromFormat (format=0xb72b6a26 "Tdb('%s')") at ../Objects/stringobject.c:358
        #3  0xb72b65f2 in tdb_object_repr (self=0xb759e060) at ./pytdb.c:439
    
    Cc: 597089 at bugs.debian.org
    Signed-off-by: Kirill Smelkov <kirr at mns.spb.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit 7389f8a8a634c2fe0f068831326d92e6bfa0d046
Author: Kirill Smelkov <kirr at mns.spb.ru>
Date:   Sun Sep 19 13:53:20 2010 +0400

    pytdb: Add support for tdb_add_flags() & tdb_remove_flags()
    
    Note, unlike tdb_open where flags is `int', tdb_{add,remove}_flags want
    flags as `unsigned', so instead of "i" I used "I" in PyArg_ParseTuple.
    
    Cc: 597386 at bugs.debian.org
    Signed-off-by: Kirill Smelkov <kirr at mns.spb.ru>
    Signed-off-by: Jelmer Vernooij <jelmer at samba.org>

commit 5c4240c364c52073ca64fddf2aa2c1593db0093b
Author: Andrew Tridgell <tridge at samba.org>
Date:   Thu Sep 16 20:06:44 2010 +1000

    tdb: added TDB_NO_FSYNC env variable
    
    this might help reduce test times and load on test machines
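
One way such a switch could be consulted is sketched below. The commit message does not say how the variable is interpreted, so this assumes that its mere presence matters and its value is ignored. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decide whether to skip syncing, given the result
 * of looking up the environment variable (NULL when it is unset).
 * Assumption: any value, even an empty one, disables syncing. */
static bool sketch_skip_fsync(const char *env_value)
{
	return env_value != NULL;
}
```

A caller would pass in the lookup result, for example sketch_skip_fsync(getenv("TDB_NO_FSYNC")) with stdlib.h included.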

commit f1c06608245ec34493c330d891e04c250ad64b20
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Thu Oct 7 15:07:22 2010 +1030

    tdb: increment version to 1.2.4

commit 63c582c99128c3623e270e8425966cab7744fb2f
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Mon Sep 13 20:05:59 2010 +0930

    tdb: put example hashes into header, so we notice incorrect hash_fn.
    
    This is Stefan Metzmacher <metze at samba.org>'s patch with minor changes:
    1) Use the TDB_MAGIC constant so both hashes aren't of strings.
    2) Check the hash in tdb_check (paranoia, really).
    3) Additional check in the (unlikely!) case where both examples hash to 0.
    4) Cosmetic changes to var names and complaint message.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>

commit 525390863ad39acea08ceb88531dc59d118fcad4
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Mon Sep 13 19:59:18 2010 +0930

    tdb: fix tdb_check() on other-endian tdbs.
    
    We must not endian-convert the magic string, just the rest.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
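
The rule stated above (convert the integer fields, leave the magic string alone) can be sketched like this. The struct layout and names are hypothetical and much smaller than the real tdb header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified header: a byte string followed by integers. */
struct sketch_header {
	char magic[8];
	uint32_t version;
	uint32_t hash_size;
};

/* Reverse the byte order of a 32-bit value. */
static uint32_t sketch_bswap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |
	       ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8) |
	       ((v & 0xff000000u) >> 24);
}

/* Convert a header read from an other-endian file. The magic is a
 * sequence of bytes, not an integer, so it must not be swapped. */
static void sketch_convert_header(struct sketch_header *h)
{
	assert(h != NULL);
	h->version = sketch_bswap32(h->version);
	h->hash_size = sketch_bswap32(h->hash_size);
}
```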

commit 2558eb250011893d09dbeaedaffeefa0e397142f
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Mon Sep 13 19:58:23 2010 +0930

    tdb: fix tdb_check() on read-only TDBs to actually work.
    
    Commit bc1c82ea137 "Fix tdb_check() to work with read-only tdb databases."
    claimed to do this, but tdb_lockall_read() fails on read-only databases.
    
    Also make sure we can still do tdb_check() inside a transaction (weird,
    but we previously allowed it so don't break the API).
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>

commit b4162a95ff9ae28cda8d9c76c51c9480104517a7
Author: Rusty Russell <rusty at rustcorp.com.au>
Date:   Mon Sep 13 19:55:26 2010 +0930

    tdb: make check more robust against recovery failures.
    
    We can end up with dead areas when we die during transaction commit;
    tdb_check() fails on such a (valid) database.
    
    This is particularly noticeable now we no longer truncate on recovery;
    if the recovery area was at the end of the file we used to remove it
    that way.
    
    Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
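
The idea of measuring a dead area can be sketched over a plain memory buffer. This is a standalone illustration with a hypothetical name, not the patch's dead_space() function. It treats 0x00 and 0x42 as filler, the two byte values the patch below tests for, and it examines the byte at off + len on each step.

```c
#include <assert.h>
#include <stddef.h>

/* Count consecutive filler bytes (0x00 or 0x42) in buf, starting at
 * offset off and stopping at the first other byte or at the end. */
static size_t sketch_dead_space(const unsigned char *buf, size_t size,
				size_t off)
{
	size_t len;

	assert(buf != NULL);
	for (len = 0; off + len < size; len++) {
		unsigned char c = buf[off + len];

		if (c != 0 && c != 0x42)
			break;
	}
	return len;
}
```

A checker could then require the run to be at least as large as a record header before treating it as a skippable region, which mirrors the "dead < sizeof(rec)" test in the patch.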

-----------------------------------------------------------------------

Summary of changes:
 lib/tdb/ABI/tdb-1.2.6.sigs     |   61 +++++++
 lib/tdb/common/check.c         |   73 +++++++--
 lib/tdb/common/hash.c          |  380 ++++++++++++++++++++++++++++++++++++++++
 lib/tdb/common/open.c          |   95 +++++++++--
 lib/tdb/common/tdb_private.h   |    8 +-
 lib/tdb/config.mk              |    9 +
 lib/tdb/configure.ac           |    2 +-
 lib/tdb/include/tdb.h          |    2 +
 lib/tdb/libtdb.m4              |    2 +-
 lib/tdb/pytdb.c                |  103 +++++++++++-
 lib/tdb/python/tests/simple.py |   46 +++++-
 lib/tdb/tdb.mk                 |    5 +-
 lib/tdb/tools/tdbrestore.c     |  226 ++++++++++++++++++++++++
 lib/util/idtree.c              |    2 +-
 14 files changed, 977 insertions(+), 37 deletions(-)
 create mode 100644 lib/tdb/ABI/tdb-1.2.6.sigs
 create mode 100644 lib/tdb/common/hash.c
 create mode 100644 lib/tdb/tools/tdbrestore.c


Changeset truncated at 500 lines:

diff --git a/lib/tdb/ABI/tdb-1.2.6.sigs b/lib/tdb/ABI/tdb-1.2.6.sigs
new file mode 100644
index 0000000..1e01f3b
--- /dev/null
+++ b/lib/tdb/ABI/tdb-1.2.6.sigs
@@ -0,0 +1,61 @@
+tdb_add_flags: void (struct tdb_context *, unsigned int)
+tdb_append: int (struct tdb_context *, TDB_DATA, TDB_DATA)
+tdb_chainlock: int (struct tdb_context *, TDB_DATA)
+tdb_chainlock_mark: int (struct tdb_context *, TDB_DATA)
+tdb_chainlock_nonblock: int (struct tdb_context *, TDB_DATA)
+tdb_chainlock_read: int (struct tdb_context *, TDB_DATA)
+tdb_chainlock_unmark: int (struct tdb_context *, TDB_DATA)
+tdb_chainunlock: int (struct tdb_context *, TDB_DATA)
+tdb_chainunlock_read: int (struct tdb_context *, TDB_DATA)
+tdb_check: int (struct tdb_context *, int (*)(TDB_DATA, TDB_DATA, void *), void *)
+tdb_close: int (struct tdb_context *)
+tdb_delete: int (struct tdb_context *, TDB_DATA)
+tdb_dump_all: void (struct tdb_context *)
+tdb_enable_seqnum: void (struct tdb_context *)
+tdb_error: enum TDB_ERROR (struct tdb_context *)
+tdb_errorstr: const char *(struct tdb_context *)
+tdb_exists: int (struct tdb_context *, TDB_DATA)
+tdb_fd: int (struct tdb_context *)
+tdb_fetch: TDB_DATA (struct tdb_context *, TDB_DATA)
+tdb_firstkey: TDB_DATA (struct tdb_context *)
+tdb_freelist_size: int (struct tdb_context *)
+tdb_get_flags: int (struct tdb_context *)
+tdb_get_logging_private: void *(struct tdb_context *)
+tdb_get_seqnum: int (struct tdb_context *)
+tdb_hash_size: int (struct tdb_context *)
+tdb_increment_seqnum_nonblock: void (struct tdb_context *)
+tdb_jenkins_hash: unsigned int (TDB_DATA *)
+tdb_lockall: int (struct tdb_context *)
+tdb_lockall_mark: int (struct tdb_context *)
+tdb_lockall_nonblock: int (struct tdb_context *)
+tdb_lockall_read: int (struct tdb_context *)
+tdb_lockall_read_nonblock: int (struct tdb_context *)
+tdb_lockall_unmark: int (struct tdb_context *)
+tdb_log_fn: tdb_log_func (struct tdb_context *)
+tdb_map_size: size_t (struct tdb_context *)
+tdb_name: const char *(struct tdb_context *)
+tdb_nextkey: TDB_DATA (struct tdb_context *, TDB_DATA)
+tdb_null: dptr = 0xXXXX, dsize = 0
+tdb_open: struct tdb_context *(const char *, int, int, int, mode_t)
+tdb_open_ex: struct tdb_context *(const char *, int, int, int, mode_t, const struct tdb_logging_context *, tdb_hash_func)
+tdb_parse_record: int (struct tdb_context *, TDB_DATA, int (*)(TDB_DATA, TDB_DATA, void *), void *)
+tdb_printfreelist: int (struct tdb_context *)
+tdb_remove_flags: void (struct tdb_context *, unsigned int)
+tdb_reopen: int (struct tdb_context *)
+tdb_reopen_all: int (int)
+tdb_repack: int (struct tdb_context *)
+tdb_set_logging_function: void (struct tdb_context *, const struct tdb_logging_context *)
+tdb_set_max_dead: void (struct tdb_context *, int)
+tdb_setalarm_sigptr: void (struct tdb_context *, volatile sig_atomic_t *)
+tdb_store: int (struct tdb_context *, TDB_DATA, TDB_DATA, int)
+tdb_transaction_cancel: int (struct tdb_context *)
+tdb_transaction_commit: int (struct tdb_context *)
+tdb_transaction_prepare_commit: int (struct tdb_context *)
+tdb_transaction_start: int (struct tdb_context *)
+tdb_transaction_start_nonblock: int (struct tdb_context *)
+tdb_traverse: int (struct tdb_context *, tdb_traverse_func, void *)
+tdb_traverse_read: int (struct tdb_context *, tdb_traverse_func, void *)
+tdb_unlockall: int (struct tdb_context *)
+tdb_unlockall_read: int (struct tdb_context *)
+tdb_validate_freelist: int (struct tdb_context *, int *)
+tdb_wipe_all: int (struct tdb_context *)
diff --git a/lib/tdb/common/check.c b/lib/tdb/common/check.c
index 2c64043..58c9c26 100644
--- a/lib/tdb/common/check.c
+++ b/lib/tdb/common/check.c
@@ -28,8 +28,9 @@
 static bool tdb_check_header(struct tdb_context *tdb, tdb_off_t *recovery)
 {
 	struct tdb_header hdr;
+	uint32_t h1, h2;
 
-	if (tdb->methods->tdb_read(tdb, 0, &hdr, sizeof(hdr), DOCONV()) == -1)
+	if (tdb->methods->tdb_read(tdb, 0, &hdr, sizeof(hdr), 0) == -1)
 		return false;
 	if (strcmp(hdr.magic_food, TDB_MAGIC_FOOD) != 0)
 		goto corrupt;
@@ -38,7 +39,12 @@ static bool tdb_check_header(struct tdb_context *tdb, tdb_off_t *recovery)
 	if (hdr.version != TDB_VERSION)
 		goto corrupt;
 
-	if (hdr.rwlocks != 0)
+	if (hdr.rwlocks != 0 && hdr.rwlocks != TDB_HASH_RWLOCK_MAGIC)
+		goto corrupt;
+
+	tdb_header_hash(tdb, &h1, &h2);
+	if (hdr.magic1_hash && hdr.magic2_hash &&
+	    (hdr.magic1_hash != h1 || hdr.magic2_hash != h2))
 		goto corrupt;
 
 	if (hdr.hash_size == 0)
@@ -301,6 +307,21 @@ static bool tdb_check_free_record(struct tdb_context *tdb,
 	return true;
 }
 
+/* Slow, but should be very rare. */
+static size_t dead_space(struct tdb_context *tdb, tdb_off_t off)
+{
+	size_t len;
+
+	for (len = 0; off + len < tdb->map_size; len++) {
+		char c;
+		if (tdb->methods->tdb_read(tdb, off, &c, 1, 0))
+			return 0;
+		if (c != 0 && c != 0x42)
+			break;
+	}
+	return len;
+}
+
 int tdb_check(struct tdb_context *tdb,
 	      int (*check)(TDB_DATA key, TDB_DATA data, void *private_data),
 	      void *private_data)
@@ -310,9 +331,18 @@ int tdb_check(struct tdb_context *tdb,
 	tdb_off_t off, recovery_start;
 	struct tdb_record rec;
 	bool found_recovery = false;
-
-	if (tdb_lockall_read(tdb) == -1)
-		return -1;
+	tdb_len_t dead;
+	bool locked;
+
+	/* Read-only databases use no locking at all: it's best-effort.
+	 * We may have a write lock already, so skip that case too. */
+	if (tdb->read_only || tdb->allrecord_lock.count != 0) {
+		locked = false;
+	} else {
+		if (tdb_lockall_read(tdb) == -1)
+			return -1;
+		locked = true;
+	}
 
 	/* Make sure we know true size of the underlying file. */
 	tdb->methods->tdb_oob(tdb, tdb->map_size + 1, 1);
@@ -369,8 +399,23 @@ int tdb_check(struct tdb_context *tdb,
 			if (!tdb_check_free_record(tdb, off, &rec, hashes))
 				goto free;
 			break;
-		case TDB_RECOVERY_MAGIC:
+		/* If we crash after ftruncate, we can get zeroes or fill. */
 		case TDB_RECOVERY_INVALID_MAGIC:
+		case 0x42424242:
+			if (recovery_start == off) {
+				found_recovery = true;
+				break;
+			}
+			dead = dead_space(tdb, off);
+			if (dead < sizeof(rec))
+				goto corrupt;
+
+			TDB_LOG((tdb, TDB_DEBUG_ERROR,
+				 "Dead space at %d-%d (of %u)\n",
+				 off, off + dead, tdb->map_size));
+			rec.rec_len = dead - sizeof(rec);
+			break;
+		case TDB_RECOVERY_MAGIC:
 			if (recovery_start != off) {
 				TDB_LOG((tdb, TDB_DEBUG_ERROR,
 					 "Unexpected recovery record at offset %d\n",
@@ -379,7 +424,8 @@ int tdb_check(struct tdb_context *tdb,
 			}
 			found_recovery = true;
 			break;
-		default:
+		default: ;
+		corrupt:
 			tdb->ecode = TDB_ERR_CORRUPT;
 			TDB_LOG((tdb, TDB_DEBUG_ERROR,
 				 "Bad magic 0x%x at offset %d\n",
@@ -405,19 +451,22 @@ int tdb_check(struct tdb_context *tdb,
 	/* We must have found recovery area if there was one. */
 	if (recovery_start != 0 && !found_recovery) {
 		TDB_LOG((tdb, TDB_DEBUG_ERROR,
-			 "Expected %s recovery area, got %s\n",
-			 recovery_start ? "a" : "no",
-			 found_recovery ? "one" : "none"));
+			 "Expected a recovery area at %u\n",
+			 recovery_start));
 		goto free;
 	}
 
 	free(hashes);
-	tdb_unlockall_read(tdb);
+	if (locked) {
+		tdb_unlockall_read(tdb);
+	}
 	return 0;
 
 free:
 	free(hashes);
 unlock:
-	tdb_unlockall_read(tdb);
+	if (locked) {
+		tdb_unlockall_read(tdb);
+	}
 	return -1;
 }
diff --git a/lib/tdb/common/hash.c b/lib/tdb/common/hash.c
new file mode 100644
index 0000000..c07297e
--- /dev/null
+++ b/lib/tdb/common/hash.c
@@ -0,0 +1,380 @@
+ /*
+   Unix SMB/CIFS implementation.
+
+   trivial database library
+
+   Copyright (C) Rusty Russell		   2010
+
+     ** NOTE! The following LGPL license applies to the tdb
+     ** library. This does NOT imply that all of Samba is released
+     ** under the LGPL
+
+   This library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 3 of the License, or (at your option) any later version.
+
+   This library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with this library; if not, see <http://www.gnu.org/licenses/>.
+*/
+#include "tdb_private.h"
+
+/* This is based on the hash algorithm from gdbm */
+unsigned int tdb_old_hash(TDB_DATA *key)
+{
+	uint32_t value;	/* Used to compute the hash value.  */
+	uint32_t   i;	/* Used to cycle through random values. */
+
+	/* Set the initial value from the key size. */
+	for (value = 0x238F13AF * key->dsize, i=0; i < key->dsize; i++)
+		value = (value + (key->dptr[i] << (i*5 % 24)));
+
+	return (1103515243 * value + 12345);
+}
+
+#ifndef WORDS_BIGENDIAN
+# define HASH_LITTLE_ENDIAN 1
+# define HASH_BIG_ENDIAN 0
+#else
+# define HASH_LITTLE_ENDIAN 0
+# define HASH_BIG_ENDIAN 1
+#endif
+
+/*
+-------------------------------------------------------------------------------
+lookup3.c, by Bob Jenkins, May 2006, Public Domain.
+
+These are functions for producing 32-bit hashes for hash table lookup.
+hash_word(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
+are externally useful functions.  Routines to test the hash are included
+if SELF_TEST is defined.  You can use this free for any purpose.  It's in
+the public domain.  It has no warranty.
+
+You probably want to use hashlittle().  hashlittle() and hashbig()
+hash byte arrays.  hashlittle() is is faster than hashbig() on
+little-endian machines.  Intel and AMD are little-endian machines.
+On second thought, you probably want hashlittle2(), which is identical to
+hashlittle() except it returns two 32-bit hashes for the price of one.
+You could implement hashbig2() if you wanted but I haven't bothered here.
+
+If you want to find a hash of, say, exactly 7 integers, do
+  a = i1;  b = i2;  c = i3;
+  mix(a,b,c);
+  a += i4; b += i5; c += i6;
+  mix(a,b,c);
+  a += i7;
+  final(a,b,c);
+then use c as the hash value.  If you have a variable length array of
+4-byte integers to hash, use hash_word().  If you have a byte array (like
+a character string), use hashlittle().  If you have several byte arrays, or
+a mix of things, see the comments above hashlittle().
+
+Why is this so big?  I read 12 bytes at a time into 3 4-byte integers,
+then mix those integers.  This is fast (you can do a lot more thorough
+mixing with 12*3 instructions on 3 integers than you can with 3 instructions
+on 1 byte), but shoehorning those bytes into integers efficiently is messy.
+*/
+
+#define hashsize(n) ((uint32_t)1<<(n))
+#define hashmask(n) (hashsize(n)-1)
+#define rot(x,k) (((x)<<(k)) | ((x)>>(32-(k))))
+
+/*
+-------------------------------------------------------------------------------
+mix -- mix 3 32-bit values reversibly.
+
+This is reversible, so any information in (a,b,c) before mix() is
+still in (a,b,c) after mix().
+
+If four pairs of (a,b,c) inputs are run through mix(), or through
+mix() in reverse, there are at least 32 bits of the output that
+are sometimes the same for one pair and different for another pair.
+This was tested for:
+* pairs that differed by one bit, by two bits, in any combination
+  of top bits of (a,b,c), or in any combination of bottom bits of
+  (a,b,c).
+* "differ" is defined as +, -, ^, or ~^.  For + and -, I transformed
+  the output delta to a Gray code (a^(a>>1)) so a string of 1's (as
+  is commonly produced by subtraction) look like a single 1-bit
+  difference.
+* the base values were pseudorandom, all zero but one bit set, or
+  all zero plus a counter that starts at zero.
+
+Some k values for my "a-=c; a^=rot(c,k); c+=b;" arrangement that
+satisfy this are
+    4  6  8 16 19  4
+    9 15  3 18 27 15
+   14  9  3  7 17  3
+Well, "9 15 3 18 27 15" didn't quite get 32 bits diffing
+for "differ" defined as + with a one-bit base and a two-bit delta.  I
+used http://burtleburtle.net/bob/hash/avalanche.html to choose
+the operations, constants, and arrangements of the variables.
+
+This does not achieve avalanche.  There are input bits of (a,b,c)
+that fail to affect some output bits of (a,b,c), especially of a.  The
+most thoroughly mixed value is c, but it doesn't really even achieve
+avalanche in c.
+
+This allows some parallelism.  Read-after-writes are good at doubling
+the number of bits affected, so the goal of mixing pulls in the opposite
+direction as the goal of parallelism.  I did what I could.  Rotates
+seem to cost as much as shifts on every machine I could lay my hands
+on, and rotates are much kinder to the top and bottom bits, so I used
+rotates.
+-------------------------------------------------------------------------------
+*/
+#define mix(a,b,c) \
+{ \
+  a -= c;  a ^= rot(c, 4);  c += b; \
+  b -= a;  b ^= rot(a, 6);  a += c; \
+  c -= b;  c ^= rot(b, 8);  b += a; \
+  a -= c;  a ^= rot(c,16);  c += b; \
+  b -= a;  b ^= rot(a,19);  a += c; \
+  c -= b;  c ^= rot(b, 4);  b += a; \
+}
+
+/*
+-------------------------------------------------------------------------------
+final -- final mixing of 3 32-bit values (a,b,c) into c
+
+Pairs of (a,b,c) values differing in only a few bits will usually
+produce values of c that look totally different.  This was tested for
+* pairs that differed by one bit, by two bits, in any combination
+  of top bits of (a,b,c), or in any combination of bottom bits of
+  (a,b,c).
+* "differ" is defined as +, -, ^, or ~^.  For + and -, I transformed
+  the output delta to a Gray code (a^(a>>1)) so a string of 1's (as
+  is commonly produced by subtraction) look like a single 1-bit
+  difference.
+* the base values were pseudorandom, all zero but one bit set, or
+  all zero plus a counter that starts at zero.
+
+These constants passed:
+ 14 11 25 16 4 14 24
+ 12 14 25 16 4 14 24
+and these came close:
+  4  8 15 26 3 22 24
+ 10  8 15 26 3 22 24
+ 11  8 15 26 3 22 24
+-------------------------------------------------------------------------------
+*/
+#define final(a,b,c) \
+{ \
+  c ^= b; c -= rot(b,14); \
+  a ^= c; a -= rot(c,11); \
+  b ^= a; b -= rot(a,25); \
+  c ^= b; c -= rot(b,16); \
+  a ^= c; a -= rot(c,4);  \
+  b ^= a; b -= rot(a,14); \
+  c ^= b; c -= rot(b,24); \
+}
+
+
+/*
+-------------------------------------------------------------------------------
+hashlittle() -- hash a variable-length key into a 32-bit value
+  k       : the key (the unaligned variable-length array of bytes)
+  length  : the length of the key, counting by bytes
+  val2    : IN: can be any 4-byte value OUT: second 32 bit hash.
+Returns a 32-bit value.  Every bit of the key affects every bit of
+the return value.  Two keys differing by one or two bits will have
+totally different hash values.  Note that the return value is better
+mixed than val2, so use that first.
+
+The best hash table sizes are powers of 2.  There is no need to do
+mod a prime (mod is sooo slow!).  If you need less than 32 bits,
+use a bitmask.  For example, if you need only 10 bits, do
+  h = (h & hashmask(10));
+In which case, the hash table should have hashsize(10) elements.
+
+If you are hashing n strings (uint8_t **)k, do it like this:
+  for (i=0, h=0; i<n; ++i) h = hashlittle( k[i], len[i], h);
+
+By Bob Jenkins, 2006.  bob_jenkins at burtleburtle.net.  You may use this
+code any way you wish, private, educational, or commercial.  It's free.
+
+Use for hash table lookup, or anything where one collision in 2^^32 is
+acceptable.  Do NOT use for cryptographic purposes.
+-------------------------------------------------------------------------------
+*/
+
+static uint32_t hashlittle( const void *key, size_t length )
+{
+  uint32_t a,b,c;                                          /* internal state */
+  union { const void *ptr; size_t i; } u;     /* needed for Mac Powerbook G4 */
+
+  /* Set up the internal state */
+  a = b = c = 0xdeadbeef + ((uint32_t)length);
+
+  u.ptr = key;
+  if (HASH_LITTLE_ENDIAN && ((u.i & 0x3) == 0)) {
+    const uint32_t *k = (const uint32_t *)key;         /* read 32-bit chunks */
+#ifdef VALGRIND
+    const uint8_t  *k8;
+#endif
+
+    /*------ all but last block: aligned reads and affect 32 bits of (a,b,c) */
+    while (length > 12)
+    {
+      a += k[0];
+      b += k[1];
+      c += k[2];
+      mix(a,b,c);
+      length -= 12;
+      k += 3;
+    }
+
+    /*----------------------------- handle the last (probably partial) block */
+    /*
+     * "k[2]&0xffffff" actually reads beyond the end of the string, but
+     * then masks off the part it's not allowed to read.  Because the
+     * string is aligned, the masked-off tail is in the same word as the
+     * rest of the string.  Every machine with memory protection I've seen
+     * does it on word boundaries, so is OK with this.  But VALGRIND will
+     * still catch it and complain.  The masking trick does make the hash
+     * noticably faster for short strings (like English words).
+     */
+#ifndef VALGRIND
+
+    switch(length)
+    {
+    case 12: c+=k[2]; b+=k[1]; a+=k[0]; break;
+    case 11: c+=k[2]&0xffffff; b+=k[1]; a+=k[0]; break;
+    case 10: c+=k[2]&0xffff; b+=k[1]; a+=k[0]; break;
+    case 9 : c+=k[2]&0xff; b+=k[1]; a+=k[0]; break;
+    case 8 : b+=k[1]; a+=k[0]; break;
+    case 7 : b+=k[1]&0xffffff; a+=k[0]; break;
+    case 6 : b+=k[1]&0xffff; a+=k[0]; break;
+    case 5 : b+=k[1]&0xff; a+=k[0]; break;
+    case 4 : a+=k[0]; break;
+    case 3 : a+=k[0]&0xffffff; break;
+    case 2 : a+=k[0]&0xffff; break;
+    case 1 : a+=k[0]&0xff; break;
+    case 0 : return c;              /* zero length strings require no mixing */
+    }
+
+#else /* make valgrind happy */
+
+    k8 = (const uint8_t *)k;
+    switch(length)
+    {
+    case 12: c+=k[2]; b+=k[1]; a+=k[0]; break;
+    case 11: c+=((uint32_t)k8[10])<<16;  /* fall through */
+    case 10: c+=((uint32_t)k8[9])<<8;    /* fall through */
+    case 9 : c+=k8[8];                   /* fall through */
+    case 8 : b+=k[1]; a+=k[0]; break;
+    case 7 : b+=((uint32_t)k8[6])<<16;   /* fall through */
+    case 6 : b+=((uint32_t)k8[5])<<8;    /* fall through */
+    case 5 : b+=k8[4];                   /* fall through */
+    case 4 : a+=k[0]; break;
+    case 3 : a+=((uint32_t)k8[2])<<16;   /* fall through */
+    case 2 : a+=((uint32_t)k8[1])<<8;    /* fall through */
+    case 1 : a+=k8[0]; break;
+    case 0 : return c;
+    }
+
+#endif /* !valgrind */
+
+  } else if (HASH_LITTLE_ENDIAN && ((u.i & 0x1) == 0)) {
+    const uint16_t *k = (const uint16_t *)key;         /* read 16-bit chunks */
+    const uint8_t  *k8;
+
+    /*--------------- all but last block: aligned reads and different mixing */
+    while (length > 12)
+    {
+      a += k[0] + (((uint32_t)k[1])<<16);
+      b += k[2] + (((uint32_t)k[3])<<16);


-- 
CTDB repository