TDB: using the jenkins hash for non-persistent tdbs

Fri Sep 17 08:25:02 MDT 2010

Hi,

While doing performance tests with samba/ctdb, we found that
locking.tdb processing gets slow for large recursive directory enumerations.

Even a very large hash table size doesn't improve this.
Some hash chains were very long (~300 entries), while most of them
were empty.

It turned out that the default hash algorithm distributes the keys
poorly if they only differ by a few bits.

Rusty will use the jenkins hash
http://en.wikipedia.org/wiki/Jenkins_hash_function
as the default for TDB2; it balances the keys across the hash chains
better.
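
To illustrate the effect, here is a minimal sketch, not taken from tdb.
weak_hash() is a deliberately poor stand-in and is NOT the actual tdb
default hash; jenkins_one_at_a_time() is the simplest member of the
Jenkins family and may not be the variant the backport uses. The key
layout and the helper longest_chain() are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t (*hash_fn)(const uint8_t *key, size_t len);

/* Hypothetical weak hash: just adds the shifted key bytes together. */
uint32_t weak_hash(const uint8_t *key, size_t len)
{
	uint32_t value = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		value += (uint32_t)key[i] << (8 * (i % 4));
	}
	return value;
}

/* Bob Jenkins' one-at-a-time hash. */
uint32_t jenkins_one_at_a_time(const uint8_t *key, size_t len)
{
	uint32_t hash = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		hash += key[i];
		hash += hash << 10;
		hash ^= hash >> 6;
	}
	hash += hash << 3;
	hash ^= hash >> 11;
	hash += hash << 15;
	return hash;
}

/*
 * Hash nkeys 4-byte keys that only differ in their two middle bytes
 * into nbuckets chains and return the length of the longest chain.
 * Returns 0 if the bucket array cannot be allocated.
 */
size_t longest_chain(hash_fn fn, uint32_t nkeys, uint32_t nbuckets)
{
	size_t *chains = calloc(nbuckets, sizeof(*chains));
	size_t longest = 0;
	uint32_t i;

	if (chains == NULL) {
		return 0;
	}
	for (i = 0; i < nkeys; i++) {
		uint8_t key[4] = { 0, (uint8_t)(i & 0xff), (uint8_t)(i >> 8), 0 };
		size_t len = ++chains[fn(key, sizeof(key)) % nbuckets];

		if (len > longest) {
			longest = len;
		}
	}
	free(chains);
	return longest;
}
```

With these keys and 256 buckets, weak_hash() puts every key into
bucket 0, because the lowest key byte is always 0. Calling
longest_chain() with jenkins_one_at_a_time() instead shows how a
better-mixing hash spreads the same keys.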

I have a backport of that, to be used only for non-persistent tdbs.

The scheme is that we can detect the use of the jenkins hash when
opening an existing tdb, and we only use it if a tdb is created
with TDB_CLEAR_IF_FIRST.

If the jenkins hash is used, we set the rwlocks field in the tdb header
to a magic value.
Current tdb versions expect this field to be 0 and fail the open otherwise.

That means a current tdbdump won't be able to open such a tdb,
but a current Samba opening it with TDB_CLEAR_IF_FIRST would be able to
overwrite the file and format it with the current hash function.
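
The detection scheme described above could look roughly like the
following sketch. All names here are hypothetical and are not the real
tdb API: SKETCH_CLEAR_IF_FIRST stands in for TDB_CLEAR_IF_FIRST,
SKETCH_JENKINS_MAGIC is a made-up value (the real magic is not given
in this mail), and struct sketch_header only models the rwlocks field.

```c
#include <stdint.h>

#define SKETCH_CLEAR_IF_FIRST 1u          /* stand-in for TDB_CLEAR_IF_FIRST */
#define SKETCH_JENKINS_MAGIC  0x6a656e6bu /* made-up magic value */

enum sketch_hash {
	SKETCH_HASH_DEFAULT,
	SKETCH_HASH_JENKINS,
	SKETCH_HASH_UNKNOWN
};

/* Only the header field that matters for this sketch. */
struct sketch_header {
	uint32_t rwlocks;
};

/* Creating a tdb: use the jenkins hash only for CLEAR_IF_FIRST tdbs. */
struct sketch_header sketch_format(unsigned int flags)
{
	struct sketch_header hdr;

	hdr.rwlocks = (flags & SKETCH_CLEAR_IF_FIRST) ? SKETCH_JENKINS_MAGIC : 0;
	return hdr;
}

/* New code opening an existing tdb: pick the hash from the header. */
enum sketch_hash sketch_detect(struct sketch_header hdr)
{
	if (hdr.rwlocks == 0) {
		return SKETCH_HASH_DEFAULT;
	}
	if (hdr.rwlocks == SKETCH_JENKINS_MAGIC) {
		return SKETCH_HASH_JENKINS;
	}
	return SKETCH_HASH_UNKNOWN;
}

/* Old code: expects rwlocks == 0, returns -1 (open fails) otherwise. */
int sketch_old_open(struct sketch_header hdr)
{
	return (hdr.rwlocks == 0) ? 0 : -1;
}
```

The point of reusing a field that old versions already check is that
they refuse the file outright instead of looking up records with the
wrong hash function.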

http://gitweb.samba.org/?p=metze/samba/wip.git;a=shortlog;h=refs/heads/master4-tdb

metze
