NTDB progress!

Rusty Russell rusty at rustcorp.com.au
Mon Jun 4 05:41:34 MDT 2012


Hi all,

        Sorry this has been delayed: two things happened.  Firstly,
other duties involved me going to Hong Kong for a week.  Secondly,
porting revealed an unacceptable slowdown for smaller databases going
from tdb to tdb2, so after much benchmarking, the format was simplified
to be closer to the original tdb.  See the benchmarks below, taken from
that commit message; we still pay a slight penalty for 64-bit.

See my ntdb-wip head:

        https://git.samba.org/rusty/samba.git/?p=rusty/samba.git;a=shortlog;h=refs/heads/ntdb-wip

So far:

   All of source4/ is converted to ntdb, as is ldb (it handles the
switch internally).  I've written a dbwrap_open_local() which switches
between the ntdb and tdb backends based on the 'use old tdb = yes'
configuration option for dbwrap users.  If this isn't set, I plan to use
the tdb backend if a tdb file is there, otherwise use ntdb, but I
haven't implemented that.
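
A minimal sketch of that selection rule, including the planned (not yet
implemented) fallback.  The enum and function names here are made up for
illustration and are not the actual dbwrap API; the two booleans stand in
for the 'use old tdb' option and for a check that a .tdb file exists.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not Samba's dbwrap API. */
enum db_backend { BACKEND_TDB, BACKEND_NTDB };

/*
 * use_old_tdb:      true when 'use old tdb = yes' is configured.
 * tdb_file_exists:  true when a tdb file for this database is present.
 */
enum db_backend choose_backend(bool use_old_tdb, bool tdb_file_exists)
{
	/* Explicit configuration always selects the old backend. */
	if (use_old_tdb)
		return BACKEND_TDB;

	/* Planned fallback: keep using an existing tdb file. */
	if (tdb_file_exists)
		return BACKEND_TDB;

	/* Otherwise a fresh database is created as ntdb. */
	return BACKEND_NTDB;
}
```

With this rule, only a database that has no existing tdb file and no
explicit 'use old tdb = yes' ends up on the ntdb backend.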

        The general rule of conversions has to be to rename databases to
".ntdb", so it's absolutely clear.  For the moment, dbwrap_open_ntdb()
will change .tdb names to .ntdb names and dbwrap_open_tdb() will do the
reverse mapping, so you can use either method (not yet implemented).
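
The name mapping described above could look roughly like the following.
This is only a sketch: swap_suffix() is a hypothetical helper, not a
function from the dbwrap code, and "example.tdb" is just a sample name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if `name` ends in `from`, write it to `out` with
 * that suffix replaced by `to`; otherwise copy `name` unchanged.
 * Returns 0 on success, -1 if `out` is too small.
 */
int swap_suffix(const char *name, const char *from, const char *to,
		char *out, size_t outlen)
{
	size_t nlen = strlen(name);
	size_t flen = strlen(from);
	int n;

	if (nlen >= flen && strcmp(name + nlen - flen, from) == 0) {
		/* Copy everything before the old suffix, then append the new one. */
		n = snprintf(out, outlen, "%.*s%s",
			     (int)(nlen - flen), name, to);
	} else {
		n = snprintf(out, outlen, "%s", name);
	}

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Calling swap_suffix("example.tdb", ".tdb", ".ntdb", buf, sizeof(buf))
fills buf with "example.ntdb"; swapping the two suffix arguments gives
the reverse mapping.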

        Everything not using dbwrap is being converted; CLEAR_IF_FIRST
or INTERNAL databases are fairly non-controversial.  If something else
should not be converted, feel free to change it to use dbwrap.

Note that NTDB_DATA/struct ntdb_data is a synonym for TDB_DATA/struct
TDB_DATA if tdb.h is included before ntdb.h: without this, compatibility
becomes a nightmare, as these are used all over Samba.

To come:

        There's a bit more source3 to convert, then lots of testing and
making sure the s3->s4 upgrade scripts work well.  I'll be working on
this all this week.

BTW, here are the benchmarks which made me rework the NTDB hash code:

                          Insert  Re-ins   Fetch    Size    dbspeed
                          (nsec)  (nsec)  (nsec)    (Kb)  (ops/sec)
TDB (10000 hashsize):
        100 records:        3882    3320    1609      53     203204
        1000 records:       3651    3281    1571     115     218021
        10000 records:      3404    3326    1595     880     202874
        100000 records:     4317    3825    2097    8262     126811
        1000000 records:   11568   11578    9320   77005      25046

TDB2 (1024 hashsize, expandable):
        100 records:        3867    3329    1699      17     187100
        1000 records:       4040    3249    1639     154     186255
        10000 records:      4143    3300    1695    1226     185110
        100000 records:     4481    3425    1800   17848     163483
        1000000 records:    4055    3534    1878  106386     160774

NTDB (8192 hashsize):
        100 records:        4259    3376    1692      82     190852
        1000 records:       3640    3275    1566     130     195106
        10000 records:      4337    3438    1614     773     188362
        100000 records:     4750    5165    1746    9001     169197
        1000000 records:    4897    5180    2341   83838     121901

Analysis:
        1) TDB wins on first insert on small databases, beating TDB2 by
           ~15%, NTDB by ~10% on dbspeed.
        2) TDB starts to lose when hash chains get 10 long (fetch 10% slower
           than TDB2/NTDB).
        3) TDB does horribly when hash chains get 100 long (fetch 4x slower
           than NTDB, 5x slower than TDB2, insert about 2-3x slower).
        4) TDB2 databases are 40% larger than TDB1.  NTDB is about 15% larger
           than TDB1.
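
The chain lengths in points 2 and 3 follow from dividing the record count
by TDB's fixed hashsize of 10000, assuming keys spread evenly over the
buckets.  A small sketch of that arithmetic and of the fetch ratios from
the 1000000-record rows; the function names are made up for illustration.

```c
/*
 * Average chain length for a fixed-size chained hash table, assuming
 * records are spread evenly over `hashsize` buckets.
 */
double avg_chain_len(unsigned long records, unsigned long hashsize)
{
	return (double)records / (double)hashsize;
}

/* How many times slower `slow_nsec` is than `fast_nsec`. */
double slowdown(double slow_nsec, double fast_nsec)
{
	return slow_nsec / fast_nsec;
}
```

avg_chain_len(100000, 10000) gives 10 and avg_chain_len(1000000, 10000)
gives 100.  For fetch at 1000000 records, slowdown(9320, 2341) is about
4.0 (TDB vs NTDB) and slowdown(9320, 1878) is about 5.0 (TDB vs TDB2).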

Cheers,
Rusty.

