tdb_chainlock() in tdb1, tdb2 and tdb_compat ?

Andrew Bartlett abartlet at
Fri Apr 13 16:29:08 MDT 2012

On Fri, 2012-04-13 at 12:36 -0400, Ira Cooper wrote:
> On Fri, Apr 13, 2012 at 6:33 AM, Jelmer Vernooij <jelmer at> wrote:
> > On Fri, Apr 13, 2012 at 10:50:15AM +0200, Volker Lendecke wrote:

> > > Different proposal: Rename tdb2 to tdb3 and introduce a tdb2
> > > version that matches exactly tdb1 code with 2 changes:
> > > tdb_off_t is uint64 and the freelist becomes a doubly linked
> > > list. This way we solve the two pressing tdb1 problems
> > > without a lot of the hassle that tdb2's semantic changes
> > > bring. I checked: tdb_off_t does not leak into the published
> > > API. We would probably limit a single record to 4GB, but I
> > > doubt this is a real issue.
> > That seems a lot simpler indeed. We could do that in the existing
> > lib/tdb directory, without having to worry about supporting
> > two distinct APIs in Samba.
> >
> > How important are these two fixes though? Do we really have to do them
> > before 4.0, or could they reasonably be deferred until 4.1 along with
> > the rest of tdb2?
> >
> "Very important."
> We are getting nailed by the freelist problem.

> A database with 20MB of data taking up 1.8GB is absurd.  Period.  Never mind
> the absurdity of having a 1.5 million entry long freelist, when you only
> have 14,000 records.
> A doubly linked freelist will solve the fragmentation issue, making the
> freelist much shorter and the databases much smaller.  I've written the
> code to do it.  But I won't claim it is "bug free" or "perfect".  Only
> reviewers, outside eyes and testing/time in production can make that happen.
> As far as Rusty and this issue: I offered this code to the team months ago,
> via Jeremy.  Volker, Jeremy, Rusty and I discussed it, and there were 2
> primary objections:
> 1. It forces a flag day.  Unavoidable for such a change.  It was desired
> that the next flag day be TDB2.
> 2. Can something like compaction be used instead?
> On #2, I disagree, mainly because compacting live TDBs has caused me
> some grief in practice.  We actually set up a cron job to compact
> the TDB once an hour... eventually it would crash smbd.  Also compaction
> will involve a heuristic etc... Why do that if you can get guaranteed good
> behavior, without some strange corner case gaming the system?
> People are allowed to disagree, and I 100% respect Rusty's point of view,
> but we disagree.  That's life.
> We have not put the doubly linked freelist code into production, mostly
> because other issues of even higher importance keep cropping up.  But it
> is on my list of things to do, especially as it is becoming clear S4 will
> release with a TDB1-based fileserver.

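For readers following the freelist argument, here is a minimal sketch in
C of why a doubly linked freelist makes merging adjacent free records
cheap.  It is not taken from tdb or from Ira's patch: the names (off64,
struct free_rec, db_free), the array-based model of record offsets and
the 32-bit length field are all assumptions made for illustration.  The
point it shows is that with a prev link, a free neighbour can be
unlinked in constant time and absorbed, so the freelist need not grow
with every free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit tdb_off_t; 0 means "no record". */
typedef uint64_t off64;
#define NIL ((off64)0)

/* Illustrative record header, not tdb's on-disk layout. */
struct free_rec {
    off64 next, prev;   /* neighbours in the freelist, NIL if none */
    uint32_t len;       /* a 32-bit length caps one record at 4GB */
    int is_free;        /* nonzero while the record is on the freelist */
};

/* rec[i] models the record at "offset" i; slot 0 is reserved for NIL. */
struct db {
    struct free_rec rec[16];
    off64 head;         /* first record on the freelist */
};

/* Put a record at the head of the freelist. */
static void fl_push(struct db *db, off64 off)
{
    struct free_rec *r = &db->rec[off];

    assert(off != NIL);
    r->prev = NIL;
    r->next = db->head;
    if (db->head != NIL)
        db->rec[db->head].prev = off;
    db->head = off;
    r->is_free = 1;
}

/* Remove a record from the freelist without walking the list:
 * the prev link tells us directly which entry points at it. */
static void fl_unlink(struct db *db, off64 off)
{
    struct free_rec *r = &db->rec[off];

    if (r->prev != NIL)
        db->rec[r->prev].next = r->next;
    else
        db->head = r->next;
    if (r->next != NIL)
        db->rec[r->next].prev = r->prev;
    r->next = r->prev = NIL;
    r->is_free = 0;
}

/* Free the record at off.  right is the record physically following
 * it (NIL if none); if that one is free, absorb it first. */
static void db_free(struct db *db, off64 off, off64 right)
{
    if (right != NIL && db->rec[right].is_free) {
        fl_unlink(db, right);
        db->rec[off].len += db->rec[right].len;
        db->rec[right].len = 0;
    }
    fl_push(db, off);
}

/* Count the entries on the freelist. */
static size_t fl_length(const struct db *db)
{
    size_t n = 0;

    for (off64 o = db->head; o != NIL; o = db->rec[o].next)
        n++;
    return n;
}
```

In this model, freeing three adjacent records in reverse order leaves
one merged entry on the freelist rather than three.
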
So, what would it take to instead release a tdb2-based autoconf build
for the file server?  If these things are indeed 'very important', and
we only want to do the flag day pain once, then perhaps someone who is
familiar with the autoconf build system could switch that to using an
internal build of tdb2, or the system tdb2 when it becomes available?

This would ensure we only have one tdb API and ABI that we care about,
which would be a very good thing. 

We already build and test the top level code with tdb2, so there is good
reason to think that the porting work is complete, and we have some
confidence in the code due to it being part of autobuild.

> > I think we should move to tdb2 eventually, but trying to support both
> > tdb1 and tdb2 does add a fair bit of complexity (in terms of
> > the build system and other code) with no huge gains at this
> > point. We seem to have enough distractions for 4.0 as is.
> >
> I think we should look at our requirements, and decide how to best meet
> them.  Sometimes there isn't a one-size-fits-all solution.
> In the end, I don't care what we choose, as long as the pain goes away, and
> there is a TDB to do forward development on.

The current situation is clearly unacceptable in the long term, and is
becoming unacceptable in the short term.  Now that I have spent a little
time dealing with the build system issues (on the waf side), we are down
to the more fundamental issues (which is why I tried to remove what
distractions I could).

I would like us to make a decision on this:
 - after Rusty has had a chance to comment and
 - before SambaXP
because I still hold onto the faint hope of doing some kind of beta
release at SambaXP, if I can coax s3fs into a little more life. 

Andrew Bartlett
Andrew Bartlett                      
Authentication Developer, Samba Team 

More information about the samba-technical mailing list