tdb_chainlock() in tdb1, tdb2 and tdb_compat ?

Ira Cooper ira at samba.org
Fri Apr 13 10:36:11 MDT 2012


On Fri, Apr 13, 2012 at 6:33 AM, Jelmer Vernooij <jelmer at samba.org> wrote:

> On Fri, Apr 13, 2012 at 10:50:15AM +0200, Volker Lendecke wrote:
> > On Fri, Apr 13, 2012 at 10:41:10AM +0200, Michael Adam wrote:
> > > I understand that there is a real problem with the
> > > subtle difference in signature and return semantics for
> > > tdb_chainlock.
> > >
> > > There can always be new callers that check == -1 instead of < 0,
> > > so I think the best way would have been to have a compat
> > > version for these, too, so that the difference is explicit.
> > >
> > > I also think simo's proposal was quite reasonable,
> > > but the middle course might be to consistently use the compat
> > > layer for such functions as chainlock for which the compiler does
> > > not complain about differences (at least the C compiler...).
> > >
> > > For the demand of samba3 still being able to link against
> > > system libtdb (version 1.X), couldn't we introduce tdb into
> > > libreplace and do some #define tricks? ...
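Michael's point about `== -1` versus `< 0` can be shown with a minimal, hypothetical sketch. All names here (`demo_chainlock_v2`, `demo_chainlock_compat`, the `DEMO_ERR_*` values) are invented for illustration and are not the real tdb1 or tdb2 prototypes. The sketch assumes only what the thread describes: one API reports failure as -1, and the other returns a negative error code. A compat wrapper that collapses every negative code to -1 keeps tdb1-style callers correct.

```c
#include <stdbool.h>

/* Hypothetical tdb2-style error codes; the values are illustrative only. */
enum demo_err { DEMO_SUCCESS = 0, DEMO_ERR_IO = -2, DEMO_ERR_LOCK = -3 };

/* Hypothetical tdb2-style lock call: returns DEMO_SUCCESS or a negative code.
 * 'simulated' lets the example pick the outcome instead of touching a file. */
static enum demo_err demo_chainlock_v2(enum demo_err simulated)
{
    return simulated;
}

/* Compat wrapper with tdb1-style semantics: 0 on success, -1 on any failure. */
static int demo_chainlock_compat(enum demo_err simulated)
{
    return demo_chainlock_v2(simulated) < 0 ? -1 : 0;
}

/* A caller written against tdb1 semantics. Fed a raw tdb2-style result,
 * it misses every error code other than -1. */
static bool tdb1_style_caller_sees_failure(int ret)
{
    return ret == -1;
}
```

With the raw tdb2-style result, a lock error (-3) slips past the `== -1` check. Routed through `demo_chainlock_compat`, the same failure is reported. This is the "explicit difference" a compat version would buy.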
> >
> > Different proposal: Rename tdb2 to tdb3 and introduce a tdb2
> > version that matches exactly tdb1 code with 2 changes:
> > tdb_off_t is uint64 and the freelist becomes a doubly linked
> > list. This way we solve the two pressing tdb1 problems
> > without a lot of the hassle that tdb2's semantic changes
> > bring. I checked: tdb_off_t does not leak into the published
> > API. We would probably limit a single record to 4GB, but I
> > doubt this is a real issue.
> That seems a lot simpler indeed. We could do that in the existing
> lib/tdb directory, without having to worry about supporting
> two distinct APIs in Samba.
>
> How important are these two fixes though? Do we really have to do them
> before 4.0, or could they reasonably be deferred until 4.1 along with
> the rest of tdb2?
>

"Very important."

We are getting nailed by the freelist problem, I pulled a server out of the
hat to check:

Server 1:
# echo free | tdbtool locking.tdb  | wc -l
400867
# echo info | tdbtool locking.tdb
tdb> 7472 records totalling 16559102 bytes

Server 2 (which I picked):
# echo free | tdbtool locking.tdb  | wc -l
1540397
# echo info | tdbtool locking.tdb
tdb> 14193 records totalling 22064477 bytes

-rw-r--r-- 1 root root 1.8G Apr 13 09:24 locking.tdb

The second server represents ~105 days of uptime.

A database holding 20MB of data while taking up 1.8GB is absurd.  Period.
Never mind the absurdity of having a 1.5-million-entry freelist when you
only have 14,000 records.
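The figures above work out roughly as follows. This treats each line of the `tdbtool` `free` output as one freelist entry, which is an approximation, and reads the `1.8G` from `ls -h` as 1.8 GiB:

```latex
\begin{aligned}
\text{Server 1:}\quad & \frac{400\,867}{7\,472} \approx 53.6 \ \text{free-list lines per live record}\\
\text{Server 2:}\quad & \frac{1\,540\,397}{14\,193} \approx 108.5 \ \text{free-list lines per live record}\\
& \frac{22\,064\,477\ \text{B}}{1.8 \times 2^{30}\ \text{B}} \approx 0.011
  \;\Rightarrow\; \text{about } 1.1\%\ \text{of the file is live record data}
\end{aligned}
```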

A doubly linked freelist will solve the fragmentation issue, making the
freelist much shorter and the databases much smaller.  I've written the
code to do it.  But I won't claim it is "bug free" or "perfect".  Only
reviewers, outside eyes and testing/time in production can make that happen.
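A rough sketch of why a doubly linked freelist helps, using a simplified in-memory model rather than tdb's actual on-disk format or Ira's patch. With a `prev` link, an entry can be unlinked in O(1). That lets two adjacent free blocks be coalesced cheaply. A singly linked list would have to be walked from the head to find the predecessor, which is costly with a 1.5-million-entry list. The struct also follows Volker's proposal of a 64-bit `tdb_off_t` with 32-bit record lengths. `free_rec`, `freelist_push`, `freelist_unlink` and `freelist_coalesce` are hypothetical names. In this sketch, both blocks being merged are already on the list.

```c
#include <stddef.h>
#include <stdint.h>

/* As in Volker's proposal: 64-bit file offsets (real tdb1 uses 32-bit). */
typedef uint64_t tdb_off_t;

/* Hypothetical free-record header for illustration; NOT tdb's real layout.
 * Lengths stay 32-bit, which caps a single record at 4GB. */
struct free_rec {
    tdb_off_t offset;        /* start of the free block in the file */
    uint32_t len;            /* length of the free block in bytes */
    struct free_rec *next;   /* next entry on the freelist */
    struct free_rec *prev;   /* previous entry: the "doubly linked" part */
};

struct freelist {
    struct free_rec *head;
    size_t count;
};

/* Push a newly freed block onto the head of the list. */
static void freelist_push(struct freelist *fl, struct free_rec *r)
{
    r->prev = NULL;
    r->next = fl->head;
    if (fl->head != NULL)
        fl->head->prev = r;
    fl->head = r;
    fl->count++;
}

/* Remove r in O(1): prev gives the predecessor directly, so there is no
 * walk from the head as a singly linked list would require. */
static void freelist_unlink(struct freelist *fl, struct free_rec *r)
{
    if (r->prev != NULL)
        r->prev->next = r->next;
    else
        fl->head = r->next;
    if (r->next != NULL)
        r->next->prev = r->prev;
    r->next = r->prev = NULL;
    fl->count--;
}

/* Coalesce 'right' into 'left' when the blocks are adjacent in the file.
 * Returns the merged entry, or NULL if they are not adjacent or the merged
 * length would not fit in 32 bits. */
static struct free_rec *freelist_coalesce(struct freelist *fl,
                                          struct free_rec *left,
                                          struct free_rec *right)
{
    if (left->offset + left->len != right->offset)
        return NULL;
    if ((uint64_t)left->len + right->len > UINT32_MAX)
        return NULL;
    freelist_unlink(fl, right);
    left->len += right->len;
    return left;
}
```

Keeping `len` at 32 bits while widening `tdb_off_t` produces the 4GB-per-record limit Volker mentions. In this sketch, that limit surfaces as the overflow check in `freelist_coalesce`.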

As far as Rusty and this issue: I offered this code to the team months ago,
via Jeremy.  Volker, Jeremy, Rusty and I discussed it, and there were 2
primary objections:

1. It forces a flag day.  Unavoidable for such a change.  It was desired
that the next flag day be TDB2.
2. Can something like compaction be used instead?

On #2, I disagree, mainly because experience has shown me that compacting
live TDBs causes grief.  We actually set up a cron job to compact the TDB
once an hour... eventually it would crash smbd.  Also, compaction involves
a heuristic of some kind.  Why rely on that if you can get guaranteed good
behavior, without some strange corner case gaming the system?

People are allowed to disagree, and I 100% respect Rusty's point of view,
but we disagree.  That's life.

We have not yet put the doubly linked freelist code into production, mostly
because other issues of even higher importance have crept up.  But it is on
my list of things to do, especially as it is becoming clear S4 will release
with a TDB1-based fileserver.



> I think we should move to tdb2 eventually, but trying to support both
> tdb1 and tdb2 does add a fair bit of complexity (in terms of
> the build system and other code) with no huge gains at this
> point. We seem to have enough distractions for 4.0 as is.
>

I think we should look at our requirements, and decide how best to meet
them.  Sometimes there isn't a one-size-fits-all solution.

In the end, I don't care what we choose, as long as the pain goes away, and
there is a TDB to do forward development on.

Thanks,

-Ira

Please note: the opinions expressed here are mine alone; I do not speak
for my employer.


More information about the samba-technical mailing list