[PATCH] LMDB full patch set

Andrew Bartlett abartlet at samba.org
Wed May 9 19:15:11 UTC 2018

On Wed, 2018-05-09 at 18:07 +0200, Stefan Metzmacher wrote:
> Hi Andrew,
> > > > I've addressed your concerns as far as practical[1].  If you could
> > > > please allow this to proceed into master I would most appreciate it. 
> > > 
> > > Can you move all tdb related patch to the start of the patchset,
> > > they should work on their own. Also the fork protection tests.
> > > 
> > > I don't see where ldb_tdb_test is used in tests...
> > 
> > I've fixed all this and pushed those to autobuild with Garming's
> > review.
> Thanks!
> > > From reading the patches it seems that it will no longer possible
> > > to build ldb or an AD DC without lmdb, e.g. 32bit support will be gone.
> > > I don't think that's acceptable right now, maybe in a few years.
> > 
> > Yes, you are correct we will require lmdb for the AD DC and ldb
> > standlone builds.
> > 
> > lmdb is shipped on 32 bit ubuntu.  I've set up a test environment and
> > will confirm the LDB tests pass on it.  Howard Chu indicates earlier in
> > the thread that it is fully functional, but limited in DB size to 2GB.
> Are we going to require lmdb at runtime or do you want to require it
> at compile time?

Just compile time for now, it is new and and want to test it. 

> Even for the standalone ldb build we have --disable-python.
> I think a standalone ldb should be able to build without lmdb
> it the user really wants this.

The only reason --disable-python needs to remain is for the fileserver,
which never uses python anyway.  I support keeping that mode because it
will make it easier to move to python3 (ironically) because we can be a
little more aggressive on dependencies for the AD DC than for pure
fileservers, which still need to work on AIX et al. 

For your example, it is actually what I want to avoid!  

In that case if we had a system ldb without python, we then go and
build pyldb from the (samba) build tree, but call #include
"ldb_private.h"!  While it is the classic way to introduce new things,
it creates combinations activated either accidentally or 'because I
want to', it turns out to be both untested and actively harmful.  

For this particular one I'll be submitting a patch today to make that
building a python-enabled Samba against a not-python ldb will just
fail. (This mistake actually goes back to why python was itself the new
untrustworthy and experimental thing). 

> For the AD DC I'm happy to require it (similar to the jansson library)if
> selftest is enabled, but I really don't see why we need to require
> it at build time (at least an explicit --without-ldb-lmdb should be
> possible for the AD DC and standalone builds).

The problem with this approach is that we create more untested
combinations.  Also, as I mentioned in my previous mail, we have less
freedom compared with jansson because Samba, built for selftest, can be
built against a system ldb.  

Now, like for python, we would have two different possible system ldb
versions, one acceptable and one not acceptable.

> Can you summaries how this is supposed to work at runtime?
> (I didn't have time to play with this yet)
> Will lmdb only used for the partition files or are all files,
> e.g. secrets.ldb and wins.ldb.

Just the sam.ldb backend for now.  The choice is via yet another option
at provision time.

> In case lmdb is not activated by default (at runtime)
> and you prepare a tmp patch to do that and check what autobuild says
> about that?

Likely nothing, as I've not forced an environment to use it yet.  You
ask below for a patch to enable it, and I'll do that today, so Samba's
use of lmdb is covered in autobuild.  (Gary's development testing was
done with such a temp patch, it made everything use lmdb by default,
but that isn't a step we want to take yet). 

> In the configure check we also check for a specific function of lmdb
> and only enable HAVE_LMDB on success (together with an unused
> conf.env.ENABLE_MDB_BACKEND), but if it's not found we don't complain...

I'll ensure it is handled.  I expected such a discussion and so we
hedged our bets a little.

> > The DB size issue is also the first issue I've found empirically, as we
> > got this while trying to set the default DB size:
> > 
> > [1295/4337] Compiling lib/ldb/ldb_mdb/ldb_mdb.c
> > ../lib/ldb/ldb_mdb/ldb_mdb.c: In function lmdb_open_env:
> > ../lib/ldb/ldb_mdb/ldb_mdb.c:726:2: error: large integer implicitly
> > truncated to unsigned type [-Werror=overflow]
> >   ret = mdb_env_set_mapsize(*env, 16LL * GIGABYTE);
> > 
> > As the operational side is both optional and not a major departure from
> > the 4GB limit on TDB, I think this is a reasonable restriction.  Those
> > running a RaspberryPi DC won't be running hundreds of users in any
> > case.
> > 
> > To address the broader point:
> > 
> > My strong preference is that the AD DC be able to rely on this feature
> > being available.  I don't mind if the fileserver-with-bundled-libraries 
> > has it optional, but for the AD DC I'm actively trying to reduce the
> > possible combinations so we can keep everything tested.
> For autobuild I'm fine with the requirement, but we need to test the
> code path without lmdb anyway and keep it working. That's why I
> think we should have a --without-ldb-lmdb option.

Yes, but that should be by having some selftest environments running
with lmdb and some without.  We don't need the complexity of builds
with and without lmdb to cover that.  Those extra configure options
have a real cost!  See the work I've had to do to ensure the pointless
combination of --without-ldap remains supported.  

For each combination we add, someone uses it and then wonders why it
doens't work and complains at best or suffers harm at worst.  An AD DC
is no small thing, we can reasonably choose to depend on widely
available libraries.  

> > (Once this lands, I'll change some of our selftest environments to use lmdb). 
> I want to see the patch that enables it first.


> > That is why 'samba' needs to require it for the AD DC.  For standlone
> > ldb, that reason I've not made it optional is that otherwise we need to
> > declare to samba that the built-standalone 'system' ldb doesn't have
> > lmdb, then have a reliable method to detect that.  This seemed like all
> > to much complexity for a feature we will likely soon strictly rely on. 
> Why do we have to rely on this strictly very soon, that is every tricky
> new code without any real world testing.

It may or may not be tricky new code, but does that justify adding more
tricky code to ensure we can build (but not test) with and without the
tricky code?

> > (We have been actively removing the ability to run 'make test' without
> > optional packages after we found parts of our testsuite silently
> > skipped.  I'll propose the same for the ldb package once this is
> > agreed.)
> Yes, for autobuild that's fine, but there's a real world outside of
> autobuild.

There is.  I actually think that those things 'required for --enable-
selftest' should be just required, as untested code is broken code, and
clearly if it is a combination that selftest requires then it isn't
tested outside selftest.  

I realise there are limits, and many rightly feel it is still better to
build untested than not build at all on some platforms, but that is the
background motivation here. 

Remember also that we have since found there are no non-samba users of
lmdb beyond sssd and zombie openchange sites.  So we should not invest
greatly in the fiction that ldb is installed for other reasons (unlike
TDB/talloc/tevent which has been more broadly successful). 

Likewise with ldb is only built standalone for Samba/sssd use in

> > Portability to i386 I can sort out, but can you confirm you are
> > otherwise happy with the branch?
> > 
> > https://gitlab.com/catalyst-samba/samba/commits/metze-master4-lmdb-full
> Not is this state sorry and I'd like to play with lmdb activated
> and have a look at the files myself...
> Thanks for your patience...

Can you please give me a timeframe here?  

You and I have made small and important improvements, but the patches I
re-pushed yesterday are still essentially unchanged from the patches I
sent you a month ago[1], and those that remain outside master are in
turn essentially the patches that were posted two months ago.  

Could any remaining issues be dealt with once it lands in master?

Could we delay the tagging of an ldb 1.4.0 release until closer to the
4.9.0 RC and get some feedback and improve your comfort with the code
and configure/build behaviour in the meantime?  

I'm more than happy to keep working with you on this once it is in
master, but I would like to know we are past the indefinite review


Andrew Bartlett

[1] I've spent days on days re-ordering for your preference. Yesterdays
re-order work came to a essentially a zero diff, and the previous work
came to what you saw attached earlier).  

[2] I have many other impressive rabbits to help pull out of the hat,
including another go at the Heimdal update, Gary's audit logging, Tim's
PSO support and of course the endless code review.  I can't properly
focus on those until I know this is closed. 

Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba

More information about the samba-technical mailing list