the sorry saga of the talloc soname 'fix'

tridge at samba.org
Mon Jul 6 10:21:34 GMT 2009


Hi Simo,

I've now spent all day looking at libtalloc and how it interacts with
what is currently in Ubuntu Jaunty. I have downloaded a Fedora image
but haven't yet installed it to see if Fedora is as badly placed as
Ubuntu is.

The result of my investigation is that libtalloc is a complete mess.
It turns out that with current Ubuntu we cannot completely avoid
having both the old talloc and the new talloc in the same process at
the same time. However, if we bump the .so number then at least the
developers will get a warning about the mess.

I've put together some files for you to look at to give you some idea
of just how bad this whole mess is. See 

  http://samba.org/tridge/talloc_mess/

The files are:

 - a libtalloc 1.2.0 (dev and lib) matching what is currently in Ubuntu
   Jaunty

 - a libtalloc 1.4.0 (dev and lib) matching what is produced if we
   followed your suggested course of action

 - a libtalloc 2.0.0 (dev and lib) matching what is produced if we
   follow my preferred choice of using a new .so number

 - source tarballs with debian build rules for all of the above

 - a sample 'testtalloc' package that demonstrates the problems (deb
   plus source)

The testtalloc package produces two binaries. One is called test_ldb:
it creates an ldb and then tries to free it with talloc_free(), which
is about as simple an ldb program as you can have. The other is called
test_mapi: it initialises the MAPI subsystem from openchange and then
uses talloc_report_full() to show the memory that has been used.

I chose these two binaries as they demonstrate different types of
brokenness in the way that talloc/ldb/mapi/samba/openchange etc have
all been packaged. For example:

  - The libldb-samba4-0 package provides a libldb.so.0 which has a
    built in static copy of talloc.

  - The libmapi.so package links to a dynamic libtalloc.so, but also
    links to libdcerpc.so

  - libdcerpc.so has a statically linked talloc built in

  - etc etc

The same type of brokenness is rife through all the various packages
that use talloc currently.

If we use the approach you are advocating, then none of these packages
(ldb, openchange, mapi, samba etc) will be marked as needing to be
rebuilt. Yet they will all abort with no error message when you
actually use them, because they will mix the two incompatible
ABIs. Try the test_ldb and test_mapi binaries to see the abort.

If we use the approach that I prefer, which is to change the .so
number to 2, then at least the developers get a nice warning like
this:

  /usr/bin/ld: warning: libtalloc.so.1, needed by /usr/lib/gcc/x86_64-linux-gnu/4.3.3/../../../../lib/libmapi.so, may conflict with libtalloc.so.2

So at least someone gets told that it won't work at build time, which
gives some hope that it might get fixed.

If we up the .so number to 2 then you can also see the brokenness by
looking at the dependencies, because we are explicitly marking the ABI
as having changed. It is easy to see the brokenness using ldd, or by
using dpkg. 
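As an in-process counterpart to checking with ldd, a program can walk its own loaded objects with glibc's dl_iterate_phdr() and count how many distinct libtalloc.so.N majors are mapped at once. This is a minimal sketch assuming glibc on Linux; the helper names talloc_major(), count_talloc_majors() and count_loaded_talloc_majors() are hypothetical, not part of talloc. Note that it can only see talloc when it is loaded as its own shared object: a copy statically linked into another library, like the one built into libdcerpc.so, is invisible to it.

```c
#define _GNU_SOURCE /* needed for dl_iterate_phdr() in <link.h> */
#include <link.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_OBJECTS 256
#define MAX_MAJORS 8

/* Hypothetical helper: if the basename of path is "libtalloc.so.N...",
 * return N, otherwise return -1. */
static long talloc_major(const char *path)
{
    static const char prefix[] = "libtalloc.so.";
    const size_t plen = sizeof(prefix) - 1;
    const char *base = strrchr(path, '/');
    const char *digits;
    char *end;
    long major;

    base = base ? base + 1 : path;
    if (strncmp(base, prefix, plen) != 0)
        return -1;
    digits = base + plen;
    major = strtol(digits, &end, 10);
    if (end == digits)
        return -1; /* nothing numeric after the prefix */
    return major;
}

/* Hypothetical helper: count distinct libtalloc majors in paths[0..n). */
static int count_talloc_majors(const char **paths, size_t n)
{
    long seen[MAX_MAJORS];
    int nseen = 0;

    for (size_t i = 0; i < n; i++) {
        long major = talloc_major(paths[i]);
        int already = 0;

        if (major < 0)
            continue;
        for (int j = 0; j < nseen; j++)
            if (seen[j] == major)
                already = 1;
        if (!already && nseen < MAX_MAJORS)
            seen[nseen++] = major;
    }
    return nseen;
}

struct loaded_objects {
    const char *names[MAX_OBJECTS];
    size_t n;
};

/* dl_iterate_phdr() callback: record the path of each loaded object. */
static int collect_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct loaded_objects *objs = data;

    (void)size;
    if (info->dlpi_name != NULL && objs->n < MAX_OBJECTS)
        objs->names[objs->n++] = info->dlpi_name;
    return 0; /* 0 tells dl_iterate_phdr() to keep iterating */
}

/* Number of distinct libtalloc.so.N majors mapped into this process.
 * A result above 1 means two talloc ABIs are live at the same time. */
static int count_loaded_talloc_majors(void)
{
    struct loaded_objects objs = { .n = 0 };

    dl_iterate_phdr(collect_cb, &objs);
    return count_talloc_majors(objs.names, objs.n);
}
```

A program could call count_loaded_talloc_majors() early in main() and refuse to continue if it returns more than 1, turning a silent abort later on into an explicit error up front.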

If we don't do this then we're saying "the ABI is the same" when it
isn't. This is clearly shown by the abort in the test programs above,
regardless of whether you install the 1.4.0 libtalloc or the 2.0.0
libtalloc.

So even with your attempts to make the ABIs more similar by putting
backward compatibility code into talloc.c, we get aborts because the
internal structures are not compatible (which is nicely caught by
Metze's patch). Your attempts to make the ABIs compatible are not
enough, and they would pollute the code with a lot of cruft that serves
no purpose. They would also remove the warnings that developers would
otherwise get when things are going to go wrong with some of the
libraries.
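The kind of version check credited to Metze's patch can be illustrated like this: each chunk header carries a magic value that encodes the library version, and a copy of the library refuses to operate on a chunk whose magic does not match its own. This is a simplified sketch under stated assumptions, not talloc's actual code: the header layout, the MAGIC_BASE value and the chunk_alloc()/chunk_free() names are all hypothetical. A real allocator would abort() on a mismatch; the sketch returns -1 instead so the behaviour can be observed.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical magic scheme: a fixed base value with the allocator's
 * major version folded into bits 8-15. NOT talloc's real constants. */
#define MAGIC_BASE 0xe8150000u
#define MAKE_MAGIC(major) (MAGIC_BASE | (((uint32_t)(major) & 0xffu) << 8))

/* Hypothetical chunk header stored just before the user pointer. The
 * magic word is the first member so every version can locate it, even
 * if the rest of the header differs between versions. */
struct chunk_header {
    uint32_t magic;
    size_t size;
};

/* Allocate size bytes as the copy of the allocator with version major. */
static void *chunk_alloc(unsigned major, size_t size)
{
    struct chunk_header *h = malloc(sizeof(*h) + size);

    if (h == NULL)
        return NULL;
    h->magic = MAKE_MAGIC(major);
    h->size = size;
    return h + 1; /* user memory starts right after the header */
}

/* Free ptr as the copy with version major. If the header was written by
 * a different version, refuse to touch it. A real allocator would call
 * abort() here; this sketch returns -1 so the mismatch is testable. */
static int chunk_free(unsigned major, void *ptr)
{
    struct chunk_header *h;

    if (ptr == NULL)
        return 0;
    h = (struct chunk_header *)ptr - 1;
    if (h->magic != MAKE_MAGIC(major)) {
        fprintf(stderr, "chunk_free: bad magic 0x%08lx (expected 0x%08lx)\n",
                (unsigned long)h->magic, (unsigned long)MAKE_MAGIC(major));
        return -1;
    }
    free(h);
    return 0;
}
```

Keeping the magic word at a fixed position is what lets a mismatched copy fail loudly instead of silently misreading the rest of a header whose layout it does not understand.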

So Simo, please look at the above examples, then please revert your
commit. Also, in future, please don't revert a maintainer's commits
without checking with the maintainer.

Also, Metze, you were right, your abort() check on version really is
needed, and really does happen with real examples. Thanks!

To prevent this happening in future we have to stop mixing statically
linked libraries with shared versions of the same libs. That will mean
a lot of changes to the way that lots of libs are produced by the
Samba project and how they are linked into projects like openchange.

I hope I don't have to spend another day like today tracing shared
library problems. As I have said several times previously when proposals
for Samba shared libs have come up, getting shared libs right is really
_really_ hard. We have come nowhere near to getting it right yet, and
the work required to get it right is quite substantial. I'm not
volunteering to do the work.

Cheers, Tridge


More information about the samba-technical mailing list