the sorry saga of the talloc soname 'fix'

Sat Jul 4 19:17:05 GMT 2009

On Sat, 2009-07-04 at 11:24 +1000, tridge at samba.org wrote:
> Hi Jeremy and Simo,
> 
>  > Thanks a *lot* for this one Simo, much appreciated.
> 
> I'm finding all this congratulation rather disturbing. The patch from
> Simo now creates the very problem you are all so keen to avoid.
> 
> With the soname bump we had lots of standard mechanisms in place (both
> packaging and loader) to try to stop having two versions of the
> library in place at the same time. It would be detected at package
> install time, and also at runtime by Metze's patch.

Tridge,
let's start with a very simple assumption I think we can all agree on.

If, in the same process, you end up with 2 different versions of the
same library you are pretty much doomed no matter what. (Sure you can be
lucky sometimes but in general you are just waiting on a ticking bomb).

So, based on this very basic assumption, we can proceed in understanding
what happens with or without the soname bump.

> Now let's look at what happens with this much applauded patch from
> Simo.
> 
>  1) the major so versions of the libraries are now the same, so you're
>  telling the packaging managers and the loader that they are
>  compatible. So distros and users will quite happily mix the versions
>  now.

What we are telling packagers is that if you have an application
currently using libtalloc.so.1.2.0 and you upgrade to libtalloc.so.1.4.0
then all is fine. And indeed it is. You are not using 2 libraries, you
are just using a newer one that provides the same interfaces as the old
one, in short it is ABI compatible.

It is generally understood that it is ok to use a newer version with an
older program. It is also generally understood that if App X is linked
against lib Y v.1.5 it is not really a good idea to try to use it with
lib Y v.1.1.
This is reflected in package dependencies. You will see that samba
version 3.2.x which uses talloc 1.2.0 has a dependency on the talloc
package with version >= 1.2.0

And if you try to use that samba version with libtalloc 1.4.0 you will
see it works.

Likewise the next samba package that uses 1.4.0 will have a dependency
on libtalloc >= 1.4.0 so, at install time, the user will be told to
upgrade libtalloc as well.
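
To make the dependency rule concrete, here is a minimal sketch of the
">= version" check a package manager applies at install time. It is a toy
model in Python, not the code of any real packaging tool; the function
names parse_version and dependency_satisfied are made up for the
illustration, and real version comparison handles many more cases
(epochs, release suffixes and so on).

```python
def parse_version(text):
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def dependency_satisfied(installed, minimum):
    """True when the installed library version is >= the required minimum."""
    return parse_version(installed) >= parse_version(minimum)


# samba 3.2.x declares talloc >= 1.2.0, so an installed 1.4.0 is accepted.
print(dependency_satisfied("1.4.0", "1.2.0"))

# A package built against 1.4.0 declares talloc >= 1.4.0, so a system that
# only has 1.2.0 is told to upgrade libtalloc first.
print(dependency_satisfied("1.2.0", "1.4.0"))
```

Comparing tuples of integers rather than strings is deliberate: as plain
strings "1.10.0" would sort before "1.9.0".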

>  2) the code is not in fact now compatible, as the patch from Metze is
>  still in there, so if anyone actually tries to load both of them,
>  then it will abort(), with no feedback to the user on what is
>  actually wrong, and no mechanism for them to fix it except to
>  manually recompile one or both of the offending packages, if they can
>  even work out what packages need changing.

No, this can't happen, for the simple reason that you have only one
version of the library with the same soname in any given process: for
any one soname the dynamic loader will always find the same library.

Whatever is internal to the library is not exposed to the rest of the
application so there is no code compatibility issue.

Of course, if the magic number were exposed to the application (for
example in the header files), and the app relied on that number outside
of the library itself, then the change would be basically a violation of
the ABI.

The linker looks at the major soname version by searching for
libtalloc.so.1. That's why we normally have a symbolic link like:
libtalloc.so.1 -> libtalloc.so.1.2.0 (although we could install the
library directly as libtalloc.so.1 without any problem and indeed it is
so in some versions of Fedora IIRC).

So in /usr/lib you can point only at one version with soname .1 and that
means you can't have 2 versions with the same soname loaded at the same
time, as the dynamic loader will pick up only one.

So the only real consequence of a program linked against
libtalloc.so.1.4.0 that finds libtalloc.so.1.2.0 is that it will fail to
start because the loader will not find some of the new symbols that are
present in 1.4.0 but not in 1.2.0. This will immediately tell the user
that they are using the wrong library version.
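
The two cases above can be sketched with a small toy model in Python. It
is not how a real loader is implemented: the symbol names (alloc_chunk,
free_chunk, new_in_1_4) and the export lists are hypothetical, and the
model checks every symbol up front, which is a simplification.

```python
def start_program(needed_symbols, soname, symlinks, exports):
    """Resolve the soname to the one installed file, then check symbols."""
    real_file = symlinks[soname]  # a soname points at exactly one file
    missing = sorted(set(needed_symbols) - exports[real_file])
    if missing:
        raise RuntimeError(
            "undefined symbols in %s: %s" % (real_file, ", ".join(missing))
        )
    return real_file


# Hypothetical export lists: 1.4.0 only adds a symbol, it removes nothing.
EXPORTS = {
    "libtalloc.so.1.2.0": {"alloc_chunk", "free_chunk"},
    "libtalloc.so.1.4.0": {"alloc_chunk", "free_chunk", "new_in_1_4"},
}
OLD_APP = {"alloc_chunk", "free_chunk"}                 # linked against 1.2.0
NEW_APP = {"alloc_chunk", "free_chunk", "new_in_1_4"}   # linked against 1.4.0

# Old program, upgraded library: everything it needs is still there.
upgraded = {"libtalloc.so.1": "libtalloc.so.1.4.0"}
print(start_program(OLD_APP, "libtalloc.so.1", upgraded, EXPORTS))

# New program, old library: it is refused with a message naming the symbol.
stale = {"libtalloc.so.1": "libtalloc.so.1.2.0"}
try:
    start_program(NEW_APP, "libtalloc.so.1", stale, EXPORTS)
except RuntimeError as err:
    print("refused to start:", err)
```

The symlinks mapping has one entry per soname, which is the point: within
this model there is no way to hand a process both 1.2.0 and 1.4.0 under
the name libtalloc.so.1.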

> As for avoiding this with symbol versioning, that's all well and good
> on platforms that have symbol versioning, but many of the platforms
> that we claim we support don't have that, so now we've left them out
> in the cold.

Well, we are not using symbol versioning at the moment anyway. We could
if we wanted to commit to an even stronger ABI promise, and we would not
leave platforms without it any more out in the cold than they are right
now. Platforms with poor linkers/loaders and no packaging system already
know the dangers of dynamic libraries; that's why most people tend to
build static binaries on them. I suggest people keep building on those
platforms with all the samba libraries statically built in. This will
spare them any versioning problem whatsoever, and they can be as
careless as they want towards libraries then.

> The so major number change is a crude instrument for handling changes
> in versions of libraries, but it is portable to all shared library
> systems I know of, and it works. I really don't understand why there
> is this revulsion at using the library version number for exactly what
> it was designed for.

On the contrary if you have libtalloc.so.1 and libtalloc.so.2 the loader
could theoretically load both and it will just use whatever symbol is
found because we don't do symbol versioning, and the symbols in the 2
libraries have the same names.

Because the 2 libraries use the same symbol names what can happen is
that the loader will load most symbols from, say, libtalloc.so.1.2.0 and
the "missing ones" from libtalloc.so.2.0.0 (it really depends on what
order they are loaded). This would lead to the abort() metze introduced.
And he introduced it exactly because of this problem, that happens only
when you change the soname.
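
A rough sketch of that lookup, again as a toy model in Python rather than
the real loader. It assumes a single flat symbol namespace where the
first loaded library that defines a name wins; the symbol names and the
export lists are hypothetical.

```python
def resolve(symbol, load_order, exports):
    """Return the first loaded library that defines the symbol."""
    for lib in load_order:
        if symbol in exports[lib]:
            return lib
    raise LookupError(symbol)


# Hypothetical export lists for two libraries with different sonames.
EXPORTS = {
    "libtalloc.so.1": {"alloc_chunk", "free_chunk"},
    "libtalloc.so.2": {"alloc_chunk", "free_chunk", "new_in_v2"},
}

# Both sonames ended up in the same process, with .1 loaded first.
LOAD_ORDER = ["libtalloc.so.1", "libtalloc.so.2"]

bindings = {
    name: resolve(name, LOAD_ORDER, EXPORTS)
    for name in ["alloc_chunk", "free_chunk", "new_in_v2"]
}
for name, lib in sorted(bindings.items()):
    print(name, "->", lib)
```

In this model the common calls bind to libtalloc.so.1 while new_in_v2
binds to libtalloc.so.2, so code from one version ends up handling memory
set up by the other. Reversing LOAD_ORDER binds everything to
libtalloc.so.2, which is why the outcome depends on load order.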

So, in short, what you describe in 1 and 2 is what happens if you bump
the soname, not if you keep it.

Of course if you need to introduce a change that has to break the ABI
there is no alternative, and the packager will have to be careful not to
include libraries that link against 2 versions of the library.

Now, about your other concerns:

> I forgot to mention one more problem with what Simo has now setup,
> just in case you are now tempted to remove Metze's magic number check
> as a way to 'fix' the problem I pointed out in my last email.
> 
> The fix for the bug that started this whole discussion also involved a
> change to the internal struct talloc_reference_handle. By changing the
> .so number this change was safe, as the linker and package manager
> (plus Metze's paranoia patch!) will ensure they aren't mixed at
> runtime.

Internal symbols are not exported (see also talloc.exports) so this is
not going to happen.

> But now that Simo has reverted that change and thus explicitly allowed
> the old code and the new code to reside in the same running process,
> we will be mixing the old structure with the new one.

As explained before this can't happen.

> This means we at
> minimum will get valgrind errors where we reference memory beyond the
> end of the allocated structure. I wonder if it is even an exploitable
> security hole?

It's not, as it can't happen.

> So yes, congratulations all round are in order. You've just overridden
> the talloc package maintainer by introducing real bugs, made us
> non-portable, introduced silent and difficult to track failures of
> applications, broken the ABI promises, lied to the distro package
> managers and loader and generally had a great time. But at least we
> haven't brought the good name of free software into disrepute by
> using up a precious .so number, so it was all worth it.

I did none of the things you mention here, and certainly I did not have
a great time, by *FAR*.

Simo.

Note: it is a public holiday here, I probably won't be reachable until
Monday (US EST), so the discussion will have to resume next week if
there are still doubts or issues.

-- 
Simo Sorce
Samba Team GPL Compliance Officer <simo at samba.org>
Principal Software Engineer at Red Hat, Inc. <simo at redhat.com>