Sidebar on linkers detecting mismatches (started with Failed to set gid privileges)

Fri Sep 8 09:34:57 GMT 2000

[Peter]
> > It's the only sane thing to do when your library and kernel are
> > often upgraded independently.  Commercial Unix doesn't have this
> > problem, nor do the various free BSDs.

[Dave]
> I mean to say that it's a bad **implementation** of the solution, not
> that the problem shouldn't be solved...

Perhaps.  I'm having a little trouble understanding what you think the
system should do in the face of missing facilities at runtime.

It sounds like you want the Linux dynamic linker to check available
system calls and fail at load time.  This is quite impractical.  The
only way to check for most system calls is to try them.  You load up a
register with the syscall number, say, load other registers with your
function params, and jump to kernel mode (by whatever method is
appropriate to your architecture...).  You land in the middle of a
jump table, where all unused entries point to the function
sys_ni_syscall() (_ni_ meaning "not implemented"), which immediately
returns ENOSYS.  Libc stuffs this into errno and returns -1 to the
application.  So you see it's not really possible for libc to
instantly know everything about its environment at runtime.
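
To make the "try it and see" point concrete, here is a minimal C
sketch of probing for a syscall by invoking it.  It assumes Linux and
the syscall() wrapper that glibc provides; syscall_is_missing() is a
name made up for this illustration, not an existing libc interface.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper: returns 1 if the kernel answers ENOSYS for
 * syscall number nr, 0 otherwise.  The call is made with no
 * arguments, so only probe numbers that are harmless to invoke
 * that way. */
int syscall_is_missing(long nr)
{
    errno = 0;
    long rc = syscall(nr);
    return rc == -1 && errno == ENOSYS;
}
```

syscall_is_missing(SYS_getpid) is 0 on any Linux kernel.  A number
well past the end of the table (100000, say, assuming it is
unassigned) would normally give 1, though a sandbox that filters
syscalls may substitute a different errno.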

> > If you have a significantly older libc than what the application was
> > linked with, it will probably fail at load time.  libc is somewhat
> > back-compatible at the binary level but not necessarily
> > forward-compatible.

> That's backwards: Sun warrants and tests applications for **forward**
> compatibility and Samba passes the test.  A version of Samba compiled
> under 2.5.1 shall work under 2.6, 7, 8 or 9, or it's Sun's fault.

What I call backward-compatibility you call forward-compatibility.
Later versions of the OS will run software compiled against earlier
versions.  We weren't disagreeing, IOW, except in terminology.

> Or the task could be shouldered by the glibc folks, but only if they
> wished to do so: they could look at the OS release and warn if they
> were going to be degraded.

So -- you are suggesting that every function that wouldn't have worked
properly on Linux 1.0 has to have a little init stub that loads at
library init time, checks for proper support for itself and squawks if
you try to link it in without a new enough kernel?  I guess this could
be done, but I'd think the runtime startup cost would be significant.
NOT that Linux really shines there anyway, because although it uses ELF
it does *not* pull any tricks like IRIX does to "pre-link" binaries (to
optimize the common case of no custom LD_LIBRARY_PATH, no preloading).
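
For what it's worth, the cheap version of "look at the OS release and
warn" might look roughly like the C sketch below.  It assumes the
release string from uname(2) starts with dotted numbers; both function
names are hypothetical, and nothing here is claimed to be what glibc
actually does.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Returns 1 if the release string (e.g. "2.2.14") is at least
 * major.minor.patch, 0 if it is older, -1 if it cannot be parsed.
 * A missing patch level is treated as 0. */
int release_at_least(const char *release, int major, int minor, int patch)
{
    int a = 0, b = 0, c = 0;
    if (sscanf(release, "%d.%d.%d", &a, &b, &c) < 2)
        return -1;
    if (a != major)
        return a > major;
    if (b != minor)
        return b > minor;
    return c >= patch;
}

/* Hypothetical init-time check: warn on stderr if the running kernel
 * is older than the given version.  Returns what release_at_least()
 * returned, or -1 if uname() fails. */
int warn_if_kernel_older_than(int major, int minor, int patch)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    int ok = release_at_least(u.release, major, minor, patch);
    if (ok == 0)
        fprintf(stderr, "warning: kernel %s is older than %d.%d.%d; "
                "some calls may fail with ENOSYS\n",
                u.release, major, minor, patch);
    return ok;
}
```

One check like this per library is cheap.  The catch is that a version
number says nothing about facilities the user compiled out of an
otherwise new enough kernel, so it cannot replace per-call error
handling.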

> Solaris uses version numbers on interfaces: pvs -s /usr/lib/libc.so.1
> will give you something like
> 
> libc.so.1:
>           _end;
>           _GLOBAL_OFFSET_TABLE_;
>           _DYNAMIC;
>           _edata;
>           _PROCEDURE_LINKAGE_TABLE_;
>           _etext;
>     SUNW_1.20:
>           resetmnttab;
>           getextmntent;
>     SUNW_1.19:
>           strlcat;
>           strlcpy;
>           umount2;
>           _umount2;
>     SUNW_1.18.1:

Recent versions of glibc have been using symbol versioning, but I don't
know how it's implemented.  I do know it lets you run a glibc-2.0 ABI
program with glibc-2.1 (assuming you didn't do undocumented things of
course).  And the Linux kernel tends to keep old syscalls around for
quite some time (though not *forever*) so that's functionally almost a
form of versioning too.

All in all I think cross-version compatibility works about as well as
one can expect, given that the kernel and libc *are* going to go out of
sync.  Take the umount(2) syscall.  A couple years ago Linux got a new
syscall for umount() that takes an extra argument to let you
force-umount a busy filesystem.  If you have a new kernel, new libc, it
uses the new umount call.  New kernel, old libc, it uses the old umount
call.  New libc, old kernel, it tries the new umount, gets ENOSYS, uses
the old umount.  Meanwhile the application is none the wiser because
the libc umount() function hasn't changed.  (I think there's a separate
libc function exposing the additional functionality.)
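
The new-libc, old-kernel case can be sketched in C as below.  This is
an illustration of the fallback pattern only, not glibc's actual
source: the two syscalls are passed in as function pointers so the
logic can be exercised without root, and the fake_* functions are
stand-ins invented for the example.  (The two-argument interface that
glibc declares in <sys/mount.h> is umount2().)

```c
#include <errno.h>

typedef int (*new_umount_fn)(const char *target, int flags);
typedef int (*old_umount_fn)(const char *target);

/* Try the newer two-argument call first.  If the kernel does not
 * implement it (ENOSYS), fall back to the older call and drop the
 * flags.  Any other error is passed through untouched. */
int umount_compat(new_umount_fn newer, old_umount_fn older,
                  const char *target, int flags)
{
    int rc = newer(target, flags);
    if (rc == -1 && errno == ENOSYS)
        rc = older(target);
    return rc;
}

/* Stand-in for the new syscall on an old kernel. */
int fake_new_umount_missing(const char *target, int flags)
{
    (void)target; (void)flags;
    errno = ENOSYS;
    return -1;
}

/* Stand-in for the new syscall failing for an ordinary reason. */
int fake_new_umount_busy(const char *target, int flags)
{
    (void)target; (void)flags;
    errno = EBUSY;
    return -1;
}

/* Stand-in for the old syscall succeeding. */
int fake_old_umount_ok(const char *target)
{
    (void)target;
    return 0;
}
```

With fake_new_umount_missing the wrapper returns 0 via the old call;
with fake_new_umount_busy it returns -1 and leaves errno at EBUSY, so
only a missing syscall triggers the fallback.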

> The ELF linkers, unless **specifically** told not to, will link an
> interface in a library only when it's first called.  This should
> be/is true on Linux.  That means that a missing function won't cause
> a run-time error unless the application actually calls it.

A lazy ld.so?  I'm not sure ... actually I don't think so, but I could
be wrong.  I know dlopen() RTLD_LAZY mode is supported, and that's at
least somewhat related.
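
As a rough illustration of the dlopen() side of this, the C sketch
below looks a symbol up at run time and lets the caller cope with its
absence instead of failing at load time.  find_symbol() is a
hypothetical name; on older glibc the program has to be linked with
-ldl.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Look up sym among the symbols of the running program and the
 * shared libraries it already has loaded.  Returns its address,
 * or NULL if no such symbol is found. */
void *find_symbol(const char *sym)
{
    void *handle = dlopen(NULL, RTLD_LAZY);  /* NULL = main program */
    if (handle == NULL)
        return NULL;
    void *addr = dlsym(handle, sym);
    dlclose(handle);
    return addr;
}
```

A caller can test the result for NULL and degrade gracefully.  Note
that this only tells you whether libc exports the function, not
whether the kernel underneath implements it.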

> > We've seen several questions on samba@ recently where a user's
> > precompiled Samba won't run on his custom-compiled kernel because
> > he omitted SysV IPC support.  Not too much you can do about that
> > except user education.

> That one gets caught at run-time, by the linker.  In fact, it tends
> to get caught when the linker is making a list of .so's that it will
> need, and logged. The call then fails when first tried.

But that's just it -- on Linux the linker *can't* catch this at
runtime, because the shmctl() function is right there in libc, so
it'll link.  It may still elicit ENOSYS from the kernel when invoked.
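
About the best an application can do is probe once at startup and
complain clearly.  A minimal C sketch, assuming a kernel built
without SysV IPC answers shmget() with ENOSYS; sysv_shm_available()
is a made-up name, not something Samba or libc provides:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Returns 0 if the kernel reports ENOSYS for shmget(), 1 otherwise.
 * Any other failure (no space, permissions) still means the entry
 * point exists, so it counts as available. */
int sysv_shm_available(void)
{
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id == -1)
        return errno != ENOSYS;
    shmctl(id, IPC_RMID, NULL);  /* remove the probe segment */
    return 1;
}
```

A daemon could call this early and log something like "kernel lacks
SysV IPC" rather than failing obscurely later on.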

The only way around it that I can see would be to implement some sort
of fast method of looking up the exact ABI of the running kernel.
Perhaps a bitmap of features present, like you mentioned earlier,
available for mmap from /dev or something.  And there could potentially
be a *lot* of said features, and I could foresee a lot of arguments in
the development community over exactly how big of an ABI
change/addition is worthy of a new status bit....
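
Just to show how cheap the lookup side of such a scheme would be,
here is a C sketch of testing one bit in a feature bitmap.  Everything
in it is hypothetical: no such kernel interface exists, the feature
numbers are invented, and the bitmap is an ordinary in-memory array
rather than something mapped from /dev.

```c
#include <stddef.h>

/* Invented feature numbers, for illustration only. */
enum kfeature {
    KF_SYSV_IPC = 0,
    KF_UMOUNT2  = 1
};

/* Returns 1 if bit number `bit` is set in a bitmap of nbits bits
 * (least significant bit of map[0] is bit 0), 0 if it is clear or
 * out of range. */
int feature_present(const unsigned char *map, size_t nbits, size_t bit)
{
    if (bit >= nbits)
        return 0;
    return (map[bit / 8] >> (bit % 8)) & 1;
}
```

Treating out-of-range bits as absent means an old kernel exporting a
short bitmap would automatically report newer features as missing.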

Peter