I have 2 shared object libraries, e.g. libA.so
and libB.so
. I have an executable that calls functions from both of these libraries, e.g. funcA()
from libA
and funcB()
from libB
.
Unfortunately, both libA
and libB
have some other functions that use the same symbol name, e.g. libA
has function f1()
and f2()
and libB
also has functions f1()
and f2()
. Note, the implementations of f1()
and f2()
are completely different between libA and libB; they just happen to share the same names.
The executable code is never calling f1()
or f2()
directly, and of course libA
should only call its f1()
& f2()
and libB
should only call its f1()
and f2()
.
But, it seems something goes wrong, and I get a segfault when running the executable.
I have a solution for this, which is to use a "version.script" when building libA
and libB
, so that only the external API functions (i.e. those called "func*") are exposed, i.e. the following script...
{
global: func*;
local: *;
};
... and using -Wl,--version-script=version.script
when linking with gcc.
This works, but it does cause me complications because I have to use a 3rd party tool-chain that makes it complex (i.e. hacky) to use this version.script method.
Is there a better / other way? (Note, I do have the source for libA
and libB
, and so changing symbol names is possible, but not liked since this code is auto-generated by 3rd party tool and I don't really want to go in and edit all the names. Of course, real code is not just f1()
and f2()
but hundreds of similarly named symbols).
Also, I would like to understand the root cause of the problem? I'm assuming that libA
(or libB
) gets "confused" as to which f1()
or f2()
it should call, but why - since both libA
and libB
get compiled separately - something to do with shared object libraries I suspect?
In case it matters... source for the libs and executable is C, and using gcc to compile and link, and this is on Linux.
First, a minimal illustration of your problem. To keep it minimal I'll cut the ambiguous functions down from f1()
, f2()
to just f1()
. Since you say:
I would like to understand the root cause of the problem?
I'll explain that using the minimal illustration. Then I'll show how to avoid it without a version script.
The example
Source and header files for libA.so
:
// A.h
#pragma once
extern void funcA(void);

// A.c
#include <stdio.h>
#include "A1.h"
void funcA(void)
{
    puts(__FUNCTION__);
    f1();
}

// A1.h
#pragma once
extern void f1(void);

// A1.c
#include <stdio.h>
#include "A1.h"
void f1(void)
{
    printf("%s from A1.c\n",__FUNCTION__);
}
Source and header files for libB.so
:
// B.h
#pragma once
extern void funcB(void);

// B.c
#include <stdio.h>
#include "B1.h"
void funcB(void)
{
    puts(__FUNCTION__);
    f1();
}

// B1.h
#pragma once
extern void f1(void);

// B1.c
#include <stdio.h>
#include "B1.h"
void f1(void)
{
    printf("%s from B1.c\n",__FUNCTION__);
}
Source for a program:
// prog.c
#include <A.h>
#include <B.h>
int main(void)
{
    funcA();
    funcB();
    return 0;
}
Compile all the shared library sources:
$ gcc -c -fPIC A*.c B*.c
And the program source:
$ gcc -c -I . prog.c
Link the shared libraries:
$ gcc -shared -o libA.so A*.o
$ gcc -shared -o libB.so B*.o
And the program:
$ gcc -o prog prog.o -L . -lA -lB -Wl,-rpath=$(pwd)
Then we see your problem:
$ ./prog
funcA
f1 from A1.c
funcB
f1 from A1.c
Both funcA
and funcB
call the f1
from libA.so
, defined in A1.c
.
By re-linking the program with libA.so
and libB.so
in reverse order:
$ gcc -o prog prog.o -L . -lB -lA -Wl,-rpath=$(pwd)
we can produce the opposite problem:
$ ./prog
funcA
f1 from B1.c
funcB
f1 from B1.c
Which is no less a problem.
The explanation
The ambiguous function symbol f1
is defined in the dynamic symbol tables of both libA.so
and libB.so
:
$ readelf -W --dyn-syms libA.so
Symbol table '.dynsym' contains 9 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTable
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 (2)
3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2)
4: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
5: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
6: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@GLIBC_2.2.5 (2)
7: 0000000000001182 31 FUNC GLOBAL DEFAULT 14 funcA
8: 0000000000001159 41 FUNC GLOBAL DEFAULT 14 f1
$ readelf -W --dyn-syms libB.so
Symbol table '.dynsym' contains 9 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTable
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 (2)
3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2)
4: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
5: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
6: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@GLIBC_2.2.5 (2)
7: 0000000000001159 41 FUNC GLOBAL DEFAULT 14 f1
8: 0000000000001182 31 FUNC GLOBAL DEFAULT 14 funcB
So in both shared libraries it is visible at runtime to the dynamic linker as an eligible
definition for references to f1
, and the dynamic linker will by default bind all such references
to the first definition it finds in the course of loading and linking the shared libraries recursively
required by the process under construction.
To run prog
, that construction starts with the loading of the executable prog
. Look at the
top of its dynamic section:
$ readelf --dynamic prog
Dynamic section at offset 0x2d90 contains 30 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libB.so]
0x0000000000000001 (NEEDED) Shared library: [libA.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000001d (RUNPATH) Library runpath: [/home/imk/develop/so/scrap]
...[cut]...
This is information that the static linker has written there for the dynamic linker to read and act
on. The executable needs to load libB.so
and libA.so
, in that order. The RUNPATH
/home/imk/develop/so/scrap
is a directory the runtime linker can search to find the needed shared libraries (in addition to its default
search directories). That's the result of the -rpath=$(pwd)
that I added to the program linkage (because I'm not going
to bother properly installing these throwaway libraries).
So in this case, the first loaded shared library in which the dynamic linker finds a definition for f1
will be libB.so
;
that's the definition from B1.c
, and it will bind all references in the program to that definition.
Let's revert to the original linkage of prog
:
$ gcc -o prog prog.o -L . -lA -lB -Wl,-rpath=$(pwd)
Then, as you've guessed, the top of the dynamic section of prog will read:
$ readelf --dynamic prog
Dynamic section at offset 0x2d90 contains 30 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libA.so]
0x0000000000000001 (NEEDED) Shared library: [libB.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000001d (RUNPATH) Library runpath: [/home/imk/develop/so/scrap]
...[cut]...
with libA.so
loaded before libB.so
, and libA
's definition of f1
will be the one that wins, as we saw.
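If you want to confirm at run time which shared object supplies the winning definition, one option is to ask the dynamic linker directly. The following is only a minimal sketch and not part of the example above: provider_of is a hypothetical helper name, and it relies on the glibc extensions RTLD_DEFAULT and dladdr (hence _GNU_SOURCE).

```c
#define _GNU_SOURCE /* for RTLD_DEFAULT, dladdr and Dl_info on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * provider_of() is a hypothetical helper, not part of libA or libB.
 * It looks `name` up in the default global search order and returns
 * the pathname of the loaded object that contains the definition
 * found first, or NULL if no loaded object defines the symbol.
 */
const char *provider_of(const char *name)
{
    Dl_info info;
    void *addr;

    assert(name != NULL);

    addr = dlsym(RTLD_DEFAULT, name); /* first definition in search order */
    if (addr == NULL)
        return NULL;                  /* nothing loaded defines `name` */

    if (dladdr(addr, &info) == 0)
        return NULL;                  /* address not inside a loaded object */

    return info.dli_fname;            /* pathname of the defining object */
}
```

Called from prog as provider_of("f1"), I would expect this to report the path of libA.so with the -lA -lB linkage, and the path of libB.so after relinking with -lB -lA. With glibc older than 2.34 you would also need to add -ldl to the link.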
The fix
Your problem arises from the static linker's default behaviour when creating a shared library: references within the
shared library to dynamic symbols that it defines are not preemptively bound to the internal definition. There is a linker option that will override
the default behaviour to make each library call its own definition of f1
. From man ld
(ld
is the static linker, invoked on your behalf by gcc
to perform
linkages):
-Bsymbolic
When creating a shared library, bind references to global symbols to the definition within the shared library, if any. Normally, it is possible for a program linked against a shared library to override the definition within the shared library. This option is only meaningful on ELF platforms which support shared libraries.
So we can relink the shared libraries like so:
$ gcc -shared -o libA.so A*.o -Wl,-Bsymbolic
$ gcc -shared -o libB.so B*.o -Wl,-Bsymbolic
(Incidentally, we use -Wl,<ld-option>
to tell gcc
to pass option ld-option
straight through to ld
.)
Relink the program with the new libraries:
$ gcc -o prog prog.o -L . -lA -lB -Wl,-rpath=$(pwd)
and then each library calls its own definition of f1
:
$ ./prog
funcA
f1 from A1.c
funcB
f1 from B1.c
You could likely also solve the issue by using GCC's symbol visibility attribute, __attribute__((visibility("hidden"))), on the internal functions, but that would require some source modifications in the libraries.
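As a rough sketch of the visibility approach, assuming GCC or Clang on an ELF platform: the two libA sources are merged into one file here purely for brevity, and the choice of which functions to hide is my illustration, not something taken from the real libraries.

```c
/* libA sources, merged into one file for brevity (an assumption of this sketch). */
#include <stdio.h>

/*
 * Hidden visibility: other object files linked into libA.so can still
 * call f1, but it is left out of the library's dynamic symbol table,
 * so it can neither clash with nor be preempted by libB's f1.
 */
__attribute__((visibility("hidden"))) void f1(void);

/* Default visibility: funcA stays part of the library's exported API. */
__attribute__((visibility("default"))) void funcA(void);

void f1(void)
{
    printf("%s from A1.c\n", __func__);
}

void funcA(void)
{
    puts(__func__);
    f1(); /* bound to the definition above when libA.so is linked */
}
```

With hundreds of internal symbols, it may be less intrusive to compile the library objects with -fvisibility=hidden and mark only the func* API as default, which leaves just the exported declarations to annotate.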