I have trouble understanding what exactly happens when a dynamic library is loaded at runtime, and how the dynamic linker recognizes and treats "same symbols".
I've read other questions related to symbol linking and followed all the typical recommendations (using extern "C", using -fPIC when building the library, etc.). To my knowledge, my specific problem has not been discussed so far. The paper "How to write shared libraries" (https://www.akkadia.org/drepper/dsohowto.pdf) does discuss the process of resolving library symbol dependencies, which may explain what's happening in my example below, but alas, it does not offer a workaround.
I found a post whose last comment, unfortunately unanswered, describes very much the same problem as mine:
Is there symbol conflict when loading two shared libraries with a same symbol
The only difference is that in my case the symbol is an auto-generated constructor.
Here's the setup (Linux):
My assumption is: the constructor of class Dummy exists already in memory since master uses this function itself, and when loading the shared library it does not load its own version of the constructor, but simply re-uses the existing version from master. By doing that the extra string variable is not initialized correctly in the constructor, and accessing it segfaults.
Stepping through the assembly while the Dummy variable d is initialized in the slave confirms that Dummy's constructor inside the master's memory space is being called.
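One way to cross-check this kind of observation without stepping through assembly is to ask the dynamic loader which loaded object an address belongs to. The following is a minimal sketch using glibc's dladdr(); the helper name owningObject is made up for illustration and is not part of the example project.

```cpp
#include <dlfcn.h>   // dladdr, Dl_info (glibc extension; g++ defines _GNU_SOURCE by default)
#include <string>

// Hypothetical helper: returns the path of the loaded object (executable or
// shared library) whose mapped address range contains addr, or an empty
// string if the loader cannot attribute the address to any object.
std::string owningObject(const void *addr) {
    Dl_info info{};
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr)
        return std::string();
    return info.dli_fname;
}
```

Since C++ does not allow taking the address of a constructor, this only works directly for ordinary functions, for example owningObject(reinterpret_cast<const void *>(&someFunction)) called from inside the library. For constructors, the address seen in the debugger can be passed in instead.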
Questions:
How does the dynamic linker (dlopen()?) recognize that the class Dummy used to compile the master should be the same as the Dummy compiled into Slave, even though the library provides its own? Why does the symbol lookup take the master's variant of the constructor, even though the symbol table must also contain the constructor symbol imported from the library?
Is there a way, for example by passing some suitable options to dlopen() or dlsym(), to enforce usage of the Slave's own Dummy constructor instead of the one from Master (i.e. to tweak the symbol lookup/relocation behavior)?
Code: full minimalistic source code example can be found here:
https://bauklimatik-dresden.de/privat/nicolai/tmp/master-slave-test.tar.bz2
Relevant shared lib loading code in Master:
#include <iostream>
#include <dlfcn.h> // shared library loading on Unix systems
#include "Dummy.h"
int create(void * &data);
typedef int F_create(void * &data);
int destroy(void * data);
typedef int F_destroy(void * data);
int main() {
    // use dummy class at least once in program to create constructor
    Dummy d;
    d.m_c = "Test";
    // now load dynamic library
    void *soHandle = dlopen( "libSlave.so", RTLD_LAZY );
    std::cout << "Library handle 'libSlave.so': " << soHandle << std::endl;
    if (soHandle == nullptr)
        return 1;
    // now load constructor and destructor functions
    F_create * createFn = reinterpret_cast<F_create*>(dlsym( soHandle, "create" ) );
    F_destroy * destroyFn = reinterpret_cast<F_destroy*>(dlsym( soHandle, "destroy" ) );
    if (createFn == nullptr || destroyFn == nullptr)
        return 1; // symbol lookup failed
    void * data = nullptr;
    createFn(data);
    destroyFn(data);
    return 0;
}
Class Dummy: the variant without EXTRA_STRING is used in Master, the variant with EXTRA_STRING is used in Slave
#ifndef DUMMY_H
#define DUMMY_H
#include <string>
#define EXTRA_STRING
class Dummy {
public:
    double m_a;
    int m_b;
    std::string m_c;
#ifdef EXTRA_STRING
    std::string m_c2;
#endif // EXTRA_STRING
    double m_d;
};
#endif // DUMMY_H
Note: if I use exactly the same class Dummy in both Master and Slave, the code works (as expected).
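To illustrate why mixing the two variants is fatal: the two definitions of Dummy have different object layouts, so a constructor generated for one variant does not construct all members of the other. The sketch below mimics both variants under distinct, made-up names (DummyWithoutExtra, DummyWithExtra) so that they can coexist in a single translation unit; in the real example both are called Dummy, which is why their constructor symbols are indistinguishable.

```cpp
#include <string>

// Layout as seen by Master (EXTRA_STRING not defined). Name is hypothetical.
struct DummyWithoutExtra {
    double      m_a;
    int         m_b;
    std::string m_c;
    double      m_d;
};

// Layout as seen by Slave (EXTRA_STRING defined). Name is hypothetical.
struct DummyWithExtra {
    double      m_a;
    int         m_b;
    std::string m_c;
    std::string m_c2;  // not constructed by a constructor built for the other layout
    double      m_d;
};

// The extra member makes the Slave's variant strictly larger.
static_assert(sizeof(DummyWithExtra) > sizeof(DummyWithoutExtra),
              "the two variants must not be treated as the same type");
```

If the constructor compiled for the smaller layout runs on storage that the caller treats as the larger layout, m_c2 is left as raw memory, which matches the crash described when it is accessed.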
I followed the recommendations found in the last answer and in "Is there symbol conflict when loading two shared libraries with a same symbol". The resulting symbol listing includes:
...
000000000000612a W _ZN5DummyC1EOS_
00000000000056ae W _ZN5DummyC1ERKS_
0000000000004fe8 W _ZN5DummyC1Ev
...
So, the mangled function signatures match in both the master's binary and the slave.
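For readability, the mangled names in such a listing can be translated back to C++ signatures. Below is a minimal sketch using abi::__cxa_demangle from the <cxxabi.h> header shipped with GCC and Clang; the wrapper name demangle is chosen here for illustration only.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI, GCC and Clang)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical wrapper: returns the demangled form of an Itanium-ABI mangled
// name, or the input unchanged if it cannot be demangled.
std::string demangle(const char *mangled) {
    int status = 0;
    char *out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr)
        return mangled;
    std::string result(out);
    std::free(out);  // the returned buffer is allocated with malloc
    return result;
}
```

Applied to the three names above, this gives Dummy::Dummy(Dummy&&), Dummy::Dummy(Dummy const&) and Dummy::Dummy(), i.e. the implicitly generated move, copy and default constructors.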
When loading the library, the master's function is used instead of the library's version. To study this further, I created an even more minimalistic example like in the post referenced above:
master.cpp
#include <iostream>
#include <dlfcn.h> // shared library loading on Unix systems
// prototype for imported slave function
void hello();
typedef void F_hello();
void printHello() {
    std::cout << "Hello world from master" << std::endl;
}
int main() {
    printHello();
    // now load dynamic library
    void *soHandle = nullptr;
    const char * const sharedLibPath = "libSlave.so";
    // I tested different RTLD_xxx options, see text for explanations
    soHandle = dlopen( sharedLibPath, RTLD_NOW | RTLD_DEEPBIND);
    if (soHandle == nullptr)
        return 1;
    // now load shared lib function and execute it
    F_hello * helloFn = reinterpret_cast<F_hello*>(dlsym( soHandle, "hello" ) );
    if (helloFn == nullptr)
        return 1; // symbol lookup failed
    helloFn();
    return 0;
}
slave.h
#pragma once
#ifdef __cplusplus
extern "C" {
#endif
void hello();
#ifdef __cplusplus
}
#endif
slave.cpp
#include "slave.h"
#include <iostream>
void printHello() {
    std::cout << "Hello world from slave" << std::endl;
}
void hello() {
    printHello(); // should call our own printHello() function
}
Notice that the same function printHello() exists both in the library and the master.
I compiled both manually this time (without CMake), with the following flags:
# build master
/usr/bin/c++ -fPIC -o tmp/master.o -c master.cpp
/usr/bin/c++ -rdynamic tmp/master.o -o Master -ldl
# build slave
/usr/bin/c++ -fPIC -o tmp/slave.o -c slave.cpp
/usr/bin/c++ -fPIC -shared -Wl,-soname,libSlave.so -o libSlave.so tmp/slave.o
Mind the use of -fPIC in both master and slave-library.
I now tried several combinations of RTLD_xx flags and compile flags:
1. dlopen() flags: RTLD_NOW | RTLD_DEEPBIND, -fPIC for both master and library
Hello world from master
Hello world from slave
-> result as expected (this is what I wanted to achieve)
2. dlopen() flags: RTLD_NOW | RTLD_DEEPBIND, -fPIC for only the library
Hello world from master
Segmentation fault (core dumped) ./Master
-> Here, a segfault happens in the line where the iostream library's cout call is made; still, the library's printHello() function is called
3. dlopen() flags: RTLD_NOW, -fPIC for only the library
Hello world from master
Hello world from master
-> This is my original behavior, so RTLD_DEEPBIND is definitely what I need, in conjunction with -fPIC for the master's binary.
Note: while CMake automatically adds -fPIC when building shared libraries, it does not generally do this for executables; here you need to manually add this flag when building with CMake
Note2: Using RTLD_NOW or RTLD_LAZY does not make a difference.
Combining -fPIC on both the executable and the shared lib with RTLD_DEEPBIND lets the original example with the different Dummy classes work without problems.
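The dlopen() call from the working combination can be wrapped in a small helper. This is only a sketch, not the code from the example project: the function name openPreferOwnSymbols is made up, and RTLD_DEEPBIND is a glibc extension that is not available on every platform.

```cpp
#include <dlfcn.h>   // dlopen, dlerror; RTLD_DEEPBIND is a glibc extension
#include <iostream>

// Hypothetical helper: loads a shared library so that its own definitions
// are preferred over same-named symbols already present in the global scope.
// Returns nullptr and prints the loader's message on failure.
void *openPreferOwnSymbols(const char *path) {
    void *handle = dlopen(path, RTLD_NOW | RTLD_DEEPBIND);
    if (handle == nullptr) {
        const char *msg = dlerror();
        std::cerr << "dlopen failed: " << (msg ? msg : "unknown error") << std::endl;
    }
    return handle;
}
```

As the test results above indicate, the flag alone was not sufficient in this setup; the executable also had to be compiled with -fPIC.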