Tags: c++, android-ndk, java-native-interface, rtti, dlopen

Is it possible to merge weak symbols like vtables/typeinfo across RTLD_LOCAL'ly loaded libraries?


For context: I have a Java project that is partially implemented with two JNI libraries. For the sake of example, libbar.so depends on libfoo.so. If these were system libraries,

System.loadLibrary("bar");

would do the trick. But since they're custom libraries I'm shipping with my JAR, I have to do something like

System.load("/path/to/libfoo.so");
System.load("/path/to/libbar.so");

libfoo needs to go first because otherwise libbar can't find it, as it's not in the system library search path.

This has been working well for a while, but I've now run into an issue where std::any_cast throws std::bad_any_cast even though the types are correct. I tracked it down to the fact that each library carries its own copy of the typeinfo for that type, and the copies are not being merged at runtime. This seems to be because System.load() ends up invoking dlopen() with RTLD_LOCAL rather than RTLD_GLOBAL.

I wrote this to demonstrate the behaviour without needing JNI:

foo.hpp

class foo { };

extern "C" const void* libfoo_foo_typeinfo();

foo.cpp

#include "foo.hpp"
#include <typeinfo>

extern "C" const void* libfoo_foo_typeinfo()
{
    return &typeid(foo);
}

bar.cpp

#include "foo.hpp"
#include <typeinfo>

extern "C" const void* libbar_foo_typeinfo()
{
    return &typeid(foo);
}

main.cpp

#include <iostream>
#include <typeinfo>
#include <dlfcn.h>

int main() {
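    void* libfoo = dlopen("./libfoo.so", RTLD_NOW | RTLD_LOCAL);
    if (!libfoo) { std::cerr << dlerror() << "\n"; return 1; }
    void* libbar = dlopen("./libbar.so", RTLD_NOW | RTLD_LOCAL);
    if (!libbar) { std::cerr << dlerror() << "\n"; return 1; }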

    auto libfoo_fn = reinterpret_cast<const void* (*)()>(
        dlsym(libfoo, "libfoo_foo_typeinfo"));
    auto libbar_fn = reinterpret_cast<const void* (*)()>(
        dlsym(libbar, "libbar_foo_typeinfo"));

    auto libfoo_ti = static_cast<const std::type_info*>(libfoo_fn());
    auto libbar_ti = static_cast<const std::type_info*>(libbar_fn());

    std::cout << std::boolalpha
              << (libfoo_ti == libbar_ti) << "\n"
              << (*libfoo_ti == *libbar_ti) << "\n";
    return 0;
}

Makefile

all: libfoo.so libbar.so main

libfoo.so: foo.cpp
        $(CXX) -fpic -shared -Wl,-soname=$@ $^ -o $@

libbar.so: bar.cpp
        $(CXX) -fpic -shared -Wl,-soname=$@ $^ -L. -lfoo -o $@

main: main.cpp
        $(CXX) $^ -ldl -o $@

On my system, I get

$ make
...
$ ./main
false
true

This is because even though the typeinfo addresses are different, GCC's libstdc++ uses the mangled names for equality. On LLVM's libc++, for example, equality is based on the typeinfo address itself, so I get:

$ make CXX="clang++ -stdlib=libc++"
$ ./main
false
false
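
The two results differ because the standard libraries decide std::type_info equality differently. Here is a minimal sketch of the two strategies. It is not either library's actual code, and same_by_address and same_by_name are hypothetical helper names introduced only for illustration:

```cpp
#include <cstring>
#include <typeinfo>

// Address-based strategy: two type_info references are "equal" only if they
// denote the very same object. Duplicate copies of a typeinfo in different
// RTLD_LOCAL libraries therefore compare unequal.
inline bool same_by_address(const std::type_info& a, const std::type_info& b) {
    return &a == &b;
}

// Name-based strategy: compare the implementation-defined names instead.
// Duplicate copies of the same typeinfo still compare equal, because each
// copy carries the same name string.
inline bool same_by_name(const std::type_info& a, const std::type_info& b) {
    return std::strcmp(a.name(), b.name()) == 0;
}
```

Applied to the two pointers in main.cpp above, same_by_address(*libfoo_ti, *libbar_ti) would correspond to the first line of output, and same_by_name(*libfoo_ti, *libbar_ti) to what a name-comparing operator== reports on the second line.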

If I pass RTLD_GLOBAL instead, I see

true
true

And if I edit main.cpp to load libbar.so first, it also works, provided I tell it where it can find libfoo.so:

$ LD_LIBRARY_PATH=. ./main
true
true

But for the reasons described at the top of this post, neither of these is a practical workaround.

This is very similar to https://github.com/android-ndk/ndk/issues/533 but with non-polymorphic types (no virtual functions), so there's no way to add a "key function" to force the typeinfo to be a strong symbol. I happened to reproduce the problem on Android first, but it isn't Android-specific.
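
For comparison, this is roughly what the "key function" technique looks like when the class can be made polymorphic. It is a sketch only: keyed_foo and keyed_foo_typeinfo are hypothetical names, and the comments mark which file each part would live in.

```cpp
#include <typeinfo>

// Would live in a shared header (e.g. foo.hpp).
class keyed_foo {
public:
    // Declared here but NOT defined inline: as the first non-inline,
    // non-pure virtual function, this destructor is the key function.
    virtual ~keyed_foo();
};

// Would live in exactly one .cpp file, compiled into libfoo.so only.
// The vtable and typeinfo for keyed_foo are emitted alongside this definition.
keyed_foo::~keyed_foo() = default;

// Accessor analogous to libfoo_foo_typeinfo() in the example above.
inline const std::type_info& keyed_foo_typeinfo() {
    return typeid(keyed_foo);
}
```

Under the Itanium C++ ABI, the vtable and typeinfo are then emitted only in the translation unit that defines the key function, so other libraries reference libfoo.so's copy rather than emitting their own weak one. The foo in this question has no virtual functions, so there is nothing to anchor in this way.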


Solution

No, that is not possible. RTLD_LOCAL exists to prevent exactly that kind of symbol merging across libraries. Unfortunately, System.loadLibrary has to use it: otherwise, loading two libraries that each define a different foo class would merge their symbols as well, and bad things would happen.