c++ shared-libraries ld-preload extern-c

Can I write a library to preload in C++? Is there anything I need to do other than prepend `extern "C"` to the functions to intercept?


I am working on a personal project where I need to intercept Linux API calls such as open() and read() and do some data analysis on them. The library would need to keep a C++ std::map that is updated in a thread-safe manner.

Since I want to use C++'s std::map, I was thinking of writing the preload library (.so) in C++ and declaring the functions to intercept as extern "C" to prevent name mangling by the C++ compiler.

But, is this the correct approach? Should I write the shared library purely in C, and implement the map data structure from scratch?


Solution

  • is this the correct approach?

    There is nothing particularly wrong with this approach, but it can get complicated (see below).
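For reference, the approach being asked about might look roughly like the sketch below. It interposes only open(), forwards to the next definition found with dlsym(RTLD_NEXT, ...), and counts calls. The counter and the accessor name interposer_open_calls are hypothetical and exist only for illustration. The sketch deliberately sticks to constant-initialized atomics, so the interposer itself makes no calls into the C++ runtime.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // RTLD_NEXT needs it; g++ on Linux normally defines it already
#endif
#include <dlfcn.h>
#include <fcntl.h>
#include <sys/types.h>
#include <atomic>
#include <cstdarg>

namespace {
using open_fn = int (*)(const char*, int, ...);

// Constant-initialized atomics: usable even before any static constructor runs.
std::atomic<open_fn> g_real_open{nullptr};
std::atomic<unsigned long> g_open_calls{0};  // hypothetical counter, illustration only

open_fn real_open() {
    open_fn fn = g_real_open.load(std::memory_order_acquire);
    if (!fn) {
        // Find the next open() after this object in lookup order (normally libc's).
        fn = reinterpret_cast<open_fn>(dlsym(RTLD_NEXT, "open"));
        g_real_open.store(fn, std::memory_order_release);
    }
    return fn;
}
}  // namespace

// Hypothetical accessor so a caller can inspect the counter.
extern "C" unsigned long interposer_open_calls() {
    return g_open_calls.load(std::memory_order_relaxed);
}

// extern "C" keeps the symbol name unmangled so it can shadow libc's open().
extern "C" int open(const char* path, int flags, ...) {
    mode_t mode = 0;
    bool needs_mode = (flags & O_CREAT) != 0;
#ifdef O_TMPFILE
    needs_mode = needs_mode || (flags & O_TMPFILE) == O_TMPFILE;
#endif
    if (needs_mode) {  // the third argument is only present in these cases
        va_list ap;
        va_start(ap, flags);
        mode = static_cast<mode_t>(va_arg(ap, int));
        va_end(ap);
    }
    g_open_calls.fetch_add(1, std::memory_order_relaxed);
    return real_open()(path, flags, mode);
}
```

Assuming the file is saved as interpose.cc (a made-up name), it could be built with g++ -shared -fPIC interpose.cc -o libinterpose.so -ldl and loaded with LD_PRELOAD=./libinterpose.so ./a.out. Note that it catches only calls that go through the open symbol; other entry points such as openat or open64 would need their own wrappers.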

    Should I write the shared library purely in C, and implement the map data structure from scratch?

    That would get rid of many potential complications.


    So what are the complications?

    In order to use std::map, you will have to link your preload .so against libstdc++.so.6, which will try to initialize itself before your library does.

    And libstdc++ is large and complicated. I would not be surprised if its initialization calls into open(), read(), etc.

    Which means that your library must be prepared to handle these calls before libstdc++.so can be expected to work.

    The following sequence may happen and cause your library to crash:

    • libstdc++.so calls open, which is interposed by your library
    • your interposer tries to put something into a std::map instance, which calls into libstdc++.so again,
    • which crashes because it doesn't expect its own call to open to get back into libstdc++.

    Many variations of the above are possible, and which variation you get may depend on the exact set of functions you interpose, the set of calls you make into libstdc++, and the version of libstdc++.

    That is, it may work fine until you try it on a different system, or until you update your g++, or until you add a new interposer. Or it may not work at all.

    It is certainly not designed to work, so you'll be playing Russian roulette.
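If you go the C++ route anyway, one common way to narrow (not eliminate) the recursion scenario described above is to record nothing until your own initializer has run, and to let nested calls pass straight through using a per-thread flag. The following is only a sketch under those assumptions: the Stats struct, the flag names, and the accessor interposer_bytes_read are all hypothetical, and __attribute__((constructor)) is a GCC/Clang extension.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // RTLD_NEXT needs it; g++ on Linux normally defines it already
#endif
#include <dlfcn.h>
#include <unistd.h>
#include <atomic>
#include <cstddef>
#include <map>
#include <mutex>

namespace {
using read_fn = ssize_t (*)(int, void*, size_t);

struct Stats {  // hypothetical bookkeeping: bytes read per file descriptor
    std::mutex mu;
    std::map<int, unsigned long long> bytes_by_fd;
};

std::atomic<read_fn> g_real_read{nullptr};
std::atomic<bool> g_ready{false};   // set once our own initializer has finished
Stats* g_stats = nullptr;           // never freed, so it stays valid during exit
thread_local bool t_inside = false; // true while this thread is doing bookkeeping

struct Reentry {  // marks the current thread as "inside" for the guard's lifetime
    bool prev;
    Reentry() : prev(t_inside) { t_inside = true; }
    ~Reentry() { t_inside = prev; }
};

read_fn real_read() {
    read_fn fn = g_real_read.load(std::memory_order_acquire);
    if (!fn) {
        fn = reinterpret_cast<read_fn>(dlsym(RTLD_NEXT, "read"));
        g_real_read.store(fn, std::memory_order_release);
    }
    return fn;
}

// Runs after the libraries this object depends on have been initialized.
__attribute__((constructor)) void init_interposer() {
    Reentry guard;  // the allocation below must not be recorded
    g_stats = new Stats;
    g_ready.store(true, std::memory_order_release);
}
}  // namespace

// Hypothetical accessor: bytes recorded so far for one descriptor.
extern "C" unsigned long long interposer_bytes_read(int fd) {
    if (!g_ready.load(std::memory_order_acquire)) return 0;
    Reentry guard;
    std::lock_guard<std::mutex> lock(g_stats->mu);
    auto it = g_stats->bytes_by_fd.find(fd);
    return it == g_stats->bytes_by_fd.end() ? 0 : it->second;
}

extern "C" ssize_t read(int fd, void* buf, size_t count) {
    ssize_t n = real_read()(fd, buf, count);
    // Pass through untouched if we are not initialized yet or are already inside.
    if (n > 0 && g_ready.load(std::memory_order_acquire) && !t_inside) {
        Reentry guard;
        try {
            std::lock_guard<std::mutex> lock(g_stats->mu);
            g_stats->bytes_by_fd[fd] += static_cast<unsigned long long>(n);
        } catch (...) {
            // Never let a C++ exception escape through a C entry point.
        }
    }
    return n;
}
```

Calls that arrive before init_interposer has run, or from inside the bookkeeping itself, are forwarded without touching the map. This does not make the combination supported; it only avoids the most obvious recursion, and interposing lower-level functions such as malloc raises further problems that this sketch does not address.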

    P.S. On my system, the following program:

    int main() { return 0; }
    

    built with g++ main.cc -Wl,--no-as-needed and run with LD_DEBUG=bindings ./a.out shows that my libstdc++.so binds to the following libc.so.6 symbols:

       1926652:     binding file /lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__libc_single_threaded' [GLIBC_2.32]
       1926652:     binding file /lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__cxa_finalize' [GLIBC_2.2.5]
       1926652:     binding file /lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `stdin' [GLIBC_2.2.5]
       1926652:     binding file /lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `stderr' [GLIBC_2.2.5]
       1926652:     binding file /lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `stdout' [GLIBC_2.2.5]
       1926652:     binding file /lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `pthread_once' [GLIBC_2.34]
       1926652:     binding file /lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__newlocale' [GLIBC_2.2.5]
       1926652:     binding file /lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__uselocale' [GLIBC_2.3]
       1926652:     binding file /lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `wctob' [GLIBC_2.2.5]
       1926652:     binding file /lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `btowc' [GLIBC_2.2.5]
       1926652:     binding file /lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__wctype_l' [GLIBC_2.2.5]
       1926652:     binding file /lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__cxa_atexit' [GLIBC_2.2.5]
       1926652:     binding file /lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `secure_getenv' [GLIBC_2.17]
       1926652:     binding file /lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `malloc' [GLIBC_2.2.5]
       1926652:     binding file /lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `fflush' [GLIBC_2.2.5]
    

    As you can see, open and read are not called, but malloc and fflush are, so interposing either of those could cause trouble.