I have a dynamically linked library that defines an __attribute__((visibility("hidden"))) symbol which I need to access. Here is simplified code:
shared.c
__attribute__((visibility("hidden"))) int hidden_sym[12];
int visible_sym[12];
shared_user.c
#include <dlfcn.h>
#include <stdio.h>

int main() {
    void* dlopen_res = dlopen("./libnogcc.so", RTLD_LAZY);
    if (dlopen_res == NULL) {
        printf("dlopen_res is NULL: %s\n", dlerror());
        return 1;
    }
    if (dlsym(dlopen_res, "visible_sym") == NULL) {
        printf("bb_so is NULL: %s\n", dlerror());
        return 1;
    } else {
        printf("'visible_sym' open ok\n");
    }
    if (dlsym(dlopen_res, "hidden_sym") == NULL) {
        printf("bb_so is NULL: %s\n", dlerror());
        return 1;
    }
}
Compilation and execution:
gcc shared.c -fpic -shared -olibnogcc.so
gcc -ldl shared_user.c -o shared_main
./shared_main
It correctly resolves visible_sym, but, as expected, fails to resolve the hidden symbol:
'visible_sym' open ok
bb_so is NULL: ./libnogcc.so: undefined symbol: hidden_sym
I want to know if there is any workaround that would allow me to get access to the hidden symbol. Note that it does not have to be a dlsym-based solution: anything that gives me access to the hidden symbol, without modifying the library's symbol table, would be considered an accepted solution.
My real-world use case is quite similar: I want to get access to the profiling information that is generated by gprof in instrumented code. I'm still not entirely sure, but it seems to be stored in the __bb_head variable, which is declared as struct __bb *__bb_head __attribute__((visibility("hidden")));. The structure definition is accessible via the <sys/gmon.h> and <sys/gmon_out.h> headers, but I wasn't able to find any way to actually get the profiling data in raw form. I am aware that gprof lets me dump the information when the program has finished executing, but I need to get this data at runtime, without having to force a file write and then read it back.
Code for accessing the libc data:
#include <dlfcn.h>
#include <stdio.h>
#include <sys/gmon.h>
#include <sys/gmon_out.h>

int main() {
    void* dlopen_res = dlopen("libc.so.6", RTLD_LAZY);
    if (dlopen_res == NULL) {
        printf("dlopen_res is NULL: %s\n", dlerror());
        return 1;
    }
    void* bb_so = dlsym(dlopen_res, "__bb_head");
    if (bb_so == NULL) {
        printf("bb_so is NULL: %s\n", dlerror());
        return 1;
    }
}
I just did a test on your simple .so. The hidden_sym does show up in the .symtab. If you run readelf -a, you'll be able to see it:
6: 0000000000004040 48 OBJECT GLOBAL DEFAULT 21 visible_sym
39: 0000000000004080 48 OBJECT LOCAL DEFAULT 21 hidden_sym
47: 0000000000004040 48 OBJECT GLOBAL DEFAULT 21 visible_sym
dlsym may not be able to find it, but you could parse the readelf output or use libelf to get the symbol.
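For illustration, here is a minimal sketch of that idea, written with plain <elf.h> and mmap instead of libelf so it stays self-contained. It scans the .symtab of the on-disk file and prints the st_value of hidden_sym; the file and symbol names are taken from the question, and a 64-bit ELF layout is assumed. If the library has been stripped, .symtab is gone and nothing will be found.

#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("./libnogcc.so", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    void* map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    const Elf64_Ehdr* eh = map;                 /* assumes a 64-bit ELF file */
    const Elf64_Shdr* sh = (const Elf64_Shdr*)((const char*)map + eh->e_shoff);

    for (int i = 0; i < eh->e_shnum; ++i) {
        if (sh[i].sh_type != SHT_SYMTAB)        /* hidden_sym lives in .symtab, not .dynsym */
            continue;
        const Elf64_Sym* syms = (const Elf64_Sym*)((const char*)map + sh[i].sh_offset);
        const char* strtab = (const char*)map + sh[sh[i].sh_link].sh_offset;
        size_t nsyms = sh[i].sh_size / sizeof(Elf64_Sym);
        for (size_t j = 0; j < nsyms; ++j) {
            if (strcmp(strtab + syms[j].st_name, "hidden_sym") == 0)
                printf("hidden_sym st_value = 0x%lx\n",
                       (unsigned long)syms[j].st_value);
        }
    }

    munmap(map, st.st_size);
    close(fd);
}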
You could use /proc/self/maps to find the load address of the library and then apply the offset.
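A rough sketch of that approach, assuming the st_value obtained from readelf (0x4080 in the dump above; it will differ per build) and assuming the library's first mapping corresponds to a PT_LOAD segment with vaddr 0, which is typical for shared objects:

#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    /* Load the library first so it actually appears in /proc/self/maps. */
    void* handle = dlopen("./libnogcc.so", RTLD_LAZY);
    if (handle == NULL) {
        printf("dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* Find the start address of the library's first mapping. */
    uintptr_t base = 0;
    char line[512];
    FILE* maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) { perror("fopen"); return 1; }
    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, "libnogcc.so") != NULL) {
            base = (uintptr_t)strtoull(line, NULL, 16);  /* first field is the start address */
            break;
        }
    }
    fclose(maps);
    if (base == 0) {
        printf("library not found in /proc/self/maps\n");
        return 1;
    }

    /* 0x4080 is the st_value of hidden_sym from the readelf dump above;
       substitute the value from your own build. */
    int* hidden = (int*)(base + 0x4080);
    hidden[0] = 42;
    printf("hidden_sym[0] = %d\n", hidden[0]);
}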
Or, you could make a copy of the .so file and change the binding of the symbol from LOCAL to GLOBAL by editing your copy. There may be existing tools that let you do this. Then, dlsym would be able to find it.
UPDATE:
I sure do hope that is not the only way to get around this, but I guess "hidden means hidden" in this case, and there is no acceptable workaround. – haxscramper
AFAICT, hidden means that the symbol is global so that the various .o files that are linked together to form the .so can access it. Then, when the .so is linked, the symbol's binding is changed so that it behaves as if it had been declared static.
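A small illustration of that point (the file names here are invented): a hidden symbol can still be referenced from another translation unit linked into the same .so, it just isn't exported from the finished library.

/* hidden_def.c */
__attribute__((visibility("hidden"))) int hidden_sym[12];

/* hidden_use.c -- same .so, different translation unit */
extern int hidden_sym[12];

int sum_hidden(void) {
    int s = 0;
    for (int i = 0; i < 12; ++i)
        s += hidden_sym[i];
    return s;
}

/* Builds and links fine:
     gcc -fpic -shared hidden_def.c hidden_use.c -o libhidden.so
   but "readelf --dyn-syms libhidden.so" will not list hidden_sym. */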
I'm accepting this answer because it technically answers my question, and after a whole day of trying to find a workaround, I think that's as good a solution as I can get, even though the solution itself basically means "there is no sane solution". – haxscramper
Even if the symbol were global, you may not be able to access/utilize it safely. That's because what you're trying to do is dump the data while it is being accumulated. That may be UB because it [probably] doesn't lock/freeze the data.
In particular, fielding a signal periodically to get a function histogram [which gprof does internally] is [usually] asynchronous to the running code.
You'll have to test it to confirm that you can access the data safely.
You might be able to start a trace, wait a bit, stop the trace [using the prescribed method], and dump the data. Then repeat the process. That's not what you said you wanted, but it may be enough of a compromise, given your stated limitations (i.e. no rebuild of the gprof library, etc.).
I guess it depends upon what perf data you want and how much effort you're willing to spend to instrument the target code to get it.
It might not be worth it for your current use case, but if you're going to do performance analysis on other projects, you could reuse a custom methodology you develop now, i.e. it becomes part of your personal programming "bag of tricks".
When I need performance data, I usually roll my own. I maintain a ring queue of "event" structs. An event can be anything (e.g.) enter_func, exit_func, func_is_at_line_X, etc.
I record timestamps in each event entry, so I can see the exact time spent in each function at a given point. I can see latency as well [which gprof won't provide], particularly for multithreaded applications.
That is, thread A reaches point X at time T1. It enqueues data to thread B and thread A loops and waits for more data. Thread B wakes up at time T2, dequeues the data, processes it, and goes back to sleep at time T3. Thread A wakes up at time T4.
Now, if T2 - T1 is "excessive", then we want to know why. Was thread B still processing a prior request? Or is it delayed because of heavy system load? Is T4 - T2 less than T4 - T1 [which is what we hope for]?
I make this event queue thread safe [which, AFAICT, gprof isn't].
I instrument the code in a manner similar to what dtrace does.
I leave the instrumentation calls in place, enabled by a global master flag [or a vector of per-event-type flags]. Then I can turn them on on a remote system (e.g. one running at a customer site) to collect data in cases where the performance "issue" [whatever it may be] only shows up on a system and configuration that exist only at the customer's site. That is, despite best efforts, the issue is not reproducible on any lab/test system I have access to.
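To make the described scheme concrete, here is a minimal sketch (not my actual code; the names trace_event, TRACE_ENTER, etc. are invented for illustration): a fixed-size ring of timestamped event records, a C11 atomic index so multiple threads can enqueue, and a global master flag so the instrumentation stays compiled in but can be switched off. A production version would need more care around wrap-around and torn reads; compile with a C11 compiler.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

enum { EV_ENTER_FUNC, EV_EXIT_FUNC, EV_AT_LINE };

struct trace_event {
    uint64_t    ns;        /* CLOCK_MONOTONIC timestamp in nanoseconds */
    int         type;      /* one of the EV_* codes above */
    const char *func;      /* function name (string literal, not copied) */
    int         line;      /* source line of the instrumentation point */
};

#define RING_SIZE 4096                     /* power of two, so "& (RING_SIZE-1)" wraps */

static struct trace_event ring[RING_SIZE];
static atomic_uint        ring_head;       /* total number of events ever enqueued */
static atomic_int         trace_enabled;   /* master flag; 0 makes instrumentation a no-op */

static void trace_emit(int type, const char *func, int line) {
    if (!atomic_load_explicit(&trace_enabled, memory_order_relaxed))
        return;
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    /* Each thread claims its own slot; old slots are overwritten once the ring wraps. */
    unsigned slot = atomic_fetch_add_explicit(&ring_head, 1, memory_order_relaxed)
                    & (RING_SIZE - 1);
    ring[slot] = (struct trace_event){
        .ns = (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec,
        .type = type, .func = func, .line = line,
    };
}

#define TRACE_ENTER() trace_emit(EV_ENTER_FUNC, __func__, __LINE__)
#define TRACE_EXIT()  trace_emit(EV_EXIT_FUNC,  __func__, __LINE__)

/* Example instrumented function. */
static void worker(void) {
    TRACE_ENTER();
    /* ... real work ... */
    TRACE_EXIT();
}

int main(void) {
    atomic_store(&trace_enabled, 1);       /* flip the master flag on */
    worker();
    unsigned n = atomic_load(&ring_head);
    for (unsigned i = 0; i < n && i < RING_SIZE; ++i)
        printf("%llu ns: %s type=%d line=%d\n",
               (unsigned long long)ring[i].ns, ring[i].func, ring[i].type, ring[i].line);
    return 0;
}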
YMMV ...