ubuntu gdb centos7 glibc dlopen

Why did I get an invalid handle (not zero) when I called the dlopen() function on CentOS?


I attempted to install a seccomp BPF filter for a running Tomcat process. After attaching gdb to the process, I called dlopen to load a shared library (.so file), and it returned a handle. The handle was a nonzero integer. However, when I used the gdb x command to inspect the memory at that address, gdb reported an error: Cannot access memory at address. I then tried calling dlsym with the handle as a parameter, but the call was aborted because the process received a SIGSEGV signal. Here is the session:

(gdb) set $handle=dlopen("/opt/seccompfilter.so",1)
(gdb) x $handle
0xffffffffaaef8730:     Cannot access memory at address 0xffffffffaaef8730
(gdb) call dlsym((void *)$handle, "install_filter")
[Thread 0x7f6250aed700 (LWP 2069) exited]
Program received signal SIGSEGV, Segmentation fault.
_dl_lookup_symbol_x (undef_name=0x5636aaef81d0 "install_filter", 
undef_map=0xffffffffaaef8730, ref=0x7ffe774956a0, 
symbol_scope=0xffffffffaaef8ab8, version=0x0, type_class=0,
flags=2, skip_map=0x0) at dl-lookup.c:733
733         while ((*scope)->r_list[i] != skip_map)
The program being debugged was signaled while in a function called 
from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on".
Evaluation of the expression containing the function
(__dlsym) will be abandoned.
When the function is done executing, GDB will silently stop.

I'm trying to understand what went wrong. Here is the environment I was working with:

CentOS 7 kernel-3.10.0-1160.90.1.el7.x86_64
gdb 7.6.1-120.el7
glibc 2.17

Interestingly, when I performed the same steps on Ubuntu 20.04, everything worked successfully. What's the problem?

I also tried calling dlopen to load the libc.so.6 shared library, and got the same result.


Solution

  • I'm trying to understand what went wrong.

    The $handle value 0xffffffffaaef8730 is wrong. The handle returned by dlopen is actually a pointer to a struct link_map, so it must be a user-space address. On x86_64, user-space addresses lie in the range [0x1000, 0x7fffffffffff], and anything in the 0xffffffff........ range can only be a kernel address.
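
    The range check above can be written down as a small sanity test. This is only a sketch: the bounds are the ones quoted in this answer, and the helper name is_plausible_user_pointer is made up for illustration.

```python
# Bounds taken from the answer text above; the helper name is hypothetical.
USER_SPACE_MIN = 0x1000
USER_SPACE_MAX = 0x7FFFFFFFFFFF


def is_plausible_user_pointer(addr: int) -> bool:
    """Return True if addr lies in the x86_64 user-space range quoted above."""
    return USER_SPACE_MIN <= addr <= USER_SPACE_MAX


# The handle value GDB reported in the question.
print(is_plausible_user_pointer(0xFFFFFFFFAAEF8730))  # False
# The undef_name pointer from the same backtrace, for comparison.
print(is_plausible_user_pointer(0x5636AAEF81D0))  # True
```

    The handle fails the check, while another pointer from the same backtrace passes it, which is what makes the handle look corrupted rather than merely unusual.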

    Now, why did this happen?

    Most likely GDB misinterpreted the dlopen return value in $rax as an int32_t and sign-extended it to int64_t.

    A dlopen return value might look something like 0x00007fffaaef8730, and if you apply the misinterpretation above, you get exactly the value you observed:

    (gdb) p/x (long)(int)0x00007fffaaef8730
    $1 = 0xffffffffaaef8730
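
    The same arithmetic can be reproduced outside GDB. The sketch below assumes the suspected behaviour (keep the low 32 bits, then sign-extend to 64 bits); the input addresses are hypothetical, and the function name is made up for illustration.

```python
def truncate_and_sign_extend(value: int) -> int:
    """Keep the low 32 bits of value, then sign-extend them to 64 bits."""
    low = value & 0xFFFFFFFF
    if low & 0x80000000:
        low -= 1 << 32  # bit 31 set: the value is negative as an int32
    return low & 0xFFFFFFFFFFFFFFFF  # show the result as a 64-bit pattern


# Hypothetical pointer with the same low 32 bits as the observed handle.
print(hex(truncate_and_sign_extend(0x00007FFFAAEF8730)))  # 0xffffffffaaef8730
```

    Because only the low 32 bits survive, any pointer ending in aaef8730 would produce the same bogus handle, so the original upper bits cannot be recovered from $handle alone.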
    

    The ancient version of GDB you are using may have a bug in it. The other possibility -- that GDB doesn't know the actual dlopen return type -- seems unlikely given that it does have debug info for GLIBC.

    You may wish to set a breakpoint on dlopen, use finish to return from it, and examine $rax directly to see the full 64-bit return value.