
Getting wrong glibc function address on ARM


I want to get the address of a function. Using function names gets the correct addresses on x86, both for local functions and glibc functions.

But on ARM, local function addresses are correct, while glibc function addresses are wrong.

Here's my simple program:

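#include <stdio.h>
#include <stdlib.h>   /* system */
#include <unistd.h>   /* getpid */
int sum(int a, int b)
{
    return a + b;
}
int main(int argc, char *argv[])
{
    char buffer[32] = { '\0' };
    /* Build a command that dumps this process's memory map. */
    snprintf(buffer, sizeof buffer, "cat /proc/%d/maps", (int)getpid());
    /* %p expects a void *, so cast the function pointers explicitly. */
    printf("sum = %p\n", (void *)sum);
    printf("fopen = %p\n", (void *)fopen);
    system(buffer);
    return 0;
}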

# cross-compile it to an ARM executable:
$ arm-linux-gnueabihf-4.9.1-gcc -g -o misc misc.c

# debug on ARM
/home # ./gdb ./misc
GNU gdb (GDB) 7.5.1
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/misc...done.
(gdb) b 16
Breakpoint 1 at 0x8534: file misc.c, line 16.
(gdb) r
Starting program: /home/misc 
sum = 0x8491
fopen = 0x835c
00008000-00009000 r-xp 00000000 00:13 1703976    /home/misc
00010000-00011000 rw-p 00000000 00:13 1703976    /home/misc
76ed9000-76fd0000 r-xp 00000000 1f:08 217        /lib/libc-2.19-2014.06.so
76fd0000-76fd7000 ---p 000f7000 1f:08 217        /lib/libc-2.19-2014.06.so
76fd7000-76fd9000 r--p 000f6000 1f:08 217        /lib/libc-2.19-2014.06.so
76fd9000-76fda000 rw-p 000f8000 1f:08 217        /lib/libc-2.19-2014.06.so
76fda000-76fdd000 rw-p 00000000 00:00 0 
76fdd000-76ff7000 r-xp 00000000 1f:08 199        /lib/ld-2.19-2014.06.so
76ffb000-76ffe000 rw-p 00000000 00:00 0 
76ffe000-76fff000 r--p 00019000 1f:08 199        /lib/ld-2.19-2014.06.so
76fff000-77000000 rw-p 0001a000 1f:08 199        /lib/ld-2.19-2014.06.so
7efdf000-7f000000 rw-p 00000000 00:00 0          [stack]
ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]

Breakpoint 1, main (argc=1, argv=0x7efffe64) at misc.c:16
16      misc.c: No such file or directory.
(gdb) p fopen
$1 = {<text variable, no debug info>} 0x76f26a50 <fopen>
(gdb) 

Note that the glibc text segment is mapped at address 0x76ed9000, so how can fopen be at a weird address like 0x835c?

However, on the next command, (gdb) p fopen, gdb reports the correct address.


Solution

  • There is no guarantee that the value of a pointer actually gives you the in-memory address of the thing you're looking for. For function pointers you are actually more likely than not to have completely different values.

    Below I just completely overdid it, but it would be a shame to delete this explanation. So here's the short version: the only guarantee for function pointers is that two pointers to the same function compare equal, and this gets complicated fast when shared libraries are involved.

    What's going on here has to do with dynamic linking. When you link your program, the linker doesn't know where libc will be located in memory; that can only be resolved by the dynamic linker at runtime. A naive way to do this would be to rewrite the addresses of functions in the code of your program, but this is inefficient, since no execution of the program could then share the memory of the executable with other runs. Instead there is something called the PLT. When you call a dynamically linked function, the code that actually runs is a jump to a local stub in your program, which loads the real address of the function from a table and jumps to it (the details differ a lot between architectures, but this is the general idea).
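
    The indirection described above can be modelled in plain C. This is only a sketch of the idea, not how the linker actually emits code: the names real_add, got_slot, add_via_stub and fake_dynamic_link are all made up for illustration, with real_add standing in for a library function such as fopen.

```c
#include <stddef.h>

typedef int (*binop_fn)(int, int);

/* Stand-in for a library function such as fopen: its final address is
   treated as unknown until "load time". */
int real_add(int a, int b)
{
    return a + b;
}

/* Hypothetical one-entry table playing the role of the GOT. It is data,
   so it can be patched per process without touching the program's code. */
binop_fn got_slot = NULL;

/* Hypothetical stub playing the role of a PLT entry: it lives in the
   program and forwards every call through the table. */
int add_via_stub(int a, int b)
{
    return got_slot(a, b);
}

/* Plays the role of the dynamic linker: fills the table once the real
   address is known. Must run before the first call through the stub. */
void fake_dynamic_link(void)
{
    got_slot = real_add;
}
```

    After fake_dynamic_link() has run, add_via_stub(2, 3) returns the same result as real_add(2, 3), yet add_via_stub and real_add are two different addresses. That is the same kind of discrepancy as a small fopen value inside the executable's mapping versus the address gdb prints inside libc.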

    I built your program on amd64, let's see it in action:

    (gdb) break system
    Breakpoint 1 at 0x400510
    (gdb) run
    Starting program: /home/art/./foo
    sum = 0x400660
    fopen = 0x400550
    [...]
    

    So as you can see on amd64 the value of fopen is also suspicious. Let's see what code lurks at that address:

    (gdb) x/i 0x400550
       0x400550 <fopen@plt>:    jmpq   *0x200aea(%rip)        # 0x601040 <fopen@got.plt>
    

    The first thing to notice is that gdb already knows this isn't actually fopen: this particular spot in memory is called fopen@plt. It's just one instruction, a jump to the address stored at instruction pointer plus 0x200aea (linux/amd64 does almost all addressing relative to the instruction pointer), which gdb nicely tells us is address 0x601040 and happens to be named fopen@got.plt. GOT stands for Global Offset Table and PLT stands for Procedure Linkage Table.

    Let's go down the rabbit hole:

    (gdb) x/g 0x601040
    0x601040 <fopen@got.plt>:   0x0000000000400556
    (gdb) x/i 0x0000000000400556
       0x400556 <fopen@plt+6>:  pushq  $0x5
    (gdb)
       0x40055b <fopen@plt+11>: jmpq   0x4004f0
    (gdb) x/i 0x4004f0
       0x4004f0:    pushq  0x200b12(%rip)        # 0x601008
    (gdb)
       0x4004f6:    jmpq   *0x200b14(%rip)        # 0x601010
    (gdb) x/g 0x601010
    0x601010:   0x00007ffff7df0290
    (gdb) x/i 0x00007ffff7df0290
       0x7ffff7df0290 <_dl_runtime_resolve>:    sub    $0x78,%rsp
    (gdb)
    

    Something weird happens here. The address stored in fopen@got.plt points right back to the instruction after the jump in fopen@plt, which pushes something on the stack and jumps to other code that pushes more onto the stack, then jumps through another table entry and ends up in _dl_runtime_resolve. What's going on is lazy binding. The developers of the dynamic linker figured out that most of the linking information that dynamic libraries and programs contain will never be used. When you run a program that calls two functions from libc, you don't want to resolve all the thousands and thousands of dynamic function calls that libc makes internally; it's a waste of time. Also, for most programs we value quick startup over quick runtime. So by default your functions aren't resolved at load time. They get resolved at runtime, the first time you call them. That's what _dl_runtime_resolve does. The pushes to the stack are most likely a non-standard way of passing arguments to that function, because this code must not disturb the registers (the calling code thinks it just called fopen normally).
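
    Lazy binding can be sketched the same way in plain C. Again this is a toy model under stated assumptions, not the dynamic linker's real mechanism: real_square, square_slot, lazy_resolver, square_via_stub and resolve_count are hypothetical names, and lazy_resolver merely plays the part that _dl_runtime_resolve plays in the gdb session.

```c
typedef int (*unary_fn)(int);

/* Counts how often the slow resolution path runs (for illustration only). */
int resolve_count = 0;

/* Stand-in for the real library function. */
int real_square(int x)
{
    return x * x;
}

int lazy_resolver(int x);

/* Hypothetical table slot: it starts out pointing at the resolver
   instead of the real function. */
unary_fn square_slot = lazy_resolver;

/* Plays the role of the runtime resolver: find the real function,
   patch the slot, then finish the call that triggered the lookup. */
int lazy_resolver(int x)
{
    resolve_count++;
    square_slot = real_square;
    return square_slot(x);
}

/* Plays the role of the PLT stub: always jumps through the slot. */
int square_via_stub(int x)
{
    return square_slot(x);
}
```

    The first call to square_via_stub goes through lazy_resolver and patches the slot; every later call jumps straight to real_square, so resolve_count stays at 1 no matter how often the stub is called.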

    But wait a minute. The C standard says that two function pointers should compare equal if they point to the same function. How does that work if one of the pointers could be from your program and another one comes from a dynamic library? Well, this is very much architecture dependent, but after some digging I found that on my architecture even when a library returns a function pointer, that function pointer gets translated to the PLT function in my main program. Why? Don't know. Someone made the decision to implement it that way at some point.