
Linux: using backtrace(), /proc/self/maps and addr2line together gives invalid results


I'm trying to implement a way to record the callstacks of my program into a file and display them later. Here are the steps:

  • Write the content of /proc/self/maps to a log file.
    • In this example, the content of /proc/self/maps is:
    • 00400000-05cdc000 r-xp 00000000 00:51 12974779926 helloworld
    • This means the base address of the helloworld program is 0x400000.
  • In the program, whenever a piece of code of interest needs its callstack recorded, I call backtrace() to obtain the callstack's addresses and write them to the log file. Let's say the callstack in this example is:
    • 0x400001
    • 0x400003
  • At some point later, a separate log viewer program opens and parses the log file. The program's base address is subtracted from each address in the callstack. In this case:
    • 0x400001 - 0x400000 = 1
  • I then pass this offset to the addr2line program to get the file and line number:
    • addr2line -fCe helloworld 0x1
    • However, this produces ?? as the result, i.e. the offset is invalid.
  • But if I don't subtract the base address and instead pass the raw callstack address to the addr2line command:
    • addr2line -fCe helloworld 0x400001, then it returns the correct file and line number.
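
The recording side of these steps might look like the following minimal C sketch. It is not the original program's code: the helper names dump_maps and record_callstack are hypothetical, and it assumes glibc, where backtrace() is declared in <execinfo.h>. Note that backtrace() reports return addresses, which point just past each call instruction, so a symbolizer may attribute a frame to the following line unless one is subtracted first.

```c
#include <execinfo.h>
#include <stdio.h>

/* Hypothetical helper: append the contents of /proc/self/maps to `log`.
   Returns 0 on success, -1 if /proc/self/maps cannot be opened. */
int dump_maps(FILE *log)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;
    char line[4096];
    /* fgets may split very long lines; fputs writes each chunk unchanged. */
    while (fgets(line, sizeof line, maps) != NULL)
        fputs(line, log);
    fclose(maps);
    return 0;
}

/* Hypothetical helper: write the raw return addresses reported by
   backtrace() to `log`, one per line. Returns the number of frames. */
int record_callstack(FILE *log)
{
    void *frames[64];
    int n = backtrace(frames, 64);
    for (int i = 0; i < n; i++)
        fprintf(log, "%p\n", frames[i]);
    return n;
}
```

A caller would typically run dump_maps(log) once at startup and record_callstack(log) at each point of interest, with log opened in append mode.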

The thing is, if the address is within a shared object, an absolute address won't work while the subtracted offset will.
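
On the log-viewer side, finding which saved mapping an address falls into only requires parsing the leading fields of a /proc/self/maps line (start-end, permissions, file offset). The sketch below uses a hypothetical helper name, maps_line_lookup, and reproduces the subtraction described in the question; as the solution below explains, that subtraction is only valid under certain conditions.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper for the log viewer: parse the address range and
   file-offset fields of one saved /proc/self/maps line. Returns 1 and
   fills *start and *offset if `addr` lies in [start, end), else 0. */
int maps_line_lookup(const char *line, uintptr_t addr,
                     uintptr_t *start, uintptr_t *offset)
{
    uintptr_t lo, hi, off;
    char perms[5];
    /* Line format: start-end perms offset dev inode [pathname] */
    if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %4s %" SCNxPTR,
               &lo, &hi, perms, &off) != 4)
        return 0;
    if (addr < lo || addr >= hi)
        return 0;
    *start = lo;
    *offset = off;
    return 1;
}
```

For the sample line 00400000-05cdc000 r-xp 00000000 00:51 12974779926 helloworld, looking up 0x400001 yields start 0x400000 and offset 0, so the question's computation gives 0x400001 - 0x400000 = 1.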

Why is there such a difference in the way addresses are mapped for the main executable and for shared objects? Or is this specific to the backtrace implementation, such that it always returns an absolute address for a function within the main executable?


Solution

  • Why is there such a difference in the way the addresses are mapped for the main executable and the shared objects?

    Shared libraries are usually linked at address 0 and relocated at load time. A non-position-independent executable is usually linked at address 0x400000 on x86_64 Linux and must not be relocated (or it wouldn't work).

    To find out where a given ELF binary is linked, look at the p_vaddr address of the first PT_LOAD segment (readelf -Wl foo will show you that). In addition, only ET_DYN ELF binaries can be relocated; ET_EXEC binaries must not be.
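
The same information readelf -Wl reports can be read programmatically. This minimal sketch, with the hypothetical helper name elf_link_info, assumes a 64-bit ELF file in the host's byte order and uses only the structures and constants from <elf.h>; it returns the ELF type (ET_EXEC or ET_DYN) and the p_vaddr of the first PT_LOAD segment, i.e. the linked-at address.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read e_type and the p_vaddr of the first PT_LOAD
   segment of a 64-bit, host-endian ELF file.
   Returns 0 on success, -1 on error. */
int elf_link_info(const char *path, unsigned *type, Elf64_Addr *link_addr)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    int rc = -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto out;
    *type = eh.e_type;  /* ET_EXEC: fixed address; ET_DYN: relocatable */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long pos = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, pos, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
            goto out;
        if (ph.p_type == PT_LOAD) {  /* first loadable segment */
            *link_addr = ph.p_vaddr;
            rc = 0;
            goto out;
        }
    }
out:
    fclose(f);
    return rc;
}
```

Calling it on /proc/self/exe inspects the running program itself; for a non-PIE x86_64 executable one would expect ET_EXEC and, typically, a link address of 0x400000.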

    Note that position-independent executables exist, and for them you need to do the subtraction.

    Note also that shared libraries are usually linked at address 0 (so the subtraction works), but they don't have to be. Running prelink on a shared library produces a library linked at a non-zero address, and then your subtraction will not work either.

    Really, what you need to do is subtract the linked-at address from the at-runtime load address to get the relocation (which is 0 for non-PIE executables and non-zero for shared libraries), and then subtract that relocation from the program counter recorded by backtrace to get the symbol value.
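
Written out as arithmetic, with load_addr and link_addr referring to the same point (for example, the start of the first PT_LOAD segment at run time and in the file), the computation is relocation = load_addr - link_addr and symbol value = pc - relocation. The helper names below are hypothetical, and the addresses in the usage note other than 0x400000 are made up for illustration.

```c
#include <stdint.h>

/* Relocation (load bias): where the object actually is minus where it was
   linked to be. Unsigned wraparound keeps later arithmetic correct even if
   an object is loaded below its link address. */
uintptr_t relocation(uintptr_t load_addr, uintptr_t link_addr)
{
    return load_addr - link_addr;
}

/* Link-time address (the value addr2line expects) for a recorded PC. */
uintptr_t pc_to_link_address(uintptr_t pc, uintptr_t load_addr,
                             uintptr_t link_addr)
{
    return pc - relocation(load_addr, link_addr);
}
```

With the question's numbers, a non-PIE executable linked and loaded at 0x400000 has relocation 0, so pc_to_link_address(0x400001, 0x400000, 0x400000) returns 0x400001, which is why the unmodified address worked with addr2line. A shared library linked at 0 and loaded at a hypothetical 0x7f000000 gives pc_to_link_address(0x7f001234, 0x7f000000, 0) == 0x1234, and the same library prelinked at a hypothetical 0x30000000 gives 0x30001234.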

    Finally, if you iterate over all loaded ELF images with dl_iterate_phdr, the dlpi_addr it provides is exactly the relocation that you need to subtract.
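
A minimal sketch of that approach, assuming glibc (dl_iterate_phdr is declared in <link.h> when _GNU_SOURCE is defined): the callback checks each object's PT_LOAD segments to find the one containing a recorded address and remembers that object's dlpi_addr. The struct pc_query type and the function names are hypothetical. Because dl_iterate_phdr runs inside the process being inspected, a separate log viewer as in the question would need the relevant dlpi_name and dlpi_addr values written to the log alongside the addresses.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for one lookup. */
struct pc_query {
    uintptr_t pc;          /* address recorded by backtrace() */
    const char *name;      /* dlpi_name of the matching object */
    uintptr_t relocation;  /* dlpi_addr of the matching object */
    int found;
};

static int find_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct pc_query *q = data;
    (void)size;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        /* Runtime range of this segment = relocation + link-time range. */
        uintptr_t lo = info->dlpi_addr + ph->p_vaddr;
        if (q->pc >= lo && q->pc < lo + ph->p_memsz) {
            q->name = info->dlpi_name;
            q->relocation = info->dlpi_addr;
            q->found = 1;
            return 1;  /* non-zero return stops the iteration */
        }
    }
    return 0;
}

/* Returns 1 and fills *q if pc lies in a loaded object, 0 otherwise.
   The address to pass to addr2line is q->pc - q->relocation. */
int lookup_pc(uintptr_t pc, struct pc_query *q)
{
    q->pc = pc;
    q->name = NULL;
    q->relocation = 0;
    q->found = 0;
    dl_iterate_phdr(find_object, q);
    return q->found;
}
```

For an address in the main program, dlpi_name is typically the empty string, and for a non-PIE executable dlpi_addr is 0, matching the observation that the raw address worked there.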