shared-libraries elf dynamic-linking dynamic-loading

Calculating runtime address of a symbol from shared object (ELF)


I have an ELF application which I'm debugging using GDB. One of the things that GDB does is to set a breakpoint at the symbol _dl_debug_state located in the dynamic loader.

$ readelf -sW /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-linux-x86-64.so.2 | grep _dl_debug_state
    24: 000000000000f590     1 FUNC    GLOBAL DEFAULT   12 _dl_debug_state@@GLIBC_PRIVATE
   379: 000000000000f590     1 FUNC    LOCAL  DEFAULT   12 __GI__dl_debug_state
   460: 000000000000f590     1 FUNC    GLOBAL DEFAULT   12 _dl_debug_state

It looks like the address of _dl_debug_state within the shared object is 0x000000000000f590. However, for my particular application, GDB determines that the address is 0x00007ffff7fe3590.

(gdb) info symbol 0x00007ffff7fe3590
_dl_debug_state in section .text of target:/nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-linux-x86-64.so.2

I'm struggling to work out how GDB has come to this conclusion. I had a look at the PT_LOAD headers of the dynamic loader:

$ readelf -lW /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-linux-x86-64.so.2

Elf file type is DYN (Shared object file)
Entry point 0x1090
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x000f20 0x000f20 R   0x1000
  LOAD           0x001000 0x0000000000001000 0x0000000000001000 0x01db90 0x01db90 R E 0x1000
  LOAD           0x01f000 0x000000000001f000 0x000000000001f000 0x007774 0x007774 R   0x1000
  LOAD           0x027540 0x0000000000028540 0x0000000000028540 0x001a50 0x001bf0 RW  0x1000
  DYNAMIC        0x027e10 0x0000000000028e10 0x0000000000028e10 0x000190 0x000190 RW  0x8
  NOTE           0x000238 0x0000000000000238 0x0000000000000238 0x000024 0x000024 R   0x4
  GNU_EH_FRAME   0x023960 0x0000000000023960 0x0000000000023960 0x0006ec 0x0006ec R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x027540 0x0000000000028540 0x0000000000028540 0x000ac0 0x000ac0 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.build-id .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_d .rela.dyn .rela.plt
   01     .plt .plt.got .text
   02     .rodata .eh_frame_hdr .eh_frame
   03     .data.rel.ro .dynamic .got .data .bss
   04     .dynamic
   05     .note.gnu.build-id
   06     .eh_frame_hdr
   07
   08     .data.rel.ro .dynamic .got

I also had a look at the ldd output:

$ ldd application
        /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-linux-x86-64.so.2 (0x7fbe385a3000)
        libdl.so.2 => /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-linux-x86-64.so.2 (0x7fbe385a3000)
        librt.so.1 => /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-linux-x86-64.so.2 (0x7fbe385a3000)
        libpthread.so.0 => /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-linux-x86-64.so.2 (0x7fbe385a3000)
        libstdc++.so.6 => /nix/store/a6z7ighixg7gb6krf9k60ylgmahij63x-gcc-9.3.0-lib/lib/libstdc++.so.6 (0x7fbe383c2000)
        libm.so.6 => /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-linux-x86-64.so.2 (0x7fbe385a3000)
        libgcc_s.so.1 => /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/libgcc_s.so.1 (0x7fbe383a8000)
        libc.so.6 => /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-linux-x86-64.so.2 (0x7fbe385a3000)
        ld-linux-x86-64.so.2 => /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-linux-x86-64.so.2 (0x7fbe3837d000)

How can I reliably calculate the runtime address? For context, I'm working with the GDB Remote Serial Protocol (RSP) and trying to intercept certain packets.


Solution

  • The address of the symbol I'm after is 0xf590 within the shared object. The shared object is mapped into memory as four separate blocks, one per LOAD program header (as shown by readelf -lW), and this address falls inside the second block (VirtAddr 0x1000, MemSiz 0x01db90).

    I can print the runtime maps of my executable:

    $ cat /proc/<PID>/maps
    00400000-00407000 r--p 00000000 00:36 86                                 /run/user/1000/<...>
    00407000-006c3000 r-xp 00007000 00:36 86                                 /run/user/1000/<...>
    006c3000-007bb000 r--p 002c3000 00:36 86                                 /run/user/1000/<...>
    007bc000-007e3000 rw-p 003bb000 00:36 86                                 /run/user/1000/<...>
    007e3000-007e4000 rw-p 00000000 00:00 0                                  [heap]
    7ffff7fd0000-7ffff7fd3000 r--p 00000000 00:00 0                          [vvar]
    7ffff7fd3000-7ffff7fd4000 r-xp 00000000 00:00 0                          [vdso]
    7ffff7fd4000-7ffff7fd5000 r--p 00000000 103:06 12589022                  /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-2.30.so
    7ffff7fd5000-7ffff7ff3000 r-xp 00001000 103:06 12589022                  /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-2.30.so
    7ffff7ff3000-7ffff7ffb000 r--p 0001f000 103:06 12589022                  /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-2.30.so
    7ffff7ffc000-7ffff7ffe000 rw-p 00027000 103:06 12589022                  /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-2.30.so
    7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0
    7ffffffcf000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
    ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0                  [vsyscall]
    

    As can be seen, the second block is mapped at 0x7ffff7fd5000. Its file offset is 0x1000 (which, for this segment, is also its VirtAddr), so the address we're after is: 0x7ffff7fd5000 + (0xf590 - 0x1000) = 0x7ffff7fe3590

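    It does look like the maths can be a bit more complicated at times, e.g. one may need to take alignment into account when a segment's file offset and VirtAddr differ (as with the fourth LOAD segment above: Offset 0x027540, VirtAddr 0x28540).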
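The calculation above can be sketched in Python. This is only a rough illustration, not how GDB itself does it: the names `parse_maps_line`, `runtime_address`, `MAPS` and `LOAD_SEGMENTS` are made up for this example, the values are copied by hand from the readelf and /proc/<PID>/maps output above, and it assumes the symbol lies in the file-backed part of a LOAD segment (so not in .bss). It first converts the symbol's address to a file offset using the LOAD segment that contains it, then finds the mapping that covers that file offset.

```python
# LOAD segments of ld-linux-x86-64.so.2 as (Offset, VirtAddr, FileSiz),
# copied from the readelf -lW output above.
LOAD_SEGMENTS = [
    (0x000000, 0x00000, 0x000F20),
    (0x001000, 0x01000, 0x01DB90),
    (0x01F000, 0x1F000, 0x007774),
    (0x027540, 0x28540, 0x001A50),
]

# The ld-2.30.so lines from /proc/<PID>/maps above.
MAPS = """\
7ffff7fd4000-7ffff7fd5000 r--p 00000000 103:06 12589022 /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-2.30.so
7ffff7fd5000-7ffff7ff3000 r-xp 00001000 103:06 12589022 /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-2.30.so
7ffff7ff3000-7ffff7ffb000 r--p 0001f000 103:06 12589022 /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-2.30.so
7ffff7ffc000-7ffff7ffe000 rw-p 00027000 103:06 12589022 /nix/store/jx19wa4xlh9n4324xdl9rjnykd19mmq3-glibc-2.30/lib/ld-2.30.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0
"""


def parse_maps_line(line):
    """Split one maps line into (start, end, file_offset, path)."""
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return start, end, int(fields[2], 16), path


def runtime_address(maps_text, path_suffix, sym_value, segments):
    """Translate a symbol value from readelf -s into a runtime address."""
    # Step 1: symbol value -> file offset, via the LOAD segment containing it.
    for p_offset, p_vaddr, p_filesz in segments:
        if p_vaddr <= sym_value < p_vaddr + p_filesz:
            file_off = p_offset + (sym_value - p_vaddr)
            break
    else:
        raise ValueError("symbol is not in the file-backed part of a LOAD segment")

    # Step 2: file offset -> runtime address, via the mapping covering it.
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        start, end, map_off, path = parse_maps_line(line)
        if path.endswith(path_suffix) and map_off <= file_off < map_off + (end - start):
            return start + (file_off - map_off)
    raise ValueError("no mapping of that file covers the symbol")


print(hex(runtime_address(MAPS, "ld-2.30.so", 0xF590, LOAD_SEGMENTS)))
# prints 0x7ffff7fe3590
```

Going through the file offset rather than subtracting the mapping's offset directly from the symbol value means the same code also handles the fourth segment, where Offset and VirtAddr differ by 0x1000.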