
The Mechanism Used to Determine Library Load Address in Perf


How does perf determine the load address of each loaded image (e.g., a shared library) during post-processing? For example, perf report uses this information to make each symbol address relative to the beginning of its loaded image. The original post illustrated this with a screenshot of perf output (unwind: _int_malloc...).

Is it stored somewhere in the elf binary or profiling output (i.e., perf.data)?


Solution

  • The load addresses of shared libraries are stored in the perf.data file written by the perf record command. You can use perf script -D to dump the contents of perf.data in a partially decoded format. When your program is loaded by ld-linux*.so.2 (or when a library is requested with dlopen), the loader searches for each library and maps its segments with the mmap syscall. The kernel records these mmap events, and they appear in perf.data as PERF_RECORD_MMAP or PERF_RECORD_MMAP2 records. perf report (and perf script) use them to reconstruct the memory layout and translate sample addresses into symbol names.

    $ perf record  echo 1
    $ perf script -D|grep MMAP -c
    7
    $ perf script -D|less
    PERF_RECORD_MMAP2 ... r-xp /bin/echo
    ...
    PERF_RECORD_MMAP2 ... r-xp /lib/x86_64-linux-gnu/libc-2.27.so
    

    The basic ideas behind perf are described in the file https://github.com/torvalds/linux/blob/master/tools/perf/design.txt. Profiling starts with the perf_event_open syscall, which takes a perf_event_attr *attr argument. The man page describes the mmap-related fields of attr:

       The perf_event_attr structure provides detailed configuration
       information for the event being created.
    
                     mmap           : 1,   /* include mmap data */
                     mmap_data      : 1,   /* non-exec mmap data */
                     mmap2          :  1,  /* include mmap with inode data */
    

    The perf_events subsystem of the Linux kernel (kernel/events) records the requested events for the profiled processes and exports the data to the profiler through a file descriptor and an mmap'ed buffer. perf record usually dumps this data from the kernel into perf.data without heavy processing (see the "Woken up 1 times to write data" line in your perf record output). In the kernel, mmap events are written by perf_event_mmap_output, which is called from perf_event_mmap_event, which in turn is called from perf_event_mmap. The mmap syscall implementation in mm/mmap.c contains several unconditional calls to perf_event_mmap.
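    To make the contents of such a record concrete, here is a minimal Python sketch that builds and then decodes one PERF_RECORD_MMAP2 record. It assumes the layout documented in the perf_event_open man page (header, pid, tid, addr, len, pgoff, maj, min, ino, ino_generation, prot, flags, filename) on a little-endian machine. The record is synthetic and every value in it is made up; the optional sample_id trailer is ignored; parse_mmap2 and make_mmap2 are hypothetical helpers, not part of perf.

```python
import struct

PERF_RECORD_MMAP2 = 10  # event type value from <linux/perf_event.h>

# perf_event_header (type, misc, size) followed by the fixed MMAP2 fields:
# pid, tid | addr, len, pgoff | maj, min | ino, ino_generation | prot, flags
_FIXED = struct.Struct("<IHH" "II" "QQQ" "II" "QQ" "II")


def parse_mmap2(record: bytes) -> dict:
    """Decode one MMAP2 record; a sample_id trailer, if any, is ignored."""
    (etype, misc, size, pid, tid, addr, length, pgoff,
     maj, minor, ino, ino_gen, prot, flags) = _FIXED.unpack_from(record)
    if etype != PERF_RECORD_MMAP2:
        raise ValueError(f"not an MMAP2 record: type {etype}")
    # The filename is NUL-terminated and padded to an 8-byte boundary.
    name = record[_FIXED.size:size].split(b"\0", 1)[0].decode()
    return {"pid": pid, "tid": tid, "addr": addr, "len": length,
            "pgoff": pgoff, "prot": prot, "filename": name}


def make_mmap2(pid: int, addr: int, length: int, pgoff: int,
               filename: str) -> bytes:
    """Build a synthetic record for demonstration (hypothetical values)."""
    name = filename.encode() + b"\0"
    name += b"\0" * (-len(name) % 8)      # pad to an 8-byte boundary
    size = _FIXED.size + len(name)
    prot_rx = 0x1 | 0x4                   # PROT_READ | PROT_EXEC
    map_private = 0x2                     # MAP_PRIVATE
    return _FIXED.pack(PERF_RECORD_MMAP2, 0, size, pid, pid, addr, length,
                       pgoff, 8, 1, 0, 0, prot_rx, map_private) + name


if __name__ == "__main__":
    rec = make_mmap2(4242, 0x7F0000021000, 0x177000, 0x21000,
                     "/lib/x86_64-linux-gnu/libc-2.27.so")
    print(parse_mmap2(rec))
```

    The addr, len and pgoff fields are the ones that matter for the question: together with the file name they say which part of which file was mapped where.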

    perf's design.txt mentions munmap, but the current implementation has no munmap field or event; event code 2 was reused for PERF_RECORD_LOST. There have been suggestions that munmap events could be helpful: see https://www.spinics.net/lists/netdev/msg524414.html which links to https://lkml.org/lkml/2016/12/10/1 and https://lkml.org/lkml/2017/1/27/452

    The perf tool is part of the Linux kernel sources and can be browsed online on an LXR/Elixir website: https://elixir.bootlin.com/linux/v5.4/source/tools/perf/ The code that processes mmap/mmap2 events is in perf/util/machine.c, in machine__process_mmap_event and machine__process_mmap2_event. The logged mmap arguments (returned address, offset and file name) are recorded for the process (pid/tid) with the help of map__new and thread__insert_map, and are used later to convert the address of each sample event into a symbol name.
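    The lookup described above can be sketched in a few lines of Python. This is a simplified illustration, not perf's actual implementation: the Map and AddressSpace names are hypothetical, the addresses are made up, and it assumes that start, length and pgoff come straight from the mmap records, with pgoff given in bytes. Real perf also handles kernel maps, ELF segment addresses and other cases that are left out here.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Tuple


class Map(NamedTuple):
    start: int      # address returned by mmap
    length: int     # size of the mapping in bytes
    pgoff: int      # offset of the mapping within the file, in bytes
    filename: str


class AddressSpace:
    """Mappings of one process, kept sorted by start address."""

    def __init__(self) -> None:
        self._maps = []  # list of Map

    def insert(self, new: Map) -> None:
        # A new mapping replaces whatever it overlaps (a simplification).
        self._maps = [old for old in self._maps
                      if old.start + old.length <= new.start
                      or new.start + new.length <= old.start]
        self._maps.append(new)
        self._maps.sort()

    def find(self, ip: int) -> Optional[Map]:
        # Last mapping whose start is <= ip, if ip falls inside it.
        i = bisect_right([m.start for m in self._maps], ip) - 1
        if i >= 0 and ip < self._maps[i].start + self._maps[i].length:
            return self._maps[i]
        return None

    def resolve(self, ip: int) -> Optional[Tuple[str, int]]:
        """Return (filename, offset within the file) for a sample address."""
        m = self.find(ip)
        if m is None:
            return None
        return m.filename, ip - m.start + m.pgoff


if __name__ == "__main__":
    aspace = AddressSpace()
    aspace.insert(Map(0x7F0000021000, 0x177000, 0x21000, "libc-2.27.so"))
    name, off = aspace.resolve(0x7F0000097ABC)
    print(name, hex(off))
```

    The image-relative offset returned by resolve is what a symbol table lookup would then be keyed on, which is why no load address needs to be stored in the ELF file itself.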

    PS: Your perf.data is 300+ MB in size, which is huge, and processing it can be slow. For long-running programs you may want to lower the sampling frequency with the -F freq option of perf record (for example, perf record -F40), or set a sample period with the -c option.