Tags: llvm, x86-64, llvm-ir, llvm-c++-api

Correlating LLVM IR analysis with final address


I'm trying to design an LLVM IR pass that extracts some information from the IR (specifically, the types used in an IR call instruction) and somehow correlates this IR-level analysis with binary-level addresses. For example, I want to know that a call instruction at a certain address in the final binary is calling a function with a certain type signature.

Some observations:

  1. The obvious problem is that the final addresses are not available yet when the IR pass runs.
  2. While IR instructions do not map 1:1 to machine instructions, it should be relatively safe to assume that a call in IR will map to a call in machine code.
  3. One could just disassemble the binary, look at the function being called, and get its type. However, this does not work for indirect call instructions (which is why I'm trying to do this in IR).

In this comment, the suggested approach to a similar problem is to "inject[] some metadata that you can spot later in the executable". However, I couldn't find any information about how to make metadata survive in the binary.


Solution

  • You might tie the IR calls to the final call instructions using the debug location (which is a kind of metadata). If you make sure that every call in the IR has a file name, line, and column, this hack should be possible. I am sure cleaner solutions exist.
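
The debug-location idea can be sketched as a join between two tables. This is only an illustration under stated assumptions: the IR pass is assumed to have written out one record per call (source position plus type signature), and the address-to-position rows are assumed to have been extracted from the binary's DWARF line table (for example from the output of `llvm-dwarfdump --debug-line`). The names `SourceLoc`, `IRCallInfo`, `LineRow`, `Match`, and `correlate` are hypothetical and not part of any LLVM API.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// A source position, as carried by an IR debug location (file, line, column).
struct SourceLoc {
    std::string file;
    unsigned line;
    unsigned column;
    bool operator<(const SourceLoc &o) const {
        return std::tie(file, line, column) < std::tie(o.file, o.line, o.column);
    }
};

// Hypothetical record emitted by the IR pass for each call instruction.
struct IRCallInfo {
    SourceLoc loc;
    std::string signature; // printed function type, e.g. "i32 (ptr)"
};

// Hypothetical row taken from the binary's DWARF line table.
struct LineRow {
    std::uint64_t address;
    SourceLoc loc;
};

// Result of the join: an address paired with the IR-level signature.
struct Match {
    std::uint64_t address;
    std::string signature;
};

// Join line-table rows with IR call records on (file, line, column).
// Assumes each call has a unique position; on duplicates the first record wins.
std::vector<Match> correlate(const std::vector<IRCallInfo> &calls,
                             const std::vector<LineRow> &rows) {
    std::map<SourceLoc, std::string> byLoc;
    for (const auto &c : calls)
        byLoc.emplace(c.loc, c.signature);

    std::vector<Match> out;
    for (const auto &r : rows) {
        auto it = byLoc.find(r.loc);
        if (it != byLoc.end())
            out.push_back({r.address, it->second});
    }
    return out;
}
```

Note that a line-table row gives the address where the instructions attributed to a source position begin, so the call instruction itself sits at or after that address; a short disassembly from there would still be needed to pin down the exact call. If the original source positions are not unique per call, the pass would probably have to assign synthetic ones (for example a distinct line number per call) before code generation.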