
Parsing LLVM IR code (with debug symbols) to map it back to the original source


I'm thinking about building a tool to help me visualise the LLVM-IR generated for each instruction/function in my original source file. Something like this, but for LLVM-IR.

So far, the steps to build such a tool seem to be:

  • Start with an LLVM-IR AST builder.
  • Parse the generated IR code.
  • At the caret position, get the AST element.
  • Read the element's scope, line, column and file, and highlight the corresponding location in the original source file.

Is this the correct way to approach it? Am I trivialising it too much?


Solution

  • I think your approach is quite correct. The UI part will probably take a long time to implement, so I'll focus on the LLVM part.

    Let's say you start from an input file containing your LLVM-IR.

    Step 1: process the module
    Read the file content into a string, then build a module from it and process it to collect the debug info:

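    // Headers: llvm/IR/DebugInfo.h, llvm/IR/LLVMContext.h, llvm/IR/Module.h, llvm/IRReader/IRReader.h,
    //          llvm/Support/MemoryBuffer.h, llvm/Support/SourceMgr.h, llvm/Support/raw_ostream.h
    llvm::LLVMContext context;
    llvm::SMDiagnostic diag;
    // getMemBuffer does not copy: fileContent (a std::string) must outlive buf
    std::unique_ptr<llvm::MemoryBuffer> buf = llvm::MemoryBuffer::getMemBuffer(fileContent);
    std::unique_ptr<llvm::Module> module = llvm::parseIR(buf->getMemBufferRef(), diag, context);
    if (!module) { diag.print("ir-viewer", llvm::errs()); return 1; } // invalid IR: report and bail out
    llvm::DebugInfoFinder dif;
    dif.processModule(*module);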
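
    If you just want to prototype the mapping before wiring up the LLVM libraries, a rough alternative (a different, text-based technique, not what the code above does) is to scan the textual .ll file directly: an instruction with debug info ends in "!dbg !N", and !N is a "!DILocation(line: ..., column: ...)" record at the bottom of the file. This is a minimal sketch, assuming the usual textual IR layout emitted by clang -S -emit-llvm -g or llvm-dis. SrcPos and mapIrLinesToSource are hypothetical names, and the regexes will miss unusual layouts, so treat it as a throwaway check rather than a parser:

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Source position recovered from a !DILocation record (hypothetical type).
struct SrcPos {
    unsigned line = 0;
    unsigned column = 0;  // 0 when the record has no column field
};

// Rough text scan of a .ll file; a prototyping shortcut, not a real IR parser.
// Returns: 1-based IR line number -> source position, for every line whose
// "!dbg !N" reference points at a !DILocation record.
std::map<unsigned, SrcPos> mapIrLinesToSource(const std::string& ir) {
    static const std::regex dbgRef(R"(!dbg !(\d+))");
    static const std::regex diLoc(
        R"(^!(\d+) = (?:distinct )?!DILocation\(line: (\d+)(?:, column: (\d+))?)");

    auto toUnsigned = [](const std::ssub_match& s) {
        return static_cast<unsigned>(std::stoul(s.str()));
    };

    // Pass 1: metadata id -> (line, column) for every !DILocation record.
    std::map<unsigned, SrcPos> locById;
    std::istringstream metaIn(ir);
    std::string text;
    while (std::getline(metaIn, text)) {
        std::smatch m;
        if (std::regex_search(text, m, diLoc)) {
            SrcPos p;
            p.line = toUnsigned(m[2]);
            p.column = m[3].matched ? toUnsigned(m[3]) : 0;
            locById[toUnsigned(m[1])] = p;
        }
    }

    // Pass 2: map each IR line carrying "!dbg !N" to record N, if N is a location
    // (function headers point at a DISubprogram instead and are skipped).
    std::map<unsigned, SrcPos> result;
    std::istringstream irIn(ir);
    unsigned irLine = 0;
    while (std::getline(irIn, text)) {
        ++irLine;
        std::smatch m;
        if (std::regex_search(text, m, dbgRef)) {
            auto it = locById.find(toUnsigned(m[1]));
            if (it != locById.end()) result[irLine] = it->second;
        }
    }
    return result;
}
```

    Lines without debug info are simply absent from the returned map, which is also what you want the UI to show (nothing to highlight).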
    

    Step 2: iterate over the instructions
    Once that is done, you can simply iterate over the functions, their basic blocks and their instructions:

    // Walk every instruction and read the source location attached to it
    for (llvm::Function &f : *module)
    {
       for (llvm::BasicBlock &b : f)
       {
          for (llvm::Instruction &inst : b)
          {
             const llvm::DebugLoc &dl = inst.getDebugLoc();
             if (!dl)
                continue; // no source location attached to this instruction
             unsigned line = dl.getLine();
             unsigned col = dl.getCol();
             llvm::StringRef file = dl->getFilename();
             // accordingly populate some dictionary between your instructions and source code
          }
       }
    }
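
    For the dictionary mentioned in the loop, one possible shape is a two-way index: instruction to source location (for clicks in the IR view) and source line to instructions (for clicks in the source view). This is a minimal, LLVM-free sketch under illustrative assumptions: SourceLoc and SourceMap are hypothetical names, and instructions are keyed by a running index here, whereas in the real loop you could key on const llvm::Instruction* instead. The file, line and column values would come from dl->getFilename(), dl.getLine() and dl.getCol():

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical source position type filled from llvm::DebugLoc.
struct SourceLoc {
    std::string file;
    unsigned line = 0;
    unsigned column = 0;
};

// Two-way index between IR instructions (a 0-based running index in the walk)
// and source lines.
class SourceMap {
public:
    // Call once per instruction that has a debug location.
    void add(unsigned instIndex, const SourceLoc& loc) {
        byInst_[instIndex] = loc;
        byLine_[{loc.file, loc.line}].push_back(instIndex);
    }

    // Source location of an instruction, or nullptr if it had no debug info.
    const SourceLoc* locationOf(unsigned instIndex) const {
        auto it = byInst_.find(instIndex);
        return it == byInst_.end() ? nullptr : &it->second;
    }

    // All instructions generated from a given source line (empty if none).
    std::vector<unsigned> instructionsFor(const std::string& file, unsigned line) const {
        auto it = byLine_.find({file, line});
        return it == byLine_.end() ? std::vector<unsigned>{} : it->second;
    }

private:
    std::map<unsigned, SourceLoc> byInst_;
    std::map<std::pair<std::string, unsigned>, std::vector<unsigned>> byLine_;
};
```

    One source line usually produces several instructions, which is why the reverse direction stores a vector; a std::unordered_map would work just as well for large modules.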
    

    Step 3: update your UI
    This is another story...
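
    One small piece most such UIs need is turning the editor's caret, usually reported as a character offset, into a line number that can then be looked up in the dictionary from step 2. This is a minimal sketch, assuming the editor gives you a 0-based character offset into the file text. lineStarts and lineAtCaret are hypothetical helper names:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Start offset of every line in a text buffer; compute once per file.
std::vector<std::size_t> lineStarts(const std::string& text) {
    std::vector<std::size_t> starts{0};
    for (std::size_t i = 0; i < text.size(); ++i)
        if (text[i] == '\n') starts.push_back(i + 1);
    return starts;
}

// 1-based line containing the caret offset (binary search over line starts).
unsigned lineAtCaret(const std::vector<std::size_t>& starts, std::size_t caret) {
    auto it = std::upper_bound(starts.begin(), starts.end(), caret);
    return static_cast<unsigned>(it - starts.begin());
}
```

    Precomputing the line starts keeps each caret move at O(log n), which matters if you re-highlight on every cursor movement.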