
Parsing LLVM IR code (with debug symbols) to map it back to the original source

I'm thinking about building a tool that helps me visualise the LLVM IR generated for each instruction/function in my original source file. Something like this, but for LLVM IR.

So far, the steps to build such a tool seem to be:

1. Start with an LLVM IR AST builder.
2. Parse the generated IR code.
3. At the caret position, get the AST element.
4. Read the element's scope, line, column and file, and highlight that location in the original source file.
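As a rough illustration of steps 3 and 4 (a swapped-in textual heuristic, not an AST): in printed IR, an instruction with debug info ends in `!dbg !N`, and `!N` is defined elsewhere as `!DILocation(line: ..., column: ..., scope: ...)`. The minimal sketch below assumes the usual one-definition-per-line output of the IR printer and follows that attachment with `std::regex`. `SourceLoc`, `indexLocations` and `locationAtCaret` are hypothetical names. Recovering the file would additionally mean following the `scope:` chain to a `DIFile`, which is left out here.

```cpp
#include <map>
#include <optional>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical result type: a source position taken from a !DILocation node.
struct SourceLoc {
    unsigned line = 0;
    unsigned column = 0;  // 0 when the printer omitted the column
};

// Index every "!N = !DILocation(line: L[, column: C]...)" definition in the IR text.
// Heuristic: assumes one metadata definition per line, as the IR printer emits.
std::map<std::string, SourceLoc> indexLocations(const std::string& irText) {
    static const std::regex def(
        R"(^(!\d+) = (?:distinct )?!DILocation\(line: (\d+)(?:, column: (\d+))?)");
    std::map<std::string, SourceLoc> locs;
    std::istringstream in(irText);
    std::string text;
    std::smatch m;
    while (std::getline(in, text)) {
        if (std::regex_search(text, m, def)) {
            SourceLoc loc;
            loc.line = static_cast<unsigned>(std::stoul(m[2].str()));
            loc.column = m[3].matched ? static_cast<unsigned>(std::stoul(m[3].str())) : 0;
            locs[m[1].str()] = loc;
        }
    }
    return locs;
}

// Given the IR line under the caret, follow its "!dbg !N" attachment, if any.
std::optional<SourceLoc> locationAtCaret(const std::string& caretLine,
                                         const std::map<std::string, SourceLoc>& locs) {
    static const std::regex dbg(R"(!dbg (!\d+))");
    std::smatch m;
    if (!std::regex_search(caretLine, m, dbg)) return std::nullopt;
    auto it = locs.find(m[1].str());
    if (it == locs.end()) return std::nullopt;  // e.g. a !DISubprogram, not a location
    return it->second;
}
```

This is only good enough for a quick prototype: anything beyond a single `!dbg` lookup (scopes, inlining via `inlinedAt:`) is much easier with LLVM's own parser, as in the answer below.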

Is this the correct way to approach it? Am I trivialising it too much?

Solution

I think your approach is broadly correct. The UI part will probably take a long time to implement, so I'll focus on the LLVM part.

Let's say you start from an input file containing your LLVM IR.

Step 1: process the module
Read the file content into a string, build a module from it, and process it to collect the debug info:

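#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/SourceMgr.h"
llvm::LLVMContext context;  // must outlive the module
llvm::SMDiagnostic diag;    // receives parse errors, if any
auto module = llvm::parseIR(llvm::MemoryBufferRef(fileContent, "input.ll"), diag, context);
llvm::DebugInfoFinder dif;  // a plain stack object; no need for new
if (module) dif.processModule(*module);  // parseIR returns nullptr on error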
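Step 1 assumes `fileContent` already holds the IR text. One minimal way to fill it using only the C++ standard library is sketched below; `readFile` is a hypothetical helper name. Alternatively, `llvm::parseIRFile` from `llvm/IRReader/IRReader.h` reads and parses a file in a single call.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read a whole .ll file into a string, e.g. to use as fileContent.
// Throws std::runtime_error if the file cannot be opened.
std::string readFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::ostringstream ss;
    ss << in.rdbuf();  // copy the entire stream buffer
    return ss.str();
}
```

Keep the returned string alive for as long as the parsed module's diagnostics might refer to it; the `MemoryBufferRef` in step 1 does not copy the text.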

Step 2: iterate over the instructions
Once that is done, you can iterate over the functions, their basic blocks, and their instructions:

// Module -> Function -> BasicBlock -> Instruction, using range-based for loops
for (llvm::Function &f : *module)
{
   for (llvm::BasicBlock &b : f)
   {
      for (llvm::Instruction &inst : b)
      {
         const llvm::DebugLoc &dl = inst.getDebugLoc();
         if (!dl)
            continue;  // no !dbg attachment, so no source location
         unsigned line = dl.getLine();
         unsigned col = dl.getCol();
         llvm::StringRef file = dl->getFilename();  // file of the underlying DILocation
         // accordingly populate some dictionary between your instructions and source code
      }
   }
}
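The loop leaves the "dictionary" open. One possible shape, sketched below, is a two-way index: instruction to source position (caret in the IR view) and file/line to instructions (caret in the source view). It assumes each instruction is identified by a hypothetical integer id, for example a counter incremented inside the innermost loop, and that the file, line and column come from each instruction's `DebugLoc`. `SourcePos` and `DebugIndex` are illustrative names, not LLVM types.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical source position, filled from an instruction's DebugLoc.
struct SourcePos {
    std::string file;
    unsigned line;
    unsigned column;
};

// Two-way index between IR instruction ids and source positions.
// Assumes each instruction id is added at most once.
class DebugIndex {
public:
    void add(std::size_t instId, const SourcePos& pos) {
        instToSrc_[instId] = pos;
        srcToInst_[pos.file][pos.line].push_back(instId);
    }

    // Caret on an IR instruction -> source position to highlight (nullptr if none).
    const SourcePos* sourceOf(std::size_t instId) const {
        auto it = instToSrc_.find(instId);
        return it == instToSrc_.end() ? nullptr : &it->second;
    }

    // Caret on a source line -> every IR instruction attributed to that line.
    std::vector<std::size_t> instructionsFor(const std::string& file, unsigned line) const {
        auto f = srcToInst_.find(file);
        if (f == srcToInst_.end()) return {};
        auto l = f->second.find(line);
        return l == f->second.end() ? std::vector<std::size_t>{} : l->second;
    }

private:
    std::map<std::size_t, SourcePos> instToSrc_;
    std::map<std::string, std::map<unsigned, std::vector<std::size_t>>> srcToInst_;
};
```

A single source line usually produces several instructions, which is why the reverse direction stores a vector per line rather than a single id.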

Step 3: update your UI
That is another story...