c++ compiler-construction llvm llvm-ir

LLVM IR: Get AllocaInst from CreateLoad


I'm using the LLVM IR C++ API to generate IR for my compiler. My question boils down to:

From a CreateLoad instruction, can I get the AllocaInst* it was loaded from so I can store the result of arithmetic instructions in that AllocaInst* without needing to retrieve it from a namedValues table?

Background

My semantic analyzer and IR generator both implement the visitor pattern, where the visitor method is accept. Below, the calls to accept are for the IR generator and translate to a call to llvm::Value* ASTCodegenner::codegen(<subclass of AST>).

I've successfully implemented unary instructions so that my compiler can compile things like:

int a = 1;
int b = 3 + ++a; // b = 5, a = 2

Which translates roughly to (modified for brevity):

%a = alloca i32
%b = alloca i32
store i32 1, i32* %a                // store 1 in %a
%a1 = load i32, i32* %a             // load value from %a
%inctmp = add nsw i32 %a1, 1        // add 1 (unary increment, a + 1)
store i32 %inctmp, i32* %a          // store in %a (a = a + 1)
%addtmp = add nsw i32 3, %inctmp    // use incremented value (prefix unary operator, ++a)
store i32 %addtmp, i32* %b          // store result of 3 + ++a in %b

The above is also equivalent to clang's IR representation of the same code in C.

Problem

Unary expressions are parsed into a UnaryExprAST which receives an operand property of AST (base class for all AST nodes). My reasoning for this is statements like ++1 should be valid in syntactic analysis but not semantic analysis (UnaryExprAST.operand should be able to store VariableAST, NumberAST, etc.).

The solution I have now is an ugly one involving a dynamic_cast from AST down to VariableAST so I can retrieve its AllocaInst* from the namedValues table. Hence my curiosity whether there is a way to retrieve the AllocaInst* from the load instruction itself.

llvm::Value* ASTCodegenner::codegen(UnaryExprAST* ast) 
{
    // codegen operand. if it's a VariableAST, this returns a load instruction
    // (see below for actual method)
    llvm::Value* target = ast->operand->accept(*this);

    // retrieve AllocaInst* from namedValues table
    std::unique_ptr<VariableAST> operand = std::unique_ptr<VariableAST>(dynamic_cast<VariableAST*>(ast->operand->clone()));
    llvm::AllocaInst* targetAlloca = namedValues[operand->id];
    
    // this method just returns the result of the unary operation, e.g. 
    // target+1 or target-1, depending on the unary operator
    llvm::Value* res = applyUnaryOperation(target, ast->op);
    // store incremented value
    builder->CreateStore(res, targetAlloca);
    // if prefix unary, return inc/dec value; otherwise, return original value 
    // before inc/dec
    return ast->isPrefix() ? res : target;
}
llvm::Value* ASTCodegenner::codegen(VariableAST* ast) 
{
    llvm::AllocaInst* val = namedValues[ast->id];
    return builder->CreateLoad(val->getAllocatedType(), val, ast->id);
}

I thought about builder->CreateStore(res, target); instead of builder->CreateStore(res, targetAlloca); but that would violate SSA as target is assigned the load operation.

Possible Solution: #1

A VariableAST has a ctx property which is a member of an enum:

enum class VarCtx
{
    eReference, // referencing a variable (3 * a * 20)
    eStore, // storing new value in a variable (a = ...)
    eAlloc, // allocating a variable (int a = ...)
    eParam, // function parameter (func add(int a, int b))
};

During my semantic analysis phase (or even the constructor of UnaryExprAST), I could dynamic_cast the UnaryExprAST.operand to VariableAST, check for null, and then fill the ctx with VarCtx::eStore. I could then modify the IR generation of VariableAST to return the AllocaInst* if its ctx is VarCtx::eStore.

Possible Solution: #2

Cast the result of IR generation on the operand (Value*) down to LoadInst.

llvm::LoadInst* target = static_cast<llvm::LoadInst*>(ast->operand->accept(*this));
llvm::Value* targetAlloca = target->getPointerOperand();  

This works fine, and the cast from Value* to LoadInst* should be OK, as unary operations should only be applied to something that needs to be loaded with CreateLoad anyway (correct me if I'm wrong).

Possible Solution: #3

Leave the dynamic_cast in the IR generation stage and rely completely on my semantic analyzer to let the right values through. I'm not thrilled with that solution: what if I later want to define a unary operation for something other than a variable? It seems like a hack that I will have to fix later.

Maybe I'm going about the IR generation completely wrong? Or maybe it's an XY problem and there's something wrong with my class architecture? I appreciate any insight!


Solution

  • From a CreateLoad instruction, can I get the AllocaInst* it was loaded from so I can store the result of arithmetic instructions in that AllocaInst* without needing to retrieve it from a namedValues table?

    IRBuilder::CreateLoad() always returns a LoadInst *, which has a getPointerOperand() method. That method returns the same Value * that you created the load with, whether it's an alloca or not. If you're loading something simple like a cast of an alloca, you could call stripPointerCasts() on that pointer to look through the cast (note that there is a family of ~8 strip... functions, so pick the right one for your purpose). If the load was created from something other than an alloca, then no: the load doesn't know which underlying alloca it's really reading. In general, finding that out requires solving pointer analysis (a.k.a. alias analysis), which is a very hard problem.
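
To make the "load, then pointer operand, then check for an alloca" chain concrete, here is a minimal sketch that models it without LLVM. Value, AllocaInst, ConstantInt and LoadInst below are hypothetical stand-ins that only mirror the names of the LLVM classes, and allocaFromLoad is a helper name made up for this example. In real code you would presumably use llvm::dyn_cast<llvm::LoadInst> and llvm::dyn_cast<llvm::AllocaInst> where the sketch uses dynamic_cast. The point is that a checked cast returns nullptr when the operand is not a load (for example the constant in ++1), whereas the static_cast from Solution #2 would be undefined behaviour in that case.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins for the LLVM classes of the same name.
// They are NOT the real LLVM types; they only model the relationships.
struct Value {
    virtual ~Value() = default;
};

// Models a stack slot created for a variable.
struct AllocaInst : Value {
    std::string name;
    explicit AllocaInst(std::string n) : name(std::move(n)) {}
};

// Models an integer literal such as the 1 in "++1".
struct ConstantInt : Value {
    int value;
    explicit ConstantInt(int v) : value(v) {}
};

// Models a load; it remembers the pointer it was created with.
struct LoadInst : Value {
    Value* pointer;
    explicit LoadInst(Value* p) : pointer(p) {
        assert(p && "a load needs a pointer operand");
    }
    Value* getPointerOperand() const { return pointer; }
};

// Returns the alloca that v was loaded from, or nullptr when v is not a
// load or when the load's pointer operand is not directly an alloca.
AllocaInst* allocaFromLoad(Value* v) {
    auto* load = dynamic_cast<LoadInst*>(v);  // checked downcast
    if (!load) {
        return nullptr;  // e.g. a constant operand: nothing to store into
    }
    return dynamic_cast<AllocaInst*>(load->getPointerOperand());
}
```

With these stand-ins, allocaFromLoad gives back the alloca for a direct load of a variable, and nullptr both for a constant operand and for a load whose pointer is not itself an alloca. A nullptr result is the place where the code generator could report "operand is not assignable" instead of crashing.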