Tags: c, llvm, llvm-codegen

Why does LLVM allocate a redundant variable?


Here's a simple C file with an enum definition and a main function:

enum days {MON, TUE, WED, THU};

int main() {
    enum days d;
    d = WED;
    return 0;
}

Clang compiles it to the following LLVM IR:

define dso_local i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  store i32 2, i32* %2, align 4
  ret i32 0
}

%2 is evidently the d variable, which gets 2 assigned to it. What does %1 correspond to if zero is returned directly?


Solution

  • The %1 slot is allocated by Clang to hold the function's return value, so that a function with multiple return statements can funnel them all through a single exit. Imagine you were writing a function to compute an integer's factorial. Instead of this:

    int factorial(int n) {
        int result;
        if (n < 2) {
            result = 1;
        } else {
            result = n * factorial(n - 1);
        }
        return result;
    }
    

    You'd probably write this:

    int factorial(int n){
        if(n < 2)
          return 1;
        return n * factorial(n-1);
    }
    

    Why? Because Clang inserts the result variable that holds the return value for you. Yay. That is what %1 is for. Look at the IR for a slightly modified version of your code.

    Modified code:

    enum days {MON, TUE, WED, THU};
    
    int main() {
        enum days d;
        d = WED;
        if(d) return 1;
        return 0;
    }
    

    IR:

    define dso_local i32 @main() #0 !dbg !15 {
      %1 = alloca i32, align 4
      %2 = alloca i32, align 4
      store i32 0, i32* %1, align 4
      store i32 2, i32* %2, align 4, !dbg !22
      %3 = load i32, i32* %2, align 4, !dbg !23
      %4 = icmp ne i32 %3, 0, !dbg !23
      br i1 %4, label %5, label %6, !dbg !25

    5:                                                ; preds = %0
      store i32 1, i32* %1, align 4, !dbg !26
      br label %7, !dbg !26

    6:                                                ; preds = %0
      store i32 0, i32* %1, align 4, !dbg !27
      br label %7, !dbg !27

    7:                                                ; preds = %6, %5
      %8 = load i32, i32* %1, align 4, !dbg !28
      ret i32 %8, !dbg !28
    }
    

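    Now you see %1 making itself useful, huh? Each return stores its value into %1 and branches to block 7, which loads %1 and executes the only ret. In most functions with a single return statement the slot is not needed, and it is stripped by one of LLVM's passes once optimizations are enabled (mem2reg, for example, promotes such allocas to SSA values). Your original example still shows it because that IR is unoptimized.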
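
To make the lowering concrete, here is a minimal C sketch of the two-return factorial with the return slot written out by hand. It only mirrors the shape of the IR: the names factorial_lowered, retval, done, and check_lowering are made up for illustration, and the IR Clang actually emits depends on the version and flags.

```c
#include <assert.h>

/* What the programmer writes: two return statements. */
int factorial(int n) {
    if (n < 2)
        return 1;
    return n * factorial(n - 1);
}

/* Roughly the same function with an explicit return slot and one exit.
   `retval` plays the role of %1; the name and the `done` label are
   hypothetical, chosen only for this sketch. */
int factorial_lowered(int n) {
    int retval;                 /* corresponds to: %1 = alloca i32 */
    if (n < 2) {
        retval = 1;             /* first return: store into the slot */
        goto done;              /* branch to the shared exit block */
    }
    retval = n * factorial_lowered(n - 1);  /* second return: store again */
    goto done;
done:
    return retval;              /* load the slot, single ret */
}

/* Sanity check: both forms agree for n = 0..10 (10! still fits in int). */
void check_lowering(void) {
    for (int n = 0; n <= 10; ++n)
        assert(factorial(n) == factorial_lowered(n));
}
```

Calling check_lowering() confirms that the hand-lowered version computes the same values as the original for n from 0 to 10.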