
What exactly does the PHI instruction do, and how do you use it in LLVM?


LLVM has a phi instruction with a rather weird explanation:

The 'phi' instruction is used to implement the φ node in the SSA graph representing the function.

Typically it is used to implement branching. If I understand correctly, it is needed to make dependency analysis possible, and in some cases it can help avoid unnecessary loads. However, it's still hard to understand exactly what it does.

The Kaleidoscope example explains it fairly nicely for the if case. However, it's not that clear how to implement logical operations like && and ||. If I type the following into the online LLVM compiler:

void main1(bool r, bool y) {
    bool l = y || r;
}

The last several lines of the output completely confuse me:

; <label>:10                                      ; preds = %7, %0
%11 = phi i1 [ true, %0 ], [ %9, %7 ]
%12 = zext i1 %11 to i8

It looks like the phi node produces a result that can be used, whereas I was under the impression that a phi node just defines which paths values come from.

Could someone explain what a phi node is, and how to implement || with it?


Solution

  • A phi node is an instruction that selects a value depending on which predecessor block control arrived from. In LLVM's class hierarchy, PHINode also derives from Value, which is why its result can be used like any other value.

    Phi nodes are necessary because LLVM IR is in SSA (static single assignment) form, where every value is defined exactly once. For example, the following C++ function

    void m(bool r, bool y){
        bool l = y || r ;
    }
    

    gets translated into the following IR (created with clang -c -emit-llvm file.cpp -o out.bc and then viewed with llvm-dis):

    define void @_Z1mbb(i1 zeroext %r, i1 zeroext %y) nounwind {
    entry:
      %r.addr = alloca i8, align 1
      %y.addr = alloca i8, align 1
      %l = alloca i8, align 1
      %frombool = zext i1 %r to i8
      store i8 %frombool, i8* %r.addr, align 1
      %frombool1 = zext i1 %y to i8
      store i8 %frombool1, i8* %y.addr, align 1
      %0 = load i8* %y.addr, align 1
      %tobool = trunc i8 %0 to i1
      br i1 %tobool, label %lor.end, label %lor.rhs
    
    lor.rhs:                                          ; preds = %entry
      %1 = load i8* %r.addr, align 1
      %tobool2 = trunc i8 %1 to i1
      br label %lor.end
    
    lor.end:                                          ; preds = %lor.rhs, %entry
      %2 = phi i1 [ true, %entry ], [ %tobool2, %lor.rhs ]
      %frombool3 = zext i1 %2 to i8
      store i8 %frombool3, i8* %l, align 1
      ret void
    }
    

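    So what happens here? In C++, a variable such as bool l can be assigned in several places, but in LLVM IR every value has to be defined exactly once. So we check %tobool (the value of y): if it is true, we jump straight to lor.end; otherwise we go to lor.rhs, which evaluates r and then also branches to lor.end.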

    In lor.end we finally have the value of the || operator. If we arrived from the entry block, it's just true. Otherwise, it is equal to the value of %tobool2, and that's exactly what we get from the following IR line:

    %2 = phi i1 [ true, %entry ], [ %tobool2, %lor.rhs ]
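
As a rough mental model only (this is not how LLVM implements phi internally), the control flow above can be sketched in plain C++. The names Pred, phi, and logical_or are made up for illustration; the comments map each step back to the IR.

```cpp
#include <cassert>

// Hypothetical tags for the two predecessors of lor.end in the IR above.
enum class Pred { Entry, LorRhs };

// Models `%2 = phi i1 [ true, %entry ], [ %tobool2, %lor.rhs ]`:
// the result depends only on which predecessor block we arrived from.
bool phi(Pred pred, bool tobool2) {
    switch (pred) {
        case Pred::Entry:  return true;     // [ true, %entry ]
        case Pred::LorRhs: return tobool2;  // [ %tobool2, %lor.rhs ]
    }
    return false;  // unreachable: lor.end has no other predecessors
}

// Follows the same path through the blocks as the IR for `y || r`.
bool logical_or(bool r, bool y) {
    Pred pred = Pred::Entry;
    bool tobool2 = false;  // only meaningful if lor.rhs runs

    // entry: br i1 %tobool, label %lor.end, label %lor.rhs
    if (!y) {
        // lor.rhs: evaluate r, then br label %lor.end
        tobool2 = r;
        pred = Pred::LorRhs;
    }

    // lor.end: the phi picks the value that matches the path taken.
    return phi(pred, tobool2);
}
```

Calling logical_or(r, y) gives the same result as y || r for all four input combinations, and r is only looked at when y is false, which is the short-circuit behaviour. For &&, the same shape should apply with the branch targets swapped and the constant flipped: the phi would take false when arriving from the entry block and the right-hand side's value otherwise.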