Search code examples
assemblycompiler-constructionllvmcompiler-optimizationllvm-ir

How does LLVM avoid generating redundant native code for the `br` IR instruction?


For the following C code

void foo() {
    int forty_two = 42;
    if (forty_two == 42) {
    }
}

clang -S -emit-llvm foo.c emits this IR code:

define dso_local void @foo() #0 {
  %1 = alloca i32, align 4
  store i32 42, i32* %1, align 4
  %2 = load i32, i32* %1, align 4
  %3 = icmp eq i32 %2, 42
  br i1 %3, label %4, label %5

4:                                                ; preds = %0
  br label %5

5:                                                ; preds = %4, %0
  ret void
}

And for the IR, LLVM (llc foo.ll) generates the x64 code below. It's abridged for readability.

foo:                                    # @foo
# %bb.0:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $42, -4(%rbp)
    cmpl    $42, -4(%rbp)
    jne .LBB0_2
# %bb.1:
    jmp .LBB0_2
.LBB0_2:
    popq    %rbp
    retq

In contrast to the native code emitted by LLVM, translating the IR code in a straightforward way would contain a number of redundant instructions. Something along these lines:

foo:
# %bb.0:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $42, -4(%rbp)
    cmpl    $42, -4(%rbp)

# create the i1 boolean in a register.
# (This instruction is redundant and LLVM doesn't emit is)
    sete    %al
# See whether the comparison's result was `true` or `false`.
# (This instruction is redundant and LLVM doesn't emit it)
    cmpb    $1, %al

    jne .LBB0_2
# %bb.1:
    jmp .LBB0_2
.LBB0_2:
    popq    %rbp
    retq

My question is: Where is the portion of LLVM code that makes sure these redundant instructions are not emitted? And how does it work?

I read an excellent post Life of an instruction in LLVM by @Eli Bendersky and looked at the code in SelectionDAGBuilder::visitICmp and SelectionDAGBuilder::visitBr. But didn't manage to figure out the answer on my own.


Solution

  • TLDR: X86FastISel::X86SelectBranch

    Someone on LLVM's discord told me about the -print-after-all flag of llc. (Actually, @arnt mentioned it in their answer even before I asked on discord and I have no idea why I didn't give the flag a try right away...)

    That flag let me see that "X86 DAG->DAG Instruction Selection" was the first pass that didn't only transform IR but turned it into x86-specific Machine IR (MIR). (The corresponding class is X86DAGToDAGISel).
    And from the MIR it emitted, it was clear the decision to emit or not SETCC/TEST instructions happens during the pass run.

    Stepping through X86DAGToDAGISel::runOnMachineFunction eventually brought me to X86FastISel::X86SelectBranch. There,

    • in case br is the only user of icmp's result and the instructions are in the same basic block, the pass decides not to emit SETCC/TEST
    • in case icmp's result has other users or the two IR instructions aren't in the same basic block, the pass will actually emit SETCC/TEST.

    So, for this C code:

    void foo() {
        int forty_two = 42;
        int is_forty_two;
        if (is_forty_two = (forty_two == 42)) {
        }
    }
    

    clang -S -emit-llvm brcond.c produces the following IR:

    define void @foo() #0 {
    entry:
      %forty_two = alloca i32, align 4
      %is_forty_two = alloca i32, align 4
      store i32 42, i32* %forty_two, align 4
      %0 = load i32, i32* %forty_two, align 4
      %cmp = icmp eq i32 %0, 42
      %conv = zext i1 %cmp to i32
      store i32 %conv, i32* %is_forty_two, align 4
      br i1 %cmp, label %if.then, label %if.end
    
    if.then:                                          ; preds = %entry
      br label %if.end
    
    if.end:                                           ; preds = %if.then, %entry
      ret void
    }
    

    Obviously, %cmp has more than one user. So llc brcond.ll emits the assembly below (abridged a bit):

    foo:                                    # @foo
    # %bb.0:                                # %entry
        pushq   %rax
        movl    $42, (%rsp)
        cmpl    $42, (%rsp)
        sete    %al
        movb    %al, %cl
        andb    $1, %cl
        movzbl  %cl, %ecx
        movl    %ecx, 4(%rsp)
        testb   $1, %al
        jne .LBB1_1
        jmp .LBB1_2
    .LBB1_1:                                # %if.then
        jmp .LBB1_2
    .LBB1_2:                                # %if.end
        popq    %rax
        retq