Search code examples
performanceoperationbranch-prediction

Why are ternary and logical operators more efficient than if branches?


I stumbled upon this question/answer which mentions that in most languages, logical operators such as:

x == y && doSomething();

can be faster than doing the same thing with an if branch:

if(x == y) {
  doSomething();
}

Similarly, it says that the ternary operator:

x = y == z ? 0 : 1

is usually faster than using an if branch:

if(y == z) {
  x = 0;
} else {
  x = 1;
}

This got me Googling, which led me to this fantastic answer which explains branch prediction.

Basically, what it says is that the CPU operates at very fast speeds, and rather than slowing down to compute every if branch, it tries to guess what outcome will take place and places the appropriate instructions in its pipeline. But if it makes the wrong guess, it will have to back up and recompute the appropriate instructions.

But this still doesn't explain to me why logical operators or the ternary operator are treated differently than if branches. Since the CPU doesn't know the outcome of x == y, shouldn't it still have to guess whether to place the call to doSomething() (and therefore, all of doSomething's code) into its pipeline? And, therefore, back up if its guess was incorrect? Similarly, for the ternary operator, shouldn't the CPU have to guess whether y == z will evaluate to true when determining what to store in x, and back up if its guess was wrong?

I don't understand why if branches are treated any differently by the compiler than any other statement which is conditional. Shouldn't all conditionals be evaluated the same way?


Solution

  • Short answer - it simply isn't. While helping branch prediction could improve you performance - using this as a part a logical statement doesn't change the compiled code. If you want to help branch prediction use __builtin_expect (for GNU)

    To emphasize let's compare the compiler output:

    #include <stdio.h>
    
    
    int main(){
            int foo;
    
            scanf("%d", &foo); /*Needed to eliminate optimizations*/
    
    #ifdef IF       
            if (foo)
                    printf("Foo!");
    #else
            foo &&  printf("Foo!");
    #endif 
            return 0;
    }
    

    For gcc -O3 branch.c -DIF We get:

    0000000000400540 <main>:
      400540:       48 83 ec 18             sub    $0x18,%rsp
      400544:       31 c0                   xor    %eax,%eax
      400546:       bf 68 06 40 00          mov    $0x400668,%edi
      40054b:       48 8d 74 24 0c          lea    0xc(%rsp),%rsi
      400550:       e8 e3 fe ff ff          callq  400438 <__isoc99_scanf@plt>
      400555:       8b 44 24 0c             mov    0xc(%rsp),%eax
      400559:       85 c0                   test   %eax,%eax #This is the relevant part
      40055b:       74 0c                   je     400569 <main+0x29>
      40055d:       bf 6b 06 40 00          mov    $0x40066b,%edi
      400562:       31 c0                   xor    %eax,%eax
      400564:       e8 af fe ff ff          callq  400418 <printf@plt>
      400569:       31 c0                   xor    %eax,%eax
      40056b:       48 83 c4 18             add    $0x18,%rsp
      40056f:       c3                      retq 
    

    And for gcc -O3 branch.c

    0000000000400540 <main>:
      400540:       48 83 ec 18             sub    $0x18,%rsp
      400544:       31 c0                   xor    %eax,%eax
      400546:       bf 68 06 40 00          mov    $0x400668,%edi
      40054b:       48 8d 74 24 0c          lea    0xc(%rsp),%rsi
      400550:       e8 e3 fe ff ff          callq  400438 <__isoc99_scanf@plt>
      400555:       8b 44 24 0c             mov    0xc(%rsp),%eax
      400559:       85 c0                   test   %eax,%eax
      40055b:       74 0c                   je     400569 <main+0x29>
      40055d:       bf 6b 06 40 00          mov    $0x40066b,%edi
      400562:       31 c0                   xor    %eax,%eax
      400564:       e8 af fe ff ff          callq  400418 <printf@plt>
      400569:       31 c0                   xor    %eax,%eax
      40056b:       48 83 c4 18             add    $0x18,%rsp
      40056f:       c3                      retq 
    

    This is exactly the same code.

    The question you linked to measures performance for JAVAScript. Note that there it may be interpreted (since Java script is interpreted or JIT depending on the version) to something different for the two cases. Anyway JavaScript is not the best for learning about performance.