Search code examples
compiler-constructionbranch-prediction

Branch prediction in compiler level


I have been reading about branch prediction but the only implementation I find are mostly in the hardware side of computer. processors seem to take care of most of the prediction. My question is that, can a compiler do a branch prediction? The only thing I found is 2 methods, function inlining and loop unrolling. Are these considered correct? Are they still used?


Solution

  • Sure. A compiler can get predictive information if it knows:

    • Statistical branch probabilities collected by instrumentation runs
    • Statistical distribution of varible values collected by instrumentation runs; it can then predict the average outcome of a conditional and thus the branch
    • Programmer-assertions as to frequency or bias of a conditional
    • Estimates of loop bounds based on ranges (or defaults to "10" if unknown :)
    • Knowledge that a branch is backwards to the top of loop (predict "taken"

    Using such information, it can predict the probable outcome of conditionals, and then generate branch instructions that tend to "predict" correctly by the hardware.

    A particularly interesting set of optimizations done by some compilers is trace scheduling, which determines the sets of paths through code based on probabilities of sequentially encountered branches. By determining the highest probability path, the compiler can do optimizations across that entire path rather that just in within a basic block.

    Sometimes compilers will generate branching code that indirectly uses the hardware's branch prediction capability. Compiled OO languages (static or JITted) have to compile method calls, and jump indirects are expensive. A cheap trick is to keep a small dynamic cache of most-recently invoked methods at each call site, and check the object type being dispatched. If the same type of object is frequently used for dispatch at a call site, the compare/branch sequence for the first (and somewhat less for the second) entry in the cache is highly probable, and the executed code thus avoids a mispredict. This is much better than a jump indirect.

    One last standard trick: if you can avoid doing a branch, you don't have to predict it correctly! Many code sequences look something like this:

      if (exp1 relop exp2)
          X = Y
      endif
    

    Modern CPUs have "predicated" instructions which are in effect "MOV_if_relop A to B", for all relational conditions equal, not equal, less, etc. So rather than generate a branch for the above construct, the compiler generates:

      <compute exp1 and exp2>
      CMP  exp1,exp2 ; sets condition code
      MOVif_relop  X,Y