Search code examples
assemblycompiler-constructioncompiler-optimizationpeephole-optimization

Why peephole optimization is done on assembly code but not on IR code?


I don't understand why peephole optimization is needed? Because the compiler is smart enough to optimise the code? Can you please give me some examples where peephole optimization is needed?


Solution

  • Peepholes are often target-specific.
    They may only make sense in terms of target registers (RTL), not IR.

    For example e.g. x86 xor eax, eax instead of mov eax,0. (What is the best way to set a register to zero in x86 assembly: xor, mov or and?). There'd be no reason to do this in IR and doing it any earlier than the last moment (final code-generation) would obfuscate the fact that the value is zero for other optimizations. Doing that for any machine except x86 would be an anti-optimization (creating a false dependency). OTOH you don't want to leave it too late, or else you might not be able to reorder it ahead of something that sets FLAGS, e.g.

      xor  eax,eax
      cmp  ecx, edx
      sete al           ; boolean 0 or 1  zero-extended to 64-bit RAX
    

    Instead of

      cmp   ecx, edx
      sete  al               ; false dependency on old RAX
      movzx eax, al          ; no mov-elimination, extra critical path latency
    

    or

      cmp   ecx, edx
      mov   eax, 0          ; less efficient instruction to leave FLAGS untouched
      sete  al              ; later reads of RAX will have partial-register stalls on P6-family
    

    Or as another example, x86 can multiply by 3, 5, or 9 using LEA to take advantage of the 2-bit shift and add in 2-register addressing-modes. It might be useful for an optimizer to know that this is an efficient building-block, and aim to re-factor things into a multiply by 9, but actually converting a multiply by 10 into (x * 5) * 2 is not how you'd want to do it for targets where (x<<3) + (x<<1) is more efficient (x*10 = x*8 + x*2).

    See