compiler-construction, compiler-theory

Compilers: Register Allocation Against Complex Branching/Jumps


I've taken a side interest in optimizers and how they work, particularly with respect to register allocation. I have somewhat of a background in writing high-level interpreters that didn't bother to generate efficient machine code, so the parts of compiler construction that relate to parsing, constructing ASTs, etc. are fairly straightforward for me.

As a learning project, I've been trying my hand at a toy compiler whose input language is only slightly higher-level than machine code, the main difference being that it works with variables rather than registers.

Where I'm very confused is the low-level optimizer side, specifically register allocation from the IR and how it is affected by branching/jumps, even with the most basic heuristic algorithms and excluding advanced topics like SSA and phi nodes.

Basic example:

a = ...
b = ...
c = ...
d = ...
e = ...
f = ...
g = ...

jump_if x == 1, section1
jump_if x == 2, section2
jump_if x == 3, section3
etc

a = a + b - c * 2
jump end

section1:
; all kinds of stuff happens here with some of the above variables
jump end

section2:
; all kinds of stuff happens here with some of the above variables
jump end

section3:
; all kinds of stuff happens here with some of the above variables
jump section1 ; tricky branch!!!

end:

And perhaps we can throw in loopy logic and all kinds of other branching to make this example even more convoluted.

What I don't understand is that all of these branching paths might potentially make all of the variables above 'live' if we consider them all together instead of each path individually.

What seems lacking to me is some kind of stack-based block structure, so that we could have nested blocks where register allocation could take into account the variables referenced by the innermost block and its outer blocks, and run the allocation heuristic separately on each block/branching path.

In a higher-level, block-based IR it seems much easier to deduce branching paths, because branching would be constrained within a block. But what about a low-level IR, only slightly abstracted above the machine level, with completely unconstrained branching where you can jump all over the place?

Most IR examples I've seen are fairly low-level abstractions of machine code, so they allow very chaotic branching (e.g., jump tables) in a way that might make it difficult to deduce such blocks/sections/paths.

How do people typically deal with this problem? Is there an algorithm, and a clean organization/code design, that can break down all possible branching paths given low-level code with such flexible branches/jumps?
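
To make the question a bit more concrete, here is the kind of first pass I'm picturing, sketched in Python (the instruction encoding and all names are made up purely for illustration): carve the flat instruction stream into basic blocks at labels and after jumps, then record the successor edges between blocks.

def build_cfg(instrs):
    # Carve a flat, label/jump-based instruction list into basic blocks and
    # record successor edges between them (i.e. build a control flow graph).
    blocks, order = {}, []               # name -> [instructions], names in layout order

    def start_block(name):
        if name not in blocks:
            blocks[name] = []
            order.append(name)
        return name

    cur = start_block("entry")
    fresh = 0
    for ins in instrs:
        kind = ins[0]
        if kind == "label":
            cur = start_block(ins[1])    # a label always begins a new block
        else:
            blocks[cur].append(ins)
            if kind in ("jump", "jump_if"):
                fresh += 1               # whatever follows a jump begins a new block
                cur = start_block(f".anon{fresh}")

    # Successor edges: explicit jump targets plus fall-through to the next block.
    succs = {name: [] for name in order}
    for i, name in enumerate(order):
        last = blocks[name][-1] if blocks[name] else None
        if last and last[0] == "jump":
            succs[name].append(last[1])
        else:
            if last and last[0] == "jump_if":
                succs[name].append(last[2])
            if i + 1 < len(order):       # fall-through edge
                succs[name].append(order[i + 1])
    return blocks, succs

# A cut-down version of the example above.
program = [
    ("op", "a = ..."), ("op", "b = ..."),
    ("jump_if", "x == 1", "section1"),
    ("op", "a = a + b - c * 2"), ("jump", "end"),
    ("label", "section1"), ("op", "..."), ("jump", "end"),
    ("label", "end"),
]
blocks, succs = build_cfg(program)
print(succs)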


Solution

  • Register allocation is a long-standing problem, and there have been a number of research papers in this area. The popular modern approaches are SSA-based register allocation and linear scan register allocation.

    Static single assignment form (SSA) has gained popularity as an aid to register allocation. The idea is to rewrite the program so that every variable is assigned exactly once and every variable is defined before it is used. During this phase it is generally assumed that an unlimited number of (virtual) registers is available.
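
    As a rough illustration only (not code from any particular compiler), here is a small Python sketch of the renaming idea for straight-line code: every assignment produces a fresh version of its variable, and later uses refer to the latest version. Join points would additionally need phi nodes, which this toy sketch leaves out; the instruction encoding is invented for the example.

    from collections import defaultdict

    def rename_to_ssa(instrs):
        # Toy SSA-style renaming for straight-line code (no control flow,
        # hence no phi nodes).  Each instruction is (destination, [operands]).
        version = defaultdict(int)       # variable -> latest version number
        out = []
        for dest, operands in instrs:
            # Uses refer to the current version of each operand
            # (version 0 means "defined before this snippet").
            uses = [f"{v}{version[v]}" for v in operands]
            version[dest] += 1           # the definition creates a new version
            out.append((f"{dest}{version[dest]}", uses))
        return out

    # a = ...; b = ...; a = a + b   becomes   a1 = ...; b1 = ...; a2 = a1 + b1
    print(rename_to_ssa([("a", []), ("b", []), ("a", ["a", "b"])]))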

    Once the program is in SSA form, register allocation becomes easier: it can be done with graph coloring, and for SSA-form programs the coloring can be found in polynomial time (LLVM is a popular compiler built on an SSA-based IR). It is too complex a topic to cover in full here, so I would recommend reading a couple of papers in this area; a minimal sketch of the coloring idea follows the paper list below.

    1. Efficiently Computing Static Single Assignment Form and the Control Dependence Graph by Ron Cytron et al. Cytron is one of the pioneers of SSA.

    2. Single-Pass Generation of Static Single Assignment Form for Structured Languages by Marc M. Brandis et al.

    3. A Simple, Fast Dominance Algorithm by Keith D. Cooper et al.
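
    To make the graph-coloring step a little more concrete, here is a minimal, purely illustrative sketch in Python of the coloring idea on its own: interfering variables (variables that are live at the same time) must not share a register. Building the interference graph from liveness information, coalescing, and spill-cost heuristics are all left out.

    def color_interference_graph(interference, k):
        # interference: dict mapping each variable to the set of variables it
        # is simultaneously live with; k: number of machine registers.
        assignment = {}                  # variable -> register index, or None (spill)
        # Simple heuristic: handle the most-constrained variables first.
        for var in sorted(interference, key=lambda v: -len(interference[v])):
            taken = {assignment[n] for n in interference[var] if n in assignment}
            free = [r for r in range(k) if r not in taken]
            assignment[var] = free[0] if free else None   # None = spill to memory
        return assignment

    # Example: 'a' interferes with both 'b' and 'c', but 'b' and 'c' can share.
    graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
    print(color_interference_graph(graph, k=2))   # {'a': 0, 'b': 1, 'c': 1}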

    If you don't want to deal with SSA as an intermediate step, you might want to take a look at Linear Scan Register Allocation by Massimiliano Poletto. This greedy algorithm is used in many non-LLVM-based compilers, including V8 and the JVM.
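
    For comparison, here is a minimal sketch in Python of the linear scan idea, assuming the live interval of each variable (the first and last instruction index at which it is live) has already been computed; real implementations use a smarter spill heuristic than the naive one shown here.

    def linear_scan(intervals, k):
        # intervals: list of (variable, start, end); k: number of registers.
        intervals = sorted(intervals, key=lambda iv: iv[1])   # by start point
        active = []                      # (end, var) pairs currently holding registers
        free = list(range(k))
        assignment = {}                  # variable -> register index, or "spill"

        for var, start, end in intervals:
            # Expire intervals that ended before this one starts,
            # returning their registers to the free pool.
            still_active = []
            for a_end, a_var in active:
                if a_end < start:
                    free.append(assignment[a_var])
                else:
                    still_active.append((a_end, a_var))
            active = still_active

            if free:
                assignment[var] = free.pop()
                active.append((end, var))
            else:
                assignment[var] = "spill"    # naive: spill the incoming interval
        return assignment

    # Three overlapping live ranges, two registers available.
    print(linear_scan([("a", 0, 5), ("b", 1, 3), ("c", 2, 6)], k=2))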