
Why do compilers construct a graph in register allocation?


I've been researching register allocation and was wondering why all the approaches build graphs from the live-register list when there could be a simpler way to do it. The way I think it could be done: when the number of live registers exceeds the number of physical registers available, spill one. Here is an example (pseudo-assembly):

## ldi: load immediate
## addr: add registers and store in arg 2
## store: store register to memory at offset from stack pointer
.text
    main:
        # live registers: {}
        ldi    %t0, 12             # t0 = 12
        # live registers: {t0}
        ldi    %t1, 8              # t1 = 8
        # live registers: {t0, t1}
        addr   %t0, %t1            # t1 = t0 + t1
        # live registers: {t1}
        store  -4(%sp), %t1        # -4(%sp) = t1
        # live registers: {}
        exit

I have laid out the live registers in the assembly code. Now, all the tutorials and texts construct interference graphs from here, etc. But instead of that (as I mentioned above), they could just look at the live registers. For example, if this were a one-register machine, then when the live registers are {t0, t1}, we would have to choose a register to spill. I feel this is much simpler than constructing a graph and doing all the other work just to check whether we have to spill registers. I know I'm probably not the first to think of this (somebody must have considered it and deemed it unfit), so what am I not seeing here?
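To make the idea concrete, here is a minimal Python sketch of that single-pass scheme: walk the instructions, track which temporaries are live in registers, and spill one whenever the set overflows. The LRU eviction choice is just a placeholder I picked for illustration, not part of the question:

```python
# Hypothetical sketch of "spill when the live set overflows": one pass over
# the instructions, evicting the least-recently-used temporary when more
# than k temporaries need registers at once.

def greedy_allocate(instrs, k):
    """instrs: list of (defs, uses) tuples of temporary names.
    Returns a list of (instruction index, spilled temporary)."""
    in_reg = []   # temporaries currently in registers, in LRU order
    spills = []
    for i, (defs, uses) in enumerate(instrs):
        for t in uses + defs:
            if t in in_reg:
                in_reg.remove(t)          # refresh LRU position
            elif len(in_reg) == k:
                victim = in_reg.pop(0)    # evict least recently used
                spills.append((i, victim))
            in_reg.append(t)
    return spills

# The straight-line example above, on a 1-register machine:
prog = [(["t0"], []),             # ldi    %t0, 12
        (["t1"], []),             # ldi    %t1, 8
        (["t1"], ["t0", "t1"]),   # addr   %t0, %t1
        ([], ["t1"])]             # store  -4(%sp), %t1
print(greedy_allocate(prog, 1))   # [(1, 't0'), (2, 't1'), (2, 't0')]
```

Note how even on this tiny program the LRU choice thrashes at the `addr` instruction, which needs both temporaries at once.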


Solution

  • Building a graph isn't necessary; for example, the Linear Scan algorithm avoids building one. It's used by JIT compilers like V8 and HotSpot because it's fast, the trade-off being less optimal decision-making.

    Linear Scan is more sophisticated than a single pass that spills when you run out of registers: it computes each value's live range and checks where those ranges overlap. This can do a reasonable job even with some branching and looping.
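    A minimal sketch of that interval idea (after the classic Linear Scan formulation): each temporary gets a live interval [start, end], intervals are visited in order of start, and when more than k are simultaneously live, the one ending furthest away is spilled. The interval numbers below are invented for illustration:

    ```python
    # Sketch of Linear Scan over live intervals: allocate in order of
    # interval start, expire intervals that have ended, and spill the
    # interval with the furthest end point when more than k are active.

    def linear_scan(intervals, k):
        """intervals: dict name -> (start, end). Returns spilled names."""
        active = []   # (end, name), kept sorted by increasing end
        spilled = set()
        for name, (start, end) in sorted(intervals.items(),
                                         key=lambda kv: kv[1][0]):
            # expire intervals that ended before this one starts
            active = [(e, n) for (e, n) in active if e >= start]
            active.append((end, name))
            active.sort()
            if len(active) > k:
                _, victim = active.pop()   # furthest end point loses
                spilled.add(victim)
        return spilled

    intervals = {"t0": (0, 2), "t1": (1, 3), "t2": (2, 4)}
    print(linear_scan(intervals, 2))   # {'t2'}
    ```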

    I'd imagine that your simplistic algorithm could degenerate pretty badly in branchy code unless you're smart about having both sides of a branch use the same temporary registers, which is the kind of analysis Linear Scan does. As @supercat says, not all code is straight-line. Even then, an LRU decision on what to spill wouldn't be optimal: you're a compiler, so you can look ahead to see which registers are going to be used next.
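    That look-ahead point can be sketched in a few lines: instead of LRU, spill the temporary whose next use is furthest in the future (Belady's heuristic). The temporary names and use positions here are made up for the example:

    ```python
    # Spill-choice by look-ahead: among the live temporaries, pick the one
    # whose next use after `pos` is furthest away (or never happens).

    def pick_spill(live, uses, pos):
        """uses: dict name -> sorted list of instruction indices where
        the temporary is used. Returns the best victim to spill."""
        def next_use(t):
            future = [i for i in uses[t] if i >= pos]
            return future[0] if future else float("inf")  # dead: ideal victim
        return max(live, key=next_use)

    uses = {"t0": [5], "t1": [2], "t2": [9]}
    print(pick_spill({"t0", "t1", "t2"}, uses, 1))   # t2: next use is furthest
    ```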

    You also need to look ahead to see if/how the result is even used, unless you're planning not to optimize at all. E.g. x++; x++; should compile the same as x += 2: to a single add instruction, not two separate add-1 operations. So you need some kind of data structure to represent program logic, not just turn it into asm on the fly in a single pass. (Unless you're writing a truly one-pass compiler like tcc.)
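    A toy illustration of why an IR enables this: with instructions held in a data structure rather than emitted on the fly, a trivial peephole pass can merge adjacent add-immediates. The tuple IR here is invented purely for the example:

    ```python
    # Hypothetical peephole pass over a tuple IR of (op, var, immediate):
    # two adjacent add-immediates on the same variable collapse into one.

    def combine_adds(ir):
        out = []
        for op, var, imm in ir:
            if out and op == "add" and out[-1][0] == "add" and out[-1][1] == var:
                _, _, prev_imm = out.pop()
                out.append(("add", var, prev_imm + imm))   # fold the constants
            else:
                out.append((op, var, imm))
        return out

    # x++; x++;  becomes a single  x += 2
    print(combine_adds([("add", "x", 1), ("add", "x", 1)]))
    ```

    A single-pass compiler that emits asm as it parses has already lost the chance to make this transformation.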

    Note that many compilers aim for good code, not just correct code, and that means minimizing spill/reload, especially on loop-carried dependency chains, and handling allocation well even in branchy code. This is where a Static Single Assignment (SSA) graph is useful, along with being smart about when to hoist or sink a computation or memory access out of a loop.

    Related: Register allocation and spilling, the easy way? has more detail about register-allocation algorithms, and Compilers: Register Allocation Against Complex Branching/Jumps has links to papers.