
Minimize the number of jumps when compiling to assembly


Suppose we have compiled some program to some abstract intermediate language: the program is a sequence of operations and decisions (branches) about which operation to execute next (basically, a tree). Example:

a();
if (bla) then c();
else { b(); d(); }
e();

Now we need to "pack" this sequence into linear memory, and in doing so the code has to branch (conditionally and unconditionally). For instance, here is one possibility:

      CALL A
      TEST bla
      BRANCH else
      CALL C
      JMP esc
else: CALL B
      CALL D
esc:  CALL E

There are of course multiple ways to arrange these blocks in linear memory, and they differ in the number of branches/jumps they need. Our goal is to minimize that number.
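
For comparison, here is a clumsier arrangement of the same blocks (purely illustrative) which needs an extra jump:

      CALL A
      TEST bla
      BRANCH else
      JMP then
else: CALL B
      CALL D
      JMP esc
then: CALL C
esc:  CALL E

Three explicit control transfers instead of two, for the same program.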
Questions:
a) What is this problem called? Is there a general name for it? (Is it an instance of BDD construction?)
b) What is the complexity of this problem (finding an arrangement of blocks such that the number of jumps/branches is minimal)?


Solution

  • What you have is a bunch of basic blocks, each of which passes control to another at its end.

    You can model this easily with a directed graph. Nodes are basic blocks. A directed arc from one node to another represents (without loss of generality) a conditional branch from one block to another (sometimes the condition is simply "true"). You should consider raised exceptions as branches, and catch handlers as blocks, too.

    Linearizing a series of blocks is essentially choosing a sequence of arcs. Such a sequence ends with a block that performs no branch (e.g., one that returns from the function or exits the application).

    What you want to do is choose a set of arc sequences (imagine you color these blue) such that the set of remaining arcs (imagine these red) is as small as possible.
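
    To make the coloring concrete, here is a minimal sketch in Python (the block names are mine, taken from the question's example): A is the block ending in the test, C is the "then" block, B holds b(); d();, and E is the join block. The layout in the question corresponds to two blue chains, and the red arcs left over are exactly its BRANCH and its JMP.

        # Arcs of the example's control-flow graph, as (src, dst) pairs.
        arcs = {("A", "C"), ("A", "B"), ("C", "E"), ("B", "E")}

        # The question's layout: A falls through into C, B falls through into E.
        chains = [["A", "C"], ["B", "E"]]

        blue = {(c[i], c[i + 1]) for c in chains for i in range(len(c) - 1)}
        red = arcs - blue  # every red arc must become an explicit branch/jump
        print(len(red), "explicit control transfers:", sorted(red))
        # -> 2 explicit control transfers: [('A', 'B'), ('C', 'E')]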

    I don't know what the complexity is, but since you arguably have two choices at each node, it looks exponentially hard. (In the worst case you can have a fully connected graph, and reasoning about such graphs is usually very expensive.)

    I'd expect that one can implement this with a greedy scheme and get pretty good results: repeatedly find the longest remaining arc sequence and color it blue.
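
    A simplified sketch of such a greedy pass in Python (the representation and the tie-break are mine; a more faithful implementation would search for the globally longest extension rather than take the first unplaced successor):

        def greedy_chains(successors, entry):
            """Partition blocks into fall-through chains, greedily."""
            placed, chains = set(), []
            for start in [entry] + list(successors):
                if start in placed:
                    continue
                chain, node = [start], start
                placed.add(start)
                while True:
                    nxt = [s for s in successors.get(node, []) if s not in placed]
                    if not nxt:
                        break
                    node = nxt[0]        # naive tie-break
                    chain.append(node)
                    placed.add(node)
                chains.append(chain)
            return chains

        successors = {"A": ["C", "B"], "C": ["E"], "B": ["E"], "E": []}
        print(greedy_chains(successors, "A"))
        # -> [['A', 'C', 'E'], ['B']]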

    Maximal sequentialization is probably not what you really want, though. Imagine we attach a probability to each arc, based on profiling data, algorithm knowledge, the assumption that exceptions are rare (and therefore low-probability), or even programmer-provided hints. Now what you want are long chains of arcs with the largest probabilities; such a chain is called a "trace" in the literature. This ensures that the hot paths in the code are sequential in memory, maximizing the value of the instruction cache. Off-sequence branches defined this way are low-probability, i.e., cold paths.

    You still face the same complexity in choosing, but the greedy scheme may work even better: form sequences by simply choosing the highest-probability arc out of each node already in a sequence.
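
    A sketch of that weighted variant (the probabilities below are invented for illustration): from the end of each chain, follow the highest-probability arc to a block that has not been placed yet.

        def trace_chains(succ_probs, entry):
            """Grow traces by always taking the hottest unplaced successor."""
            placed, traces = set(), []
            for start in [entry] + list(succ_probs):
                if start in placed:
                    continue
                trace, node = [start], start
                placed.add(start)
                while True:
                    cands = [(p, s) for s, p in succ_probs.get(node, {}).items()
                             if s not in placed]
                    if not cands:
                        break
                    _, node = max(cands)  # hottest unplaced successor
                    trace.append(node)
                    placed.add(node)
                traces.append(trace)
            return traces

        # Suppose profiling says the else-branch (B) is the hot path.
        succ_probs = {"A": {"C": 0.1, "B": 0.9}, "C": {"E": 1.0},
                      "B": {"E": 1.0}, "E": {}}
        print(trace_chains(succ_probs, "A"))
        # -> [['A', 'B', 'E'], ['C']]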

    With the arc sequences colored, it is easy to produce the code in the "right" order.
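
    For instance, a toy emitter (the mnemonic is a placeholder, not real assembly): lay the traces out back to back, let each block fall through to its in-trace successor, and turn every other outgoing arc into an explicit transfer.

        def emit(traces, successors):
            """Print blocks in trace order; only off-trace (red) arcs jump."""
            follows = {t[i]: t[i + 1] for t in traces for i in range(len(t) - 1)}
            for block in (b for t in traces for b in t):
                print(f"{block}:")
                print(f"    ; ... body of {block} ...")
                for s in successors.get(block, []):
                    if follows.get(block) != s:
                        print(f"    JMP_OR_BRANCH {s}")

        successors = {"A": ["C", "B"], "C": ["E"], "B": ["E"], "E": []}
        emit([["A", "B", "E"], ["C"]], successors)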

    This paper discusses "code placement" using traces more formally, specifically to reduce the cost of cache misses. It describes a greedy selection process that produces pretty good results, as well as a more sophisticated but fairly expensive "local search" scheme that produces excellent results.