Search code examples
compiler-construction

Does a tuple-based internal representation of control flow statements exist?


Internal representations can be done in several ways. For example, I understand that a mathematical expression might be converted to prefix or postfix notation and then stored in a stack, so that when you pop some operands they pop next to the corresponding operator and the program can carry on the calculation. Now, another form of internal representation is in tuples, most often (?) of three or four elements:

For example,
(4+5)/(2+x)
could be represented as the tuple set
(1)(+,4,5)
(2)(+,2,x)
(3)(/,(1),(2))
internally in this fashion. I believe this is what's called two/three-address code, or at least related to it.

This is all fine, but my question is how can control flow statements be represented this way? I know there are other ways to represent those internally, like in abstract syntax trees and similar stuff, but what about this specific way? Can control flow statements be represented internally as a set of n-tuples, or is this notation exclusively useful for math operations? For clarification, I'm not talking about the implementation in any one specific compiler; I'm just asking if this is possible with that 'notation'.

Edit: Corrected wrong math.


Solution

  • This type of intermediate representation (often written with syntax, such as %1 = 4+5 or %1 = add 4, 5) is also known as three address code. Three address code can represent arbitrary control flow, not just expressions. Note that assembly and machine languages also have a flat structure consisting of instructions with a fixed number of operands, so it isn't really much different from three address code. And obviously assembly can contain arbitrary control flow.

    The way to do that is with branching instructions. Most simply you could have two of them: one that takes a target address for unconditional branches and one that takes a target and Boolean argument as a condition. All types of loops and conditional statements can be compiled down to jumps. As an example the loop while (condition) { body } ...restOfTheCode could be compiled to this (where T is supposed to translate a given expression to our IR):

      %cond = T(condition)
    checkCond:
      brif %cond, loop
      br endLoop
    loop:
      T(body)
      br checkCond
    endLoop:
      T(...restOfTheCode)