Search code examples
context-free-grammarcompiler-theoryintermediate-language

Converting a CFG to IL


I build a CFG out of an arbitrary IL and want to convert that CFG back to IL. The order of the vertices in the CFG is of course not equal to the order of the original IL instructions.

This is fine but overcomplicates some stuff. Imagine:

Jump 'B'
'C': Return
'B': Jump 'C'

This would result in a flow graph like this: (Jump B) -> (Jump C) -> (Return) This is of course a simplified example but it shows the problem when converting out of the CFG.

Is there any information available on this topic in academia? I thought traversing the graph bottom-up would be very elegant but that does not work in more complicated cases.

A solution might be to walk top-down and search for a CF merge but in that case I would not be able to handle loops correct. So the only way to get this right seems to be to search for a possible CF merge if it occurs. If not, we have to have a loop, which means the loop is preferred and the continuing path is evaluated afterwards. This sounds like a solveable problem but it is also very expensive and there might exist a more elegant solution to the problem. Besides a loop could also result in a CF merge when thinking about a "break" statement.


Solution

  • Converting CFG into IL: you want to walk over the graph, emitting each vertex exactly once (except those that are unreachable). So you need to record which vertices have been emitted: a flag on the vertex would do, or a hash from vertex to True/False.

    Some vertices will have more than one successor, and you can only follow one of them directly; so you want a way to keep track of vertices that you want to come back to later. A queue is suitable for this.

    This is more-or-less what I've used.

    def needs_label(cfg, v, last):
        if cfg.predecessors(v) > 1:
            # There's more than one way of entering this vertex
            return True
        elif cfg.predecessors(v) == 1 and last != cfg.predecessors(v)[0]:
            # There's only one way, but the last vertex emitted was not that way
            # so it will be entered using a jump.
            return True
        else:
            return False
    
    def emit_label(v):
        print 'label%d' % (v.id)
    
    def emit_vertex(v):
        if v.type == 'branch':
            # Branch to second successor
            print 'br label%d' % cfg.successors(v)[1].id
        else:
            ...
    
    def emit_jump(v):
        print 'jmp label%d' % v.id
    
    def emit_cfg(cfg):
        q = Queue()   # Queue for saving vertices that we want to emit later
        done = {}    # Hash recording which vertices have already been emitted
        q.push(cfg.start())
        while not q.empty():
            v = q.pop()
            last = None
            while v is not None and not done[v]:
                # Emit the vertex, with a prefixed label if necessary
                if needs_label(cfg, v, last):
                    emit_label(v)
                emit_vertex(v)
                done[v] = True
                last = v
                # Get the vertex's successors
                succs = cfg.successors(v)
                # If there aren't any, then this path is finished, so go back to
                # the outer loop to pop another saved vertex
                if len(succs) == 0:
                    v = None   # Setting this will terminate the inner loop
                    continue
                # Stick all the vertices on the stack for later, in case we can't
                # process them all here
                for s in succs:
                    q.push(s)
                # Pick a new vertex from the list of successors.  Always pick the first
                # because if it's a branch then the second will have been branched on
                v = succs[0]
                # If it was emitted earlier we need to jump to it
                if done[v]:
                    emit_jump(v)
                    v = None
                # Otherwise continue the inner loop, walking from the new vertex
    

    Treatment of branches (vertices with more than one successor) is pretty naive: normally you want to figure out which is more likely and follow that one directly, if possible.