Search code examples
c++abstract-syntax-treegraph-visualization

How to visualize an Abstract Syntax Tree graphically?


I wrote a simple compiler in C++ and want to visualize the Abstract Syntax Tree it produces. Currently, I dump the AST in a super long string similar to something below:

Program(decls=[ConstDecl(type=BasicTypeKind::Int, value=Num(n=1, loc=Location(1, 21)), name=positive, loc=Location(1, 10)), ConstDecl(type=BasicTypeKind::Int, value=Num(n=-1, loc=Location(2, 21)), name=negative, loc=Location(2, 10)), ConstDecl(type=BasicTypeKind::Int, value=Num(n=100, loc=Location(3, 26)), name=max_heap_size, loc=Location(3, 10)), ConstDecl(type=BasicTypeKind::Character, value=Char(c=99, loc=Location(4, 23)), name=...

As you can see, this dump isn't very human friendly in terms of visualization. One cannot naturally relate the notion a tree to such a long string. I tried the approach to pretty-print the AST and found astpretty, which is for Python. It aims a lot in debugging but what if I want an illustration of the AST? A graphical format surely fits better.

Actually I have the picture about what the output I am looking forward. Graphviz does a great job on this field and various graphs the C++ document tool Doxygen generates are very close to my purpose conceptually.

Putting these together, I want a way to turn an AST in memory as C++ objects into decent graphic output (static one is ok). Any good starting point?

Edit: as comments said, dumping my AST in format Graphviz recognizes is a good starting point. I will try do it this way until new and more concrete problems arise. Thanks guys.


Solution

  • I found a solution that is both fast and acceptable. LLVM-8 includes an option to visualize a Control Flow Graph (CFG) by generating a dot format file from it. Basically this works for more general graph structures as long as you specialize the llvm::GraphTraits and llvm::DOTGraphTraits templates (with a little work around) and the llvm::WriteGraph() will work for you. Here is result without too much fine-tuning:

    enter image description here

    given the C-like snippet:

    const int IntConstant = 1;
    int Array[2];
    
    void main() {
      Printf("hello");
    }
    

    This language is a simplification of C and it is case-insensitive. A Printf() is represented by a Write statement as the node in the image. ConstDecl means constant declaration. VarDecl means variable declaration and FuncDef means function definition. Other things are pretty much straightforward.

    Before explaining the magic, I would like to point you to these documents that really help how to use the llvm APT involved.

    • GraphTraits to know the signature your code should have.
    • WriteGraph to know what WriteGraph requires of your code.
    • CFGPrinter to know how to specialize your GraphTraits
    • DOTGraphTraits to know how to make your illustration meaningful by giving labels and descriptions to nodes.
    • iterator_facade_base to know how to painlessly write working iterators.

    Now with this knowledge bases, you just need to put them together. In fact, follow these steps:

    1. think of how to iterate over each node in the AST and specialize the nodes_iterator implementation.
    2. think of how to iterate over all children of a given node and specialize the ChildIteratorType implementation.
    3. think of how to extract useful information from your nodes and specialize the getNodeLabel() and getNodeDescription() of your DOTGraphTraits.
    4. note the mine: the NodeRef type of GraphTraits must be a pointer! knowing this beforehand can save you a day. This is true for the latest version. Maybe in the future they will remove the restriction.

    My current implementation only include the class name of a node as its label and a self-explanatory string as its description without any fine-tuning. If you want more effects, i.e., color, shape, etc., you can do it in your DOTGraphTraits.