Search code examples
compiler-constructionterminologycompiler-theoryabstract-syntax-treeparse-tree

What's the difference between parse trees and abstract syntax trees (ASTs)?


Are they generated by different phases of a compiling process? Or are they just different names for the same thing?


Solution

  • This is based on the Expression Evaluator grammar by Terrence Parr.

    The grammar for this example:

    grammar Expr002;
    
    options 
    {
        output=AST;
        ASTLabelType=CommonTree; // type of $stat.tree ref etc...
    }
    
    prog    :   ( stat )+ ;
    
    stat    :   expr NEWLINE        -> expr
            |   ID '=' expr NEWLINE -> ^('=' ID expr)
            |   NEWLINE             ->
            ;
    
    expr    :   multExpr (( '+'^ | '-'^ ) multExpr)*
            ; 
    
    multExpr
            :   atom ('*'^ atom)*
            ; 
    
    atom    :   INT 
            |   ID
            |   '('! expr ')'!
            ;
    
    ID      : ('a'..'z' | 'A'..'Z' )+ ;
    INT     : '0'..'9'+ ;
    NEWLINE : '\r'? '\n' ;
    WS      : ( ' ' | '\t' )+ { skip(); } ;
    

    Input

    x=1
    y=2
    3*(x+y)
    

    Parse Tree

    The parse tree is a concrete representation of the input. The parse tree retains all of the information of the input. The empty boxes represent whitespace, i.e. end of line.

    Parse Tree

    AST

    The AST is an abstract representation of the input. Notice that parens are not present in the AST because the associations are derivable from the tree structure.

    AST

    For a more through explanation see Compilers and Compiler Generators pg. 23
    or Abstract Syntax Trees on pg. 21 in Syntax and Semantics of Programming Languages