Search code examples
parsingcompiler-constructioncontext-free-grammar

Converting a parse tree to AST


Let me put the question first: Can I convert a parse tree implementing this particular grammar to an AST trivially.

I was given this grammar to build a parse tree:

literal := INTEGER | FLOAT | TRUE | FALSE .

designator := IDENTIFIER { "[" expression0 "]" } .

op0 := ">=" | "<=" | "!=" | "==" | ">" | "<" .
op1 := "+" | "-" | "or" .
op2 := "*" | "/" | "and" .

expression0 := expression1 [ op0 expression1 ] .
expression1 := expression2 { op1  expression2 } .
expression2 := expression3 { op2 expression3 } .
expression3 := "not" expression3
       | "(" expression0 ")"
       | designator
       | call-expression
       | literal .

For this particular example:

func main() : void {
    let a = 1 + 2 + 3 + 4;
}

My parser will generate (part of) the parse tree as such

            EXPRESSION1
                EXPRESSION2
                  EXPRESSION3
                    LITERAL
                      INTEGER(1)(lineNum:2, charPos:10)
                OP1
                  ADD(lineNum:2, charPos:12)
                EXPRESSION2
                  EXPRESSION3
                    LITERAL
                      INTEGER(2)(lineNum:2, charPos:14)
                OP1
                  ADD(lineNum:2, charPos:16)
                EXPRESSION2
                  EXPRESSION3
                    LITERAL
                      INTEGER(3)(lineNum:2, charPos:18)
                OP1
                  ADD(lineNum:2, charPos:20)
                EXPRESSION2
                  EXPRESSION3
                    LITERAL
                      INTEGER(4)(lineNum:2, charPos:22)

Just notice how these tree branches under EXPRESSION1 go:

EXPRESSION2 + EXPRESSION2 + EXPRESSION2 + EXPRESSION2

which the operator + doesn't correspond to its two operands. So it seems to me that, in the AST conversion, I can't get an AST that aids 3-address IR code generation by simply pulling up the operator to replace the non-terminal EXPRESSION1.

To achieve this goal, the grammar I would have written for this language will be like this instead

expression1 := expression2 | expression1 + expression2  (1)
expression2 := expression3 | expression2 * expression3  (2)
expression3 := literal                                  (3)

which the branches under EXPRESSION1 are only

EXPRESSION1 + EXPRESSION2

However, this grammar is not LL(1) since |FIRST(expression2)| = |{literal, +}| > 1.

It begs the question that (1) what would be the most elegant and trivial way to convert this parse tree? (2) is my construction of the parse tree a complete waste of time for this grammar that I should have started out coding an AST instead?


Solution

  • I believe you wish to produce an AST like:

         ADD
         /  \
        1   ADD
            /  \  
           2   ADD
               / \
              3   4
    

    But probably you noticed that your parse tree is actually a flat list not representing an easy starting point to group operators and their operands in single pass. Anyway coding a more advanced parser, recognizing tree structure and applying grammar reductions is not a trivial task.

    Hence before starting doing that you might wish to consider some existing parser generators like yacc or ANTLR. Probably you will need to rewrite your grammar to create rules centered on the operators instead of treating them as a recursion utility. A grammar not being a classic LL(1) might not be a big problem as modern generators (like ANTLR) can handle LL(k) grammars with bigger lookahead, predicates etc. and just bypass issues of that type.

    If you still insist on coding it manually think about using a stack to store symbols and converting them to an AST node once an expression is collected but again it is not a simple task.