Search code examples
treeantlrabstract-syntax-treetraversalparse-tree

What will be the right algorithm to bottom up parse and reduce a parse tree?


The example parse tree is for the expression

IF( ABS( column ) > 10 ,"8214","8215")

inorder to reduce it to an AST having nodes (user-defined) of only QueryFunction (ABS, IF), QueryColumns (column), QueryConstants (10 ,"8214","8215") like

                    IF
                    |
     |--------------|--------------| 
     >            "8214"         "8215"
|----|----|
ABS       10
|
columnName

What will be the right algorithm to use ? Is a complex parsing approach like LR parsing required in this case ?

Please note that the QueryFunction can have various number of parameters depending on the type of function. Also each function can have other nested functions within it.

Parse Tree Example


Solution

  • There's nothing particularly "complex" about doing this. It's fairly common for ANTLR users to want to transform the parse tree to an AST (especially, given all the intermediate odes a parse tree contains).

    With an ANTLR Listener (or visitor), it's relatively simple to visit the parse tree and use it to construct whatever structure of AST you need.

    Unless you KNOW you need an AST, you may wish to try the native Listeners/Visitors that ANTLR produces. By overriding the *BaseListener or *BaseVisitor classes, you get default implementations for all of your nodes that allow you to write code that may safely "ignore" the intermediate nodes you don't really care about, and just "listen in" on the nodes you do care about (for listeners). (you'd want to understand this if you want to use them to create an AST regardless, so no wasted learning).

    NB: whatever grammar you're using looks like it could use some refinement. The very bottom if the graph is an example of what I'm talking about. The column_name parse rule should either be a Token or refer to a Token rule for pulling the column name into a single token instead of a list of characters. With a more appropriate grammar, your ANTLR parse tree will seem much less ridiculous (and easier to work with for generating an AST if you need).