t-sql, antlr, parse-tree

Getting only the wanted information from an ANTLR4-generated parse tree in LISP format


I've parsed the following SQL query using ANTLR4 with tsql_grammar:

"SELECT DepartmentID, Name, GroupName " + "FROM HumanResources.Department";

Note: the query is from the MSSQL AdventureWorks2014 database.

I got this LISP-style parse tree output:

(tsql_file (batch (sql_clauses (sql_clause (dml_clause (select_statement (query_expression (query_specification SELECT (select_list (select_list_elem (expression (full_column_name (id (simple_id DepartmentID))))) , (select_list_elem (expression (full_column_name (id (simple_id Name))))) , (select_list_elem (expression (full_column_name (id (simple_id GroupName)))))) FROM (table_sources (table_source (table_source_item_joined (table_source_item (table_name_with_hint (table_name (id (simple_id HumanResources)) . (id (simple_id Department))))))))))))))) <EOF>)

How can I access the children and the information they hold?


Solution

  • The usual approach in such cases is to create a listener and walk the returned parse tree. ANTLR4 generates a base listener for you by default (this can be switched off with the -no-listener command line option). The base listener contains empty enter and exit functions for each parser rule in your grammar, so you only need to override the functions you are interested in. Here's a C++ example for getting the name of a column:

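    class SemanticListener : public YourParserBaseListener {
    public:
    
      // Called by the walker after all children of a full_column_name node have been visited.
      void exitFull_column_name(YourParser::Full_column_nameContext *ctx) override
      {
        if (ctx->id() != nullptr)
        {
          // getText() concatenates the text of all tokens below this node.
          std::string columnName = ctx->id()->getText();
          //... do something with the column name
        }
      }
    //...
    };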
    

    The call to run the walk is very simple:

    SemanticListener semanticListener;
    tree::ParseTreeWalker::DEFAULT.walk(&semanticListener, _tree.get());
    

    _tree is the parse tree returned by the parse run.