Context dependent ANTLR4 ParseTreeVisitor implementation


I am working on a project where we are migrating a massive number of views (more than 12,000) from Oracle to Hadoop/Impala. I have written a small Java utility to extract the view DDL from Oracle and would like to use ANTLR4 to traverse the AST and generate an Impala-compatible view DDL statement.

Most of the work is relatively simple and only involves rewriting some Oracle-specific syntax quirks in Impala style. However, I am facing one issue for which I am not sure I have the best answer yet: we have a number of special cases where values from a date field are extracted through multiple nested function calls. For example, the following extracts the day from a date field:

TO_NUMBER(TO_CHAR(d.R_DATE , 'DD' ))

I have an ANTLR4 grammar for Oracle SQL, so I get a visitor callback when the traversal reaches TO_NUMBER and again when it reaches TO_CHAR, but I would like special handling for this particular combination.
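To make the target of the rewrite concrete, here is a minimal sketch of the string-level mapping the special case could produce. It assumes the desired Impala output uses the built-in DAY, MONTH and YEAR functions; the class name DatePartRewriter, the method rewrite and the set of handled masks are hypothetical and not part of the original utility.

```java
import java.util.Locale;

/** Hypothetical helper: maps TO_NUMBER(TO_CHAR(column, mask)) to an Impala built-in. */
final class DatePartRewriter {

    private DatePartRewriter() {}

    /**
     * Returns the Impala expression for the given column and Oracle format mask,
     * or null when the mask is not one of the handled special cases.
     */
    static String rewrite(String column, String oracleMask) {
        // Strip the surrounding single quotes of the SQL string literal, if present.
        String mask = oracleMask.trim();
        if (mask.length() >= 2 && mask.startsWith("'") && mask.endsWith("'")) {
            mask = mask.substring(1, mask.length() - 1);
        }
        String col = column.trim();
        // Assumed mapping; only these three masks are handled in this sketch.
        switch (mask.toUpperCase(Locale.ROOT)) {
            case "DD":
                return "DAY(" + col + ")";
            case "MM":
                return "MONTH(" + col + ")";
            case "YYYY":
                return "YEAR(" + col + ")";
            default:
                return null; // unknown mask: the caller falls back to default handling
        }
    }
}
```

With this sketch, rewrite("d.R_DATE ", "'DD'") yields DAY(d.R_DATE), and any unhandled mask yields null so that the caller can fall back to its default handling.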

Is there really no other way than implementing the handler method for the outer function and then resorting to manual traversal of the nested structure to see whether it matches this pattern?

I currently have something like this in my subclass of the generated visitor class:

    @Override
    public String visitNumber_function(PlSqlParser.Number_functionContext ctx) {

        // FIXME: seems to be dodgy code, can it be improved? 
        String functionName = ctx.name.getText();
        if (functionName.equalsIgnoreCase("TO_NUMBER")) {

            final int childCount = ctx.getChildCount();
            if (childCount == 4) {

                final int functionNameIndex = 0;
                final int openRoundBracketIndex = 1;
                final int encapsulatedValueIndex = 2;
                final int closeRoundBracketIndex = 3;

                ParseTree encapsulated = ctx.getChild(encapsulatedValueIndex);
                if (encapsulated instanceof TerminalNode) {
                    throw new IllegalStateException("TerminalNode is found at: " + encapsulatedValueIndex);
                }

                String customDateConversionOrNullOnOtherType =
                        customDateConversionFromToNumberAndNestedToChar(encapsulated);

                if (customDateConversionOrNullOnOtherType != null) {
                    // the child node contained our expected child element, so return the converted value
                    return customDateConversionOrNullOnOtherType;
                }
                // otherwise the child was something unexpected, signalled by null
                // so simply fall-back to the default handler
            }
        }

        // some other numeric function, default handling
        return super.visitNumber_function(ctx);
    }

    private String customDateConversionFromToNumberAndNestedToChar(ParseTree parseTree) {
        // ...
    }


Solution

  • For anyone hitting the same issue, the way to go seems to be:

    1. Change the grammar definition and introduce custom sub-types for the encapsulated expression of the nested function.

    2. This makes it possible to hook into the processing at precisely the desired location of the parse tree.

    3. Use a second custom ParseTreeVisitor that captures the values of the function call and delegates the processing of the rest of the sub-tree back to the main, "outer" ParseTreeVisitor.

    Once the second custom ParseTreeVisitor had finished visiting all the sub-trees, I had the context information I required and every sub-tree had been visited properly.
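The steps above can be sketched roughly as follows. This is a self-contained illustration that does not use the ANTLR runtime: Node, Visitor, Column, Call and NestedToChar are hypothetical stand-ins for the generated context classes and visitor interface, and NestedToChar plays the role of the custom sub-type that the grammar change would introduce for the encapsulated expression. The mapping of masks to DAY, MONTH and YEAR is likewise an assumption about the desired Impala output.

```java
import java.util.Arrays;
import java.util.List;

/** Stand-in for a parse-tree node (real code would use ANTLR's generated contexts). */
interface Node { <T> T accept(Visitor<T> v); }

/** Stand-in for the generated visitor interface. */
interface Visitor<T> {
    T visitColumn(Column n);
    T visitCall(Call n);
    T visitNestedToChar(NestedToChar n);
}

final class Column implements Node {
    final String name;
    Column(String name) { this.name = name; }
    public <T> T accept(Visitor<T> v) { return v.visitColumn(this); }
}

final class Call implements Node {
    final String name;
    final List<Node> args;
    Call(String name, Node... args) { this.name = name; this.args = Arrays.asList(args); }
    public <T> T accept(Visitor<T> v) { return v.visitCall(this); }
}

/** The dedicated sub-type for TO_CHAR(dateExpr, 'mask'); the mask is stored unquoted. */
final class NestedToChar implements Node {
    final Node dateExpr;
    final String mask;
    NestedToChar(Node dateExpr, String mask) { this.dateExpr = dateExpr; this.mask = mask; }
    public <T> T accept(Visitor<T> v) { return v.visitNestedToChar(this); }
}

/** Second visitor: captures the mask and delegates the date expression to the outer visitor. */
final class ToCharCapture implements Visitor<Boolean> {
    private final Visitor<String> outer;
    String mask;
    String renderedDate;
    ToCharCapture(Visitor<String> outer) { this.outer = outer; }
    public Boolean visitNestedToChar(NestedToChar n) {
        mask = n.mask;
        renderedDate = n.dateExpr.accept(outer); // hand the sub-tree back to the main visitor
        return true;
    }
    public Boolean visitColumn(Column n) { return false; } // not the special case
    public Boolean visitCall(Call n) { return false; }     // not the special case
}

/** Main ("outer") visitor that renders Impala-style SQL text. */
final class ImpalaVisitor implements Visitor<String> {
    public String visitColumn(Column n) { return n.name; }

    public String visitNestedToChar(NestedToChar n) {
        // Default rendering when TO_CHAR is not handled by the special case.
        return "TO_CHAR(" + n.dateExpr.accept(this) + ", '" + n.mask + "')";
    }

    public String visitCall(Call n) {
        if (n.name.equalsIgnoreCase("TO_NUMBER") && n.args.size() == 1) {
            ToCharCapture capture = new ToCharCapture(this);
            if (n.args.get(0).accept(capture)) {
                String function = impalaFunctionFor(capture.mask);
                if (function != null) {
                    return function + "(" + capture.renderedDate + ")";
                }
            }
        }
        // Any other call (or an unhandled mask): render it unchanged.
        StringBuilder sb = new StringBuilder(n.name).append('(');
        for (int i = 0; i < n.args.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(n.args.get(i).accept(this));
        }
        return sb.append(')').toString();
    }

    private static String impalaFunctionFor(String mask) {
        switch (mask.toUpperCase(java.util.Locale.ROOT)) {
            case "DD": return "DAY";
            case "MM": return "MONTH";
            case "YYYY": return "YEAR";
            default: return null;
        }
    }
}
```

In this sketch the outer visitor no longer inspects child indexes or node types by hand: the capture visitor only reacts to the dedicated node type and answers false for everything else, so the fall-back to default handling stays in one place.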