
ParseCancellationException when using ANTLR4 `parser.file_input()` for Python files


I am writing Java code that uses ANTLR4 to parse Python files. The lexer and parser are generated from Python3Lexer.g4 and Python3Parser.g4 in the antlr/grammars-v4 GitHub repository. The Java parsing code works most of the time, but sometimes I get the following error.

line 431:1 no viable alternative at input '<EOF>'
Parser Exception: org.antlr.v4.runtime.misc.ParseCancellationException
org.antlr.v4.runtime.misc.ParseCancellationException
        at org.antlr.v4.runtime.BailErrorStrategy.recover(BailErrorStrategy.java:51)
        at Python3Parser.simple_stmt(Python3Parser.java:1667)
        at Python3Parser.stmt(Python3Parser.java:1567)
        at Python3Parser.file_input(Python3Parser.java:348)
        at ConvertPython.serializeFile(ConvertPython.java:89)

Here is part of the ConvertPython.java:

      Python3Lexer lexer = new Python3Lexer(CharStreams.fromFileName(f));
      CommonTokenStream tokens = new CommonTokenStream(lexer);
      vocab = lexer.getVocabulary();

      Python3Parser parser = new Python3Parser(tokens);
      ParserRuleContext t = parser.file_input(); // the exception line

Here is the end of one failing Python file:

...
SYBYL2SYMB = {
    "Mo": "Mo",
    "Sn": "Sn",
}

When I tested it, I found that the exception occurs only when this dict is the last line of the Python file. If there is a newline after it, there is no exception.

I also get `line 231:7 no viable alternative at input 'resultmatrix_'`, followed by the same `org.antlr.v4.runtime.misc.ParseCancellationException`, for the Python code `print resultmatrix_`. I think that is because this code is Python 2, while the ANTLR grammar I'm using is for Python 3.

PS: I'm new to ANTLR. Please tell me what else I should post to make the problem clearer. Thanks a lot!


Solution

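  • The grammar's simple_stmt rule ends with a NEWLINE token, so the last statement in the file must be followed by a line break. Without one, the parser hits <EOF> where it expects NEWLINE and reports "no viable alternative".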

    This works:

    String input = "SYBYL2SYMB = {\n" +
        "    \"Mo\": \"Mo\",\n" +
        "    \"Sn\": \"Sn\",\n" +
        "}\n";
    
    Python3Lexer lexer = new Python3Lexer(CharStreams.fromString(input));
    Python3Parser parser = new Python3Parser(new CommonTokenStream(lexer));
    
    parser.file_input();
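
If the input files themselves cannot be changed, one possible workaround is to append a line break before lexing whenever the source lacks one. The following is a minimal sketch, assuming the files are UTF-8 encoded; `TrailingNewline`, `ensureTrailingNewline`, and `readPythonSource` are hypothetical names for illustration, not part of ANTLR or the grammar:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: guarantees the source text ends with a line break,
// so the final simple statement is followed by a NEWLINE token.
final class TrailingNewline {

    // Returns the source unchanged if it already ends with a line break,
    // otherwise returns it with "\n" appended.
    static String ensureTrailingNewline(String source) {
        if (source.endsWith("\n") || source.endsWith("\r")) {
            return source;
        }
        return source + "\n";
    }

    // Reads a file as UTF-8 (an assumption about the input) and normalizes its ending.
    static String readPythonSource(String fileName) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(fileName));
        return ensureTrailingNewline(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

The returned string can then be passed to `CharStreams.fromString(...)` in place of the `CharStreams.fromFileName(f)` call from the question. Because the line break is only ever added at the very end, line numbers in error messages still match the original file.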