
I'm trying to generate a parse tree for Python 3 code with the ANTLR4 Python3.g4 grammar file


I'm using ANTLR4 and trying to generate a parse tree for a Python file I have. I took the grammar file Python3.g4 from the ANTLR4 documentation, installed antlr4-python3-runtime, and ran this command:

antlr4 -Dlanguage=Python3 Python3.g4

This generated my parser and lexer files.

In Python3Lexer.py, I got errors for this import:

from typing.io import TextIO

so I changed it to:

from typing import TextIO

I also created a file called pythonparser.py, in the same folder as the parser and lexer files, to invoke the parser:

import sys
from antlr4 import *
from Python3Lexer import Python3Lexer
from Python3Parser import Python3Parser

def main(argv):
    input_stream = FileStream(argv[1])
    lexer = Python3Lexer(input_stream)
    stream = CommonTokenStream(lexer)
    parser = Python3Parser(stream)
    tree = parser.single_input()

if __name__ == '__main__':
    main(sys.argv)

I also made a test.py file, in the same folder as the grammar files, containing:

print("hello world")

I then tried to parse this file with the command:

python3 pythonparser.py test.py

It doesn't work, and I receive this error message:

Traceback (most recent call last):
  File "/Users/Fari/Developer/PRJ/project/antlr/pythonparser.py", line 3, in <module>
    from Python3Lexer import Python3Lexer
  File "/Users/Fari/Developer/PRJ/project/antlr/Python3Lexer.py", line 19, in <module>
    LanguageParser = getattr(importlib.import_module('{}Parser'.format(module_path)), '{}Parser'.format(language_name))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Fari/Developer/PRJ/project/antlr/Python3Parser.py", line 446, in <module>
    class Python3Parser ( Parser ):
  File "/Users/Fari/Developer/PRJ/project/antlr/Python3Parser.py", line 450, in Python3Parser
    atn = ATNDeserializer().deserialize(serializedATN())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/antlr4/atn/ATNDeserializer.py", line 60, in deserialize
    self.reset(data)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/antlr4/atn/ATNDeserializer.py", line 90, in reset
    temp = [ adjust(c) for c in data ]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/antlr4/atn/ATNDeserializer.py", line 90, in <listcomp>
    temp = [ adjust(c) for c in data ]
             ^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/antlr4/atn/ATNDeserializer.py", line 88, in adjust
    v = ord(c)
        ^^^^^^
TypeError: ord() expected string of length 1, but int found

I'm not sure where I'm going wrong.


Solution

  • There are a lot of Python grammars. The ones you need are Python3Lexer.g4 and Python3Parser.g4 from the python/python3 directory of the antlr/grammars-v4 repository.

    After you've downloaded both grammars, preprocess them by running transformGrammar.py in the folder that contains the two grammar files.

    Now download the two base classes, Python3LexerBase.py and Python3ParserBase.py, into the same folder.

    When that is all done, generate the lexer and parser Python classes:

    java -jar antlr-4.11.1-complete.jar *.g4 -Dlanguage=Python3
    

    And if you now run the file:

    from antlr4 import *
    from Python3Lexer import Python3Lexer
    from Python3Parser import Python3Parser
    
    def main():
        input_stream = InputStream('print("hello world")\n')
        lexer = Python3Lexer(input_stream)
        stream = CommonTokenStream(lexer)
        parser = Python3Parser(stream)
        tree = parser.single_input()
        print(tree.toStringTree(recog=parser))
    
    if __name__ == '__main__':
        main()
    
    

    the following output will be printed:

    (single_input (simple_stmts (simple_stmt (expr_stmt (testlist_star_expr (test (or_test (and_test (not_test (comparison (expr (xor_expr (and_expr (shift_expr (arith_expr (term (factor (power (atom_expr (atom (name print)) (trailer ( (arglist (argument (test (or_test (and_test (not_test (comparison (expr (xor_expr (and_expr (shift_expr (arith_expr (term (factor (power (atom_expr (atom "hello world"))))))))))))))))) ))))))))))))))))))) \n))
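
    The tree is this deep because every precedence level in the grammar gets its own rule node, even when it has only one child. As a rough sketch (not part of the runtime), a small helper can collapse those single-child chains for easier reading. The name compact is hypothetical, and the code assumes the tree nodes expose getChildCount(), getChild(i), getText() and getRuleIndex(), which is how I understand the antlr4-python3-runtime node classes to behave:

```python
def compact(node, rule_names):
    """Render a parse tree as text, skipping rule nodes that have a single child.

    `rule_names` is expected to be the parser's list of rule names
    (assumption: the generated parser exposes it as `parser.ruleNames`).
    """
    # Walk down through chains of nodes that only wrap one child.
    while node.getChildCount() == 1:
        node = node.getChild(0)

    count = node.getChildCount()
    if count == 0:
        # A leaf (token) node: just show its text.
        return node.getText()

    # A rule node with several children: show its name and its children.
    parts = [compact(node.getChild(i), rule_names) for i in range(count)]
    return "(" + rule_names[node.getRuleIndex()] + " " + " ".join(parts) + ")"
```

    In the main() above this could be called as print(compact(tree, parser.ruleNames)) in place of the toStringTree line.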
    

    Note that I did not change anything else (the edit from typing.io to typing was not needed). I used:

    • Python 3.10.9
    • ANTLR 4.11.1
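
    The versions matter here. As far as I can tell, a TypeError from ord() inside ATNDeserializer usually means the lexer and parser were generated by a newer ANTLR tool than the installed antlr4-python3-runtime, so the two should be kept on the same version. Below is a minimal sketch for comparing them. The function names are hypothetical, and it assumes the generated files start with a header comment of the form "# Generated from Python3Parser.g4 by ANTLR 4.11.1":

```python
import re
from importlib import metadata


def generated_by(header_line):
    """Extract the ANTLR tool version from a generated file's header comment.

    Assumes a header like '# Generated from Python3Parser.g4 by ANTLR 4.11.1'.
    Returns None when the line does not look like such a header.
    """
    match = re.search(r"by ANTLR (\d+(?:\.\d+)*)", header_line)
    return match.group(1) if match else None


def installed_runtime():
    """Return the installed antlr4-python3-runtime version, or None if absent."""
    try:
        return metadata.version("antlr4-python3-runtime")
    except metadata.PackageNotFoundError:
        return None


def same_version(tool, runtime):
    """True only when both versions are known and identical."""
    return tool is not None and tool == runtime
```

    To use it, pass the first line of Python3Parser.py to generated_by() and compare the result with installed_runtime(). If they differ, either regenerate the files with the matching ANTLR jar or install the matching runtime version with pip.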

    EDIT

    When I stick the following in a file:

    #!/usr/bin/env bash
    wget https://raw.githubusercontent.com/antlr/grammars-v4/master/python/python3/Python3Lexer.g4
    wget https://raw.githubusercontent.com/antlr/grammars-v4/master/python/python3/Python3Parser.g4
    wget https://raw.githubusercontent.com/antlr/grammars-v4/master/python/python3/Python3/transformGrammar.py
    wget https://raw.githubusercontent.com/antlr/grammars-v4/master/python/python3/Python3/Python3LexerBase.py 
    wget https://raw.githubusercontent.com/antlr/grammars-v4/master/python/python3/Python3/Python3ParserBase.py
    wget https://www.antlr.org/download/antlr-4.11.1-complete.jar
    
    python3 transformGrammar.py
    
    pip install antlr4-python3-runtime
    
    java -jar antlr-4.11.1-complete.jar *.g4 -Dlanguage=Python3
    
    cat << EOF > main.py
    from antlr4 import *
    from Python3Lexer import Python3Lexer
    from Python3Parser import Python3Parser
    
    def main():
        input_stream = InputStream('print("hello world")\n')
        lexer = Python3Lexer(input_stream)
        stream = CommonTokenStream(lexer)
        parser = Python3Parser(stream)
        tree = parser.single_input()
        print(tree.toStringTree(recog=parser))
    
    if __name__ == '__main__':
        main()
    EOF
    
    python3 --version
    
    python3 main.py
    

    and run this file, I get the following output:

    ...
    
    antlr-4.11.1-complete.jar              100%[============================================================================>]   3,38M  9,33MB/s    in 0,4s
    
    2023-01-31 10:51:47 (9,33 MB/s) - ‘antlr-4.11.1-complete.jar’ saved [3547867/3547867]
    
    Altering Python3Lexer.g4
    Writing ...
    Altering Python3Parser.g4
    Writing ...
    Requirement already satisfied: antlr4-python3-runtime in /opt/homebrew/lib/python3.10/site-packages (4.11.1)
    Python 3.10.9
    (single_input (simple_stmts (simple_stmt (expr_stmt (testlist_star_expr (test (or_test (and_test (not_test (comparison (expr (xor_expr (and_expr (shift_expr (arith_expr (term (factor (power (atom_expr (atom (name print)) (trailer ( (arglist (argument (test (or_test (and_test (not_test (comparison (expr (xor_expr (and_expr (shift_expr (arith_expr (term (factor (power (atom_expr (atom "hello world"))))))))))))))))) ))))))))))))))))))) \n))