
ANTLR 4 in Python not working as expected (trying to parse the chapter and paragraph of a book)


I want to create a very simple ANTLR 4 parser in Python, without a listener or visitor. It should take the chapter and paragraph of a book as input, in either order, and return the high_level (chapter) and low_level (paragraph) of the entry. For example, if I enter 2 a or a 2, it should print "chapter 2, paragraph a".

Here is my Example.g4:

grammar Example;

text
    : paragraph ;

paragraph
    : high_level (WS low_level)?
    | low_level WS high_level
    ;

low_level
    : 'a' | 'b'  | 'c'  ;

high_level
    : '1'  | '2'  | '3';
WS : [ \t\r\n]+ ;

I run this in my terminal:

java -jar ~/antlr-4.8-complete.jar -Dlanguage=Python3 -no-listener -no-visitor Example.g4

This generates two Python files (ExampleLexer.py and ExampleParser.py). I then wrote the following Python script:

from antlr4 import *
from ExampleLexer import ExampleLexer
from ExampleParser import ExampleParser


def main():
    while True:
        text = InputStream(input(">"))
        lexer = ExampleLexer(text)
        stream = CommonTokenStream(lexer)
        parser = ExampleParser(stream)
        tree = parser.text()
        query = tree.paragraph()
        low_level = query.low_level()
        high_level = query.high_level()
        print(f"chapter {high_level}, paragraph {low_level}")


if __name__ == '__main__':
    main()

However, when I run it and enter 2 a, I get this output:

chapter [10 8], paragraph [12 8]

Can anyone please explain what I'm doing wrong? I don't understand the numbers in square brackets.


Solution

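  • The numbers are just debugging output from RuleContext.__str__ (your generated Low_levelContext and High_levelContext classes inherit from RuleContext via ParserRuleContext). Called without rule names, it prints the invokingState of the context and of each of its ancestors, walking up through parentCtx and skipping the start rule's context, whose invokingState is -1. So [10 8] means that high_level was invoked from ATN state 10 and its parent, paragraph, from state 8.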

    Have a look at the source:

    class RuleContext(RuleNode):
    
        ...
    
        def __str__(self):
            return self.toString(None, None)
    
        ...
    
        def toString(self, ruleNames:list, stop:RuleContext)->str:
            with StringIO() as buf:
                p = self
                buf.write("[")
                while p is not None and p is not stop:
                    if ruleNames is None:
                        if not p.isEmpty():
                            buf.write(str(p.invokingState))
                    else:
                        ri = p.getRuleIndex()
                        ruleName = ruleNames[ri] if ri >= 0 and ri < len(ruleNames) else str(ri)
                        buf.write(ruleName)
    
                    if p.parentCtx is not None and (ruleNames is not None or not p.parentCtx.isEmpty()):
                        buf.write(" ")
    
                    p = p.parentCtx
    
                buf.write("]")
                return buf.getvalue()
    
        ...
    

    https://github.com/antlr/antlr4/blob/master/runtime/Python3/src/antlr4/RuleContext.py

    You didn't say what you wanted to display, but I guess it is the text the rules matched. In that case, you can do this instead:

    print(f"chapter {high_level.getText()}, paragraph {low_level.getText()}")
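To see where [10 8] and [12 8] come from without running ANTLR, here is a small sketch that re-implements only the ruleNames=None branch of RuleContext.toString shown above, on top of a hypothetical FakeContext class (not part of the runtime). The state numbers 8, 10 and 12 are copied from the output in the question, and the root text context is assumed to have invokingState -1, which is why only two numbers get printed.

```python
class FakeContext:
    """Hypothetical stand-in for RuleContext: only the two fields toString() reads."""

    def __init__(self, invoking_state, parent=None):
        self.invokingState = invoking_state
        self.parentCtx = parent

    def isEmpty(self):
        # Same test as the runtime: a context invoked from state -1 counts as empty
        return self.invokingState == -1


def rule_context_str(ctx):
    """Follows the ruleNames=None, stop=None path of RuleContext.toString()."""
    out = ["["]
    p = ctx
    while p is not None:
        if not p.isEmpty():
            out.append(str(p.invokingState))  # the number you see in the output
        if p.parentCtx is not None and not p.parentCtx.isEmpty():
            out.append(" ")
        p = p.parentCtx  # walk up towards the start rule
    out.append("]")
    return "".join(out)


# Context chain for the input "2 a": text -> paragraph -> high_level / low_level.
# State numbers are taken from the question's output, not derived from the grammar.
text_ctx = FakeContext(-1)  # start rule context, so it is skipped
paragraph_ctx = FakeContext(8, text_ctx)
high_ctx = FakeContext(10, paragraph_ctx)
low_ctx = FakeContext(12, paragraph_ctx)

print(f"chapter {rule_context_str(high_ctx)}, paragraph {rule_context_str(low_ctx)}")
# chapter [10 8], paragraph [12 8]
```

Following the else branch of the quoted source, calling high_level.toString(parser.ruleNames, None) on the real context should print rule names instead, i.e. [high_level paragraph text].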