I auto-generated the Python3 lexer and parser classes with ANTLR and wrote the following listener:
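import org.antlr.v4.runtime.Token;
import org.antlr.v4.runtime.tree.TerminalNode;

public class SomePythonListener extends Python3ParserBaseListener {
    private final Python3Parser parser;
    private final String someValue;

    public SomePythonListener(Python3Parser parser, String someValue) {
        this.parser = parser;
        this.someValue = someValue;
    }

    // Called once for every token that ends up in the parse tree.
    @Override
    public void visitTerminal(TerminalNode node) {
        Token token = node.getSymbol();
        System.out.println("token.getType()=" + token.getType());
        System.out.println("getText:" + token.getText() + "END");
    }
}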
And I feed it the source code:
"""A file docstring.
With a multiline starting docstring.
That spans the first 3 lines."""
# Some Comment.
# Another comment
"""Some string."""
def foo():
    """Some docstring."""
    print('hello world')
    def bar():
        """Another docstring."""
        print('hello world')
def baz():
    """Third docstring."""
    print('hello universe')
This then outputs:
token.getType()=3
getText:"""A file docstring.
With a multiline starting docstring.
That spans the first 3 lines."""END
token.getType()=44
getText:
END
token.getType()=3
getText:"""Some string."""END
token.getType()=44
getText:
END
token.getType()=15
getText:defEND
token.getType()=45
getText:fooEND
token.getType()=57
getText:(END
token.getType()=58
getText:)END
token.getType()=60
getText::END
token.getType()=44
getText: END
token.getType()=1
getText: END
token.getType()=3
For completeness, token type 44 represents the newline. One can see that the first docstring is included, followed by a newline, followed by the second docstring """Some string.""". However, both comments, # Some Comment. and # Another comment, are ignored/not visited/not shown.
The TerminalNode node objects passed to visitTerminal do not include the comments.
How can I include the comments in the visitor?
Based on these answers, it seems I should get the comments from the hidden channels. I have not yet figured out how to do that. For completeness, the auto-generated Python3Lexer.java file contains:
public static String[] channelNames = {"DEFAULT_TOKEN_CHANNEL", "HIDDEN"};
public static String[] modeNames = {"DEFAULT_MODE"};
"The TerminalNode node objects of the visitTerminal do not include the comments."
That is correct: these tokens are skipped in the lexer. You can also put these tokens on another channel (so they are not skipped) by replacing -> skip with -> channel(HIDDEN). But that will still not cause them to appear in the visitTerminal(...) method. After all, only tokens used in parser rules appear there.
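The difference between the two lexer commands can be illustrated with a small toy model. This is a plain-Java sketch that does not use the ANTLR runtime; the class LexerActionSketch, the Tok type, and the line-based "lexing" are all hypothetical and exist only for illustration. It shows that a skipped comment never reaches the token buffer, while a hidden-channel comment is buffered but still filtered out of what the parser reads.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not the ANTLR API) of "-> skip" versus "-> channel(HIDDEN)". */
final class LexerActionSketch {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 1;

    /** Hypothetical stand-in for a token: its text and the channel it was emitted on. */
    static final class Tok {
        final String text;
        final int channel;

        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    /**
     * Treats every input line as one token. A line starting with '#' is a comment:
     * it is dropped when skipComments is true, otherwise kept on the hidden channel.
     */
    static List<Tok> lex(String[] lines, boolean skipComments) {
        List<Tok> buffer = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("#")) {
                if (!skipComments) {
                    buffer.add(new Tok(line, HIDDEN_CHANNEL));
                }
            } else {
                buffer.add(new Tok(line, DEFAULT_CHANNEL));
            }
        }
        return buffer;
    }

    /** The texts a parser reading only the default channel would see. */
    static List<String> parserView(List<Tok> buffer) {
        List<String> seen = new ArrayList<>();
        for (Tok t : buffer) {
            if (t.channel == DEFAULT_CHANNEL) {
                seen.add(t.text);
            }
        }
        return seen;
    }
}
```

In both modes parserView returns the same list, which is why moving comments to the hidden channel does not change what visitTerminal(...) receives.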
For the record, when changing:
SKIP_ : ( SPACES | COMMENT | LINE_JOINING) -> skip;
...
fragment COMMENT : '#' ~[\r\n\f]*;
to:
COMMENT : '#' ~[\r\n\f]* -> channel(HIDDEN);
SKIP_ : ( SPACES | LINE_JOINING) -> skip;
in the Python3Lexer.g4 file and then regenerating the lexer/parser classes, you can see that comments are no longer discarded but are placed on another channel:
String source = "\"\"\"A file docstring.\n" +
        "With a multiline starting docstring.\n" +
        "That spans the first 3 lines.\"\"\"\n" +
        "# Some Comment.\n" +
        "\n" +
        "# Another comment\n" +
        "\"\"\"Some string.\"\"\"\n" +
        "def foo():\n" +
        "    \"\"\"Some docstring.\"\"\"\n" +
        "    print('hello world')\n" +
        "    def bar():\n" +
        "        \"\"\"Another docstring.\"\"\"\n" +
        "        print('hello world')\n" +
        "def baz():\n" +
        "    \"\"\"Third docstring.\"\"\"\n" +
        "    print('hello universe')\n";

Python3Lexer lexer = new Python3Lexer(CharStreams.fromString(source));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);

// Load all tokens into the buffer, including those on the hidden channel.
tokenStream.fill();

// getTokens() returns every buffered token, regardless of channel.
for (Token t : tokenStream.getTokens()) {
    System.out.printf("channel=%s, text=%s%n",
            t.getChannel(), t.getText().replace("\n", "\\n"));
}
will print:
channel=0, text="""A file docstring.\nWith a multiline starting docstring.\nThat spans the first 3 lines."""
channel=1, text=# Some Comment.
channel=1, text=# Another comment
channel=0, text=\n
channel=0, text="""Some string."""
channel=0, text=\n
channel=0, text=def
...
But they will still not be a part of the parse tree you're walking with a listener or visitor: only tokens defined in parser rules will show up there.
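Since the comments do sit in the token buffer, one way to reach them is to look at the hidden-channel tokens directly to the left of a token the listener does visit. In the ANTLR 4 Java runtime, BufferedTokenStream (the superclass of CommonTokenStream) has getHiddenTokensToLeft(int tokenIndex, int channel) for this kind of lookup, and Token.getTokenIndex() gives the index to pass in. The sketch below is a plain-Java model of that idea without the ANTLR dependency; HiddenLeftSketch, Tok, sampleBuffer and hiddenToLeft are hypothetical names, and unlike the runtime method (which returns null when nothing is found) this version returns an empty list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model (not the ANTLR API) of collecting hidden tokens left of a visited token. */
final class HiddenLeftSketch {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 1;

    /** Hypothetical stand-in for a token: its text and the channel it was emitted on. */
    static final class Tok {
        final String text;
        final int channel;

        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    /** The first five tokens from the printed output above, in buffer order. */
    static List<Tok> sampleBuffer() {
        List<Tok> buffer = new ArrayList<>();
        buffer.add(new Tok("\"\"\"A file docstring...\"\"\"", DEFAULT_CHANNEL));
        buffer.add(new Tok("# Some Comment.", HIDDEN_CHANNEL));
        buffer.add(new Tok("# Another comment", HIDDEN_CHANNEL));
        buffer.add(new Tok("\n", DEFAULT_CHANNEL));
        buffer.add(new Tok("\"\"\"Some string.\"\"\"", DEFAULT_CHANNEL));
        return buffer;
    }

    /**
     * Walks left from tokenIndex and collects tokens on the requested channel,
     * stopping at the first default-channel token. Result is in source order.
     */
    static List<Tok> hiddenToLeft(List<Tok> buffer, int tokenIndex, int channel) {
        List<Tok> found = new ArrayList<>();
        for (int i = tokenIndex - 1; i >= 0; i--) {
            Tok t = buffer.get(i);
            if (t.channel == DEFAULT_CHANNEL) {
                break;
            }
            if (t.channel == channel) {
                found.add(t);
            }
        }
        Collections.reverse(found);
        return found;
    }
}
```

With the sample buffer, both comments are found to the left of the newline token at index 3, and nothing is found to the left of the second docstring at index 4. In a real listener the same lookup could be done inside visitTerminal(...), assuming the listener is given a reference to the CommonTokenStream that fed the parser.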