Hi, I'm currently trying to extract all tokens from ANTLR in C#. I'm using the Antlr4.CodeGenerator and Antlr4.Runtime packages.
I want them structured in a way that lets me manipulate them, change their content and so on. I've tried using listeners and visitors and got nowhere, so my intention is to put the tokens in a list of objects containing their content, rule and token origin. My parser is validating the input correctly.
I've tried using MyLanguageLexer.GetAllTokens(), but it returns an empty list. I also tried CommonTokenStream.GetTokens() after calling .Fill(), but it only returns the last token found, which is an EOF, and I can't figure out why.
I can iterate over them recursively in the parse tree, but that feels like an unsafe approach to the problem, and it bothers me that the methods above didn't work.
This is my custom parsing class, which currently returns the parse tree. My objective is to return the token structure I'm trying to build instead.
public static class Parser
{
    public static RootContext TryParse(string query)
    {
        var inputStream = new AntlrInputStream(query);
        var lexer = new StatsQueryLexer(inputStream);
        lexer.RemoveErrorListeners();
        lexer.AddErrorListener(new LexerErrorListener());
        var tokenStream = new CommonTokenStream(lexer);
        var parser = new StatsQueryParser(tokenStream);
        parser.RemoveErrorListeners();
        parser.AddErrorListener(new ParserErrorListener());
        parser.BuildParseTree = true;
        var tree = parser.root();
        var tokens = lexer.GetAllTokens();
        return tree;
    }
}
This is my lexer:
lexer grammar StatsQueryLexer;
SPACE: [ \t\r\n]+ -> skip;
NULL_: 'NULL';
L_BRACKET: '(';
R_BRACKET: ')';
NUMBER: [-]? [0-9]+ ('.' [0-9]+)?;
OPERATOR: ('+' | '-' | '*' | '/');
COMPARATOR: ('=' | '!=' | '>' | '<' | '>=' | '<=');
SUM_FN: 'SOMA';
AVG_FN: 'MEDIA';
MAX_FN: 'MAX';
MIN_FN: 'MIN';
COUNT_FN: 'CONTA';
SQL_FN: 'SQL';
IF: 'SE';
THEN: 'RETORNA';
ELSE: 'SENAO';
QUOTE: '`' ([\u0000-\uFFFF])+ '`';
COMMA: ',';
COLUMN: '{' ([a-z] | [A-Z] | [0-9] | ' ')+ '}';
This is my parser:
parser grammar StatsQueryParser;
options {
    tokenVocab = StatsQueryLexer;
}
root: el += expression (OPERATOR el += expression)* EOF;
expression:
    NUMBER
    | NULL_
    | aggregateFunction
    | nativeSqlFunction
    | caseElse
    | expression OPERATOR expression
    | L_BRACKET expression R_BRACKET;
aggregateFunction:
    aggregateFunctionPrefix L_BRACKET aggregateFunctionArgs R_BRACKET;
aggregateFunctionArgs:
    NUMBER
    | COLUMN
    | nativeSqlFunction
    | caseElse
    | L_BRACKET aggregateFunctionArgs R_BRACKET;
aggregateFunctionPrefix:
    SUM_FN
    | AVG_FN
    | MAX_FN
    | MIN_FN
    | COUNT_FN;
nativeSqlFunction: SQL_FN L_BRACKET QUOTE R_BRACKET;
caseElse: IF (comparison THEN expression)+ ( ELSE expression)?;
comparison:
    expression COMPARATOR expression
    | L_BRACKET comparison R_BRACKET;
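In your code, lexer.GetAllTokens() is called after the parser has already pulled every token out of the lexer, so by then the lexer is at the end of its input and has nothing left to return. Read the tokens from the CommonTokenStream instead. This works fine on my machine (with the ANTLR 4.9.3 C# runtime):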
const string query = "1 + 2";
var inputStream = new AntlrInputStream(query);
var lexer = new StatsQueryLexer(inputStream);
var tokenStream = new CommonTokenStream(lexer);
tokenStream.Fill();
var parser = new StatsQueryParser(tokenStream)
{
    BuildParseTree = true
};
Console.WriteLine($"Parse tree: {parser.root().ToStringTree(parser)}");
Console.WriteLine("\nTokens:");
foreach (var token in tokenStream.GetTokens())
{
    Console.WriteLine($" {StatsQueryLexer.DefaultVocabulary.GetSymbolicName(token.Type), -15} '{token.Text}'");
}
which prints:
Parse tree: (root (expression (expression 1) + (expression 2)) <EOF>)

Tokens:
 NUMBER          '1'
 OPERATOR        '+'
 NUMBER          '2'
 EOF             '<EOF>'