I can't get even simple semantic predicates to work with ANTLR 4.6.6 for .NET Framework 4.8. The grammar below fails with "no viable alternative" for the input "received:last week".
grammar test;

// Parser rules
parse
 : expr (expr)* EOF
 ;

expr
 : {false}? received ':' lastWeek
 | received ':' text
 | text
 ;

received: RECEIVED;
lastWeek: LASTWEEK;
text: TEXT;

// Lexer rules
RECEIVED: 'received';
TEXT
 : ~(' ' | ':')+
 ;
LASTWEEK: 'last week';
SPACES: [ \t\r\n] -> skip;
UPDATE: This is a simplification of my problem. Is it possible to write a grammar that parses "received:last week" as "received" "last week" only when "last week" is preceded by "received", while an input such as "subject:last week" is parsed as "subject" "last" "week"?
When I run this code:
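import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

public class Main {
    public static void main(String[] args) {
        String source = "received:last week";
        // testLexer and testParser are the classes ANTLR generates from the grammar above
        testLexer lexer = new testLexer(CharStreams.fromString(source));
        testParser parser = new testParser(new CommonTokenStream(lexer));
        System.out.println(parser.parse().toStringTree(parser));
    }
}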
the error "line 1:0 no viable alternative at input 'received'" is printed to STDERR. When I change {false}? to {true}?, the input is parsed correctly (as expected).
If you expected the input to be parsed as received ':' text because of the {false}? predicate, you're misunderstanding how ANTLR's lexer works. The lexer produces tokens independently of the parser. It doesn't matter that the parser is trying to match a TEXT token: your input is always tokenised in the same way.
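The lexer works like this:
1. try to match as many characters as possible
2. when two or more lexer rules match the same number of characters, the rule defined first wins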
Given these rules, it is clear that "received:last week" is tokenised as a RECEIVED, a ':' and a LASTWEEK token.
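To see why the tokenisation comes out this way, here is a minimal sketch that mimics ANTLR's matching policy (the longest match wins, and ties go to the rule defined first) with java.util.regex. It is not ANTLR itself: the class name LexerRuleDemo and the regex stand-ins for the lexer rules are made up for illustration, and the implicit ':' literal is assumed to behave like an ordinary token rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LexerRuleDemo {

    // Hypothetical stand-ins for the lexer rules of the question's grammar,
    // kept in definition order. This only imitates the matching policy.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("':'", Pattern.compile(":"));               // implicit literal token
        RULES.put("RECEIVED", Pattern.compile("received"));
        RULES.put("TEXT", Pattern.compile("[^ :]+"));
        RULES.put("LASTWEEK", Pattern.compile("last week"));
        RULES.put("SPACES", Pattern.compile("[ \\t\\r\\n]"));  // skipped below
    }

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestLength = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                // Strictly greater: the longest match wins, and on a tie
                // the rule that was defined first keeps the win.
                if (m.lookingAt() && m.end() - pos > bestLength) {
                    bestRule = rule.getKey();
                    bestLength = m.end() - pos;
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!bestRule.equals("SPACES")) {
                tokens.add(bestRule); // the parser never gets a say in this choice
            }
            pos += bestLength;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("received:last week"));
        System.out.println(tokenize("subject:last week"));
    }
}
```

For "received:last week" the sketch returns RECEIVED, ':' and LASTWEEK; for "subject:last week" it returns TEXT, ':' and LASTWEEK. In both cases "last week" becomes a single LASTWEEK token no matter what precedes it, which is why the predicate in the parser cannot change the outcome.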
Is it possible to have a grammar that can parse this "received:last week" as "received" "last week" only if the "last week" is preceded by "received" but if for example I have "subject:last week" to be parsed as "subject" "last" "week"
You could make the lexer somewhat context-sensitive by using lexical modes. You must then create separate lexer and parser grammars, which might look like this:
lexer grammar TestLexer;
RECEIVED : 'received' -> pushMode(RECEIVED_MODE);
SUBJECT : 'subject';
TEXT : ~[ :]+;
COLON : ':';
SPACES : SPACE+ -> skip;
fragment SPACE : [ \t\r\n];
mode RECEIVED_MODE;
LASTWEEK : 'last' SPACE+ 'week' -> popMode;
RECEIVED_MODE_COLON : ':' -> type(COLON);
RECEIVED_MODE_TEXT : ~[ :]+ -> type(TEXT), popMode;
You can use the lexer above like this in your parser grammar:
parser grammar TestParser;
options {
tokenVocab=TestLexer;
}
...
Now "received:last week" will be tokenised as:
'received' `received`
COLON `:`
LASTWEEK `last week`
EOF `<EOF>`
and "subject:last week" will be tokenised as:
'subject' `subject`
COLON `:`
TEXT `last`
TEXT `week`
EOF `<EOF>`
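The effect of the mode stack can be imitated without ANTLR. The sketch below is a hand-written approximation of the TestLexer grammar, not generated code: the class ModeLexerDemo, its Rule helper and the integer action codes are assumptions made up for illustration. It keeps a stack of rule sets and switches between them the way pushMode and popMode do.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ModeLexerDemo {

    // Hypothetical helper: the token type a rule emits, its pattern, and a mode
    // action (+1 = pushMode(RECEIVED_MODE), -1 = popMode, 0 = stay in the mode).
    private static final class Rule {
        final String type;
        final Pattern pattern;
        final int action;

        Rule(String type, String regex, int action) {
            this.type = type;
            this.pattern = Pattern.compile(regex);
            this.action = action;
        }
    }

    private static final Rule[] DEFAULT_MODE = {
        new Rule("RECEIVED", "received", +1),
        new Rule("SUBJECT", "subject", 0),
        new Rule("TEXT", "[^ :]+", 0),
        new Rule("COLON", ":", 0),
        new Rule("SPACES", "[ \\t\\r\\n]+", 0),
    };

    private static final Rule[] RECEIVED_MODE = {
        new Rule("LASTWEEK", "last[ \\t\\r\\n]+week", -1),
        new Rule("COLON", ":", 0),   // RECEIVED_MODE_COLON -> type(COLON)
        new Rule("TEXT", "[^ :]+", -1), // RECEIVED_MODE_TEXT -> type(TEXT), popMode
    };

    public static List<String> tokenize(String input) {
        Deque<Rule[]> modes = new ArrayDeque<>();
        modes.push(DEFAULT_MODE); // the top of the stack is the active mode
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestEnd = pos;
            for (Rule rule : modes.peek()) {
                Matcher m = rule.pattern.matcher(input).region(pos, input.length());
                // Longest match wins; on a tie the earlier rule is kept.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = rule;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!best.type.equals("SPACES")) {
                tokens.add(best.type + " `" + input.substring(pos, bestEnd) + "`");
            }
            if (best.action > 0) {
                modes.push(RECEIVED_MODE);
            } else if (best.action < 0) {
                modes.pop();
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("received:last week").forEach(System.out::println);
        tokenize("subject:last week").forEach(System.out::println);
    }
}
```

Because LASTWEEK only exists in the rule set that becomes active after RECEIVED, "last week" is a single token only in that context. After "subject" the default rule set stays active, so "last" and "week" come out as two TEXT tokens.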
You could also move the creation of last week into the parser like this:
received
 : RECEIVED ':' last_week
 ;

subject
 : SUBJECT ':' text
 ;

last_week
 : LAST WEEK
 ;

text
 : TEXT
 | LAST
 | WEEK
 ;

RECEIVED : 'received';
SUBJECT  : 'subject';
LAST     : 'last';
WEEK     : 'week';
TEXT     : ~[ :]+;