I am new to ANTLR4, and my grammar contains the rule STRING: '\'' (~'\'' | '.' | '\\\'')* '\'' ;
but when I try to parse input containing '.' I get the error: mismatched input ''.'' expecting {'(', ID, STRING}.
Shouldn't the expected alternative STRING be able to match my '.'?
This problem only occurs when the quoted character is a dot. I tried modifying my STRING rule and adding a rule that only matches '.', but parsing a dot between quotes doesn't seem to be possible.
EDIT: As requested, here is my grammar:
grammar Antlr4Grammar;
grammarFile: 'grammar' grammarName=ID ';' (grammarRules+=rules)* baseRules* EOF;
rules: name=ID ':' ((body=ruleBody) | (children+=ID)) ('|' children+=ID)* ';';
ruleBody: (bodies+=terminalRuleBody)+ 'EOF'?;
terminalRuleBody:
body=terminalRuleBody (op=operator)
| parentRuleBody
| stringRuleBody
| affectRuleBody
;
starOperator: '*';
plusOperator: '+';
questionMarkOperator: '?';
operator: starOperator | plusOperator | questionMarkOperator;
parentRuleBody:
'(' body=ruleBody ')';
stringRuleBody:
body=STRING ;
affectRuleBody:
name=ID op=affectOp value=rOperand;
rOperand:
val=ID | val='INT' | val='FLOAT' | val='CHAR' | val='STRING' | val='ID' ;
affectOp: eqOp | plusEqOp;
eqOp: '=';
plusEqOp: '+=';
baseRules:
intBaseRule | floatBaseRule | charBaseRule | stringBaseRule | idBaseRule | wsBaseRule;
intBaseRule: 'INT' ':' '\'-\'?[0-9]+' ';';
floatBaseRule: 'FLOAT' ':' '\'-\'?[0-9]+' '\'.\'' + '[0-9]*' ';' ;
charBaseRule: 'CHAR' ':' '\'\\\'\'' '(\'\\\\\'|.)' '\'\\\'\'' ';' ;
stringBaseRule: 'STRING' ':' '\'"\'' '(\'\\\\\'|.)*?' '\'"\'' ';';
idBaseRule: 'ID' ':' '[a-zA-Z_]' '[a-zA-Z_0-9]*' ';' ;
wsBaseRule: 'WS' ':' '[ \\t\\r\\n]' '->' 'skip' ';' ;
ID: [a-zA-Z_] [a-zA-Z_0-9]*;
STRING: '\'' (~'\'' | '.' | '\\\'')* '\'' ;
WS: [ \t\r\n]+ -> skip;
And here is the input I am trying to parse:
grammar MiniJava;
program: mainClass=mainClass ( classDecl+=classDeclaration )*;
mainClass: 'class' name=ID '{' 'public' 'static' 'void' 'main' '(' 'String' '[' ']' argName=ID ')' '{' body=statement '}' '}';
classDeclaration: 'class' name=ID ('extends' parentClass=ID)? '{' (varDecl+=varDeclaration)* (methodDecl+=methodDeclaration)* '}';
varDeclaration: varType=type varName=ID ';';
methodDeclaration: 'public' returnType=type methodName=ID '(' (argType+=type argName+=ID (',' argType+=type argName+=ID)* )? ')' '{' (varDecl+=varDeclaration)* (body+=Statement)* 'return' returnExpr=expression ';' '}';
type: intArrayType | booleanType | intType | idType;
intArrayType: 'int' '[' ']';
booleanType: 'boolean';
intType: 'int';
idType: typeName=ID;
statement: compoundStatement | ifStatement | whileStatement | printStatement | affectStatement | affectArrayStatement;
compoundStatement: '{' (body+=statement)* '}';
ifStatement: 'if' '(' cond=expression ')' thenPart=statement 'else' elsePart=statement;
whileStatement: 'while' '(' cond=expression ')' body=statement;
printStatement: 'System.out.println' '(' printExpr=expression ')' ';';
affectStatement: lValue=ID '=' rValue=expression ';';
affectArrayStatement: array=ID '[' index=expression ']' '=' value=expression ';' ;
expression: binOpExpr;
binOpExpr: loperand=expression op=operator roperand=postfixExpression | postfixExpression;
operator: andOperator | lessThanOperator | plusOperator | minusOperator | multOperator;
andOperator: '&&';
lessThanOperator: '<';
plusOperator: '+';
minusOperator: '-';
multOperator: '*';
postfixExpression: baseExpr=unaryExpression operator=postfixOperator ;
postfixOperator: arrayIndexOperation | lengthOperation | methodCallOperation;
arrayIndexOperation: '[' index=expression ']';
lengthOperation: '.' 'length';
methodCallOperation: '.' methodName=ID '(' (args+=expression (',' args+=expression)*)? ')';
unaryExpression: constIntExpression | trueExpression | falseExpression | varExpression | thisExpression | newExpression | notExpression | parentExpression;
constIntExpression: value=INT;
trueExpression: 'true';
falseExpression: 'false';
varExpression: varName=ID;
thisExpression: 'this';
newExpression: newClassExpression | newArrayExpression;
newClassExpression: 'new' name=ID '(' ')';
newArrayExpression: 'new' 'int' '[' size=expression ']';
notExpression: '!' baseExpr=expression;
parentExpression: '(' baseExpr=expression ')';
INT: '-'?[0-9]+;
FLOAT: '-'?[0-9]+ '.' [0-9]*;
CHAR: '\'' ('\\'|.) '\'';
STRING: '"' ('\\'|.)*? '"';
ID: [a-zA-Z_] [a-zA-Z_0-9]*;
WS: [ \t\r\n] -> skip ;
I didn't post it at first because I was worried it would be confusing: my grammar describes a subset of the ANTLR4 grammar syntax itself.
You must be careful when mixing literal tokens in a parser rule (the 'literal' in my example) with lexer rules that can also match the same text (the ID rule below):
parse
: 'literal' ID
;
ID
: [a-zA-Z]+
;
If you try to parse the input "literal literal" with the parse rule, it will fail, even though the text "literal" could be matched by the ID rule.
This is because ANTLR will translate my example as follows:
parse
: T__0 ID
;
T__0
: 'literal'
;
ID
: [a-zA-Z]+
;
The lexer produces exactly one token for any given piece of input, and implicitly defined literal tokens take precedence over explicit lexer rules that match the same text. So "literal" is always tokenised as a T__0 token; it never becomes an ID token.
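One way to make the toy example accept "literal literal" is to give the keyword token a name and let a small parser rule accept it wherever an identifier is allowed. The following is only a minimal sketch: the names Demo, LITERAL and id, as well as the WS rule, are invented for illustration and are not part of the example above.

```antlr
grammar Demo;

parse
 : LITERAL id EOF
 ;

// Hypothetical helper rule: an identifier is either a real ID token
// or the keyword token, which the lexer always prefers for "literal".
id
 : ID
 | LITERAL
 ;

// Listed before ID, so it wins when both rules match the same text.
LITERAL : 'literal';
ID      : [a-zA-Z]+;

// Assumed here so that the space between the two words is skipped.
WS      : [ \t\r\n]+ -> skip;
```

The underlying conflict is unchanged in this sketch: the lexer still never produces an ID token for "literal", the parser merely tolerates the keyword token in that position.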
Your grammar has the same problem. In floatBaseRule you defined the literal token '\'.\'', so the input '.' is always tokenised as that literal token. Your stringRuleBody expects a STRING there and therefore cannot match it:
floatBaseRule  : 'FLOAT' ':' '\'-\'?[0-9]+' '\'.\''+ '[0-9]*' ';' ;
stringRuleBody : STRING ;
You either need to replace '\'.\'' with stringRuleBody:
floatBaseRule  : 'FLOAT' ':' '\'-\'?[0-9]+' stringRuleBody+ '[0-9]*' ';' ;
stringRuleBody : STRING ;
or let stringRuleBody also match the literal '\'.\'':
floatBaseRule  : 'FLOAT' ':' '\'-\'?[0-9]+' '\'.\''+ '[0-9]*' ';' ;
stringRuleBody : STRING | '\'.\'';
IMO, the best option is to remove all of these '...' literal tokens from your parser rules and move them into their own lexer rules.
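As a rough sketch of that suggestion, a reduced version of floatBaseRule might look like the grammar below. The grammar name and the token names FLOAT_KW, COLON and SEMI are assumptions made for illustration; the STRING and WS rules are copied unchanged from the question.

```antlr
grammar LiteralsDemo;

// Parser rules refer to named tokens only; there are no inline 'literal' tokens.
floatBaseRule
 : FLOAT_KW COLON stringRuleBody+ SEMI EOF
 ;

stringRuleBody
 : body=STRING
 ;

// Every token is declared explicitly, so overlaps and their order are visible.
FLOAT_KW : 'FLOAT';
COLON    : ':';
SEMI     : ';';
STRING   : '\'' (~'\'' | '.' | '\\\'')* '\'' ;
WS       : [ \t\r\n]+ -> skip;
```

With no literal token competing for it, the quoted dot in an input such as FLOAT : '.' ; is tokenised as STRING. If an ID rule is added to a grammar like this, keyword rules such as FLOAT_KW need to be listed before it, because the rule defined first wins when two rules match the same text.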
Also note that there is an ANTLR4 grammar that can parse itself: https://github.com/antlr/grammars-v4/tree/master/antlr/antlr4