I'm using antlr4 in C#.
everything works fine except when i use 'block' everything goes crazy.
for example this is my input code :
a:int;
a:=2;
if(a==2) begin
a:= a * 2;
a:=a + 5;
end
and this is my grammer :
grammar Our;
options{
language=CSharp;
TokenLabelType=CommonToken;
ASTLabelType=CommonTree;
}
statements : statement statements
|EOF;
statement :
expression SEMI
| ifstmt
| whilestmt
| forstmt
| readstmt SEMI
| writestmt SEMI
| vardef SEMI
| block
;
block : BEGIN statements END ;
expression : ID ASSIGN expression
| boolexp;
boolexp : relexp AND boolexp
| relexp OR boolexp
| relexp;
relexp : modexp EQUAL relexp
| modexp LE relexp
| modexp GE relexp
| modexp NOTEQUAL relexp
| modexp GT relexp
| modexp LT relexp
| modexp;
modexp : modexp MOD exp
//| exp DIV modexp
| exp;
exp : exp ADD term
| exp SUB term
| term;
term : term MUL factor
| term DIV factor
| factor POW term
| factor;
factor : LPAREN expression RPAREN
| LPAREN vartype RPAREN factor
| ID
| SUB factor
| ID LPAREN explist RPAREN
| ID LPAREN RPAREN
| ID LPAREN LPAREN NUM RPAREN RPAREN
| ID LPAREN LPAREN NUM COMMA NUM RPAREN RPAREN
| const;
explist : exp COMMA explist
|exp;
const : NUM
| BooleanLiteral
| STRING;
ifstmt : IF LPAREN boolexp RPAREN statement
| IF LPAREN boolexp RPAREN statement ELSE statement ;
whilestmt : WHILE LPAREN boolexp RPAREN statement ;
forstmt : FOR ID ASSIGN exp COLON exp statement;
readstmt : READ LPAREN idlist RPAREN ;
idlist : ID COMMA idlist
|ID;
writestmt : WRITE LPAREN explist RPAREN ;
vardef : idlist COLON vartype;
vartype : basictypes
| basictypes LPAREN NUM RPAREN
| basictypes LPAREN NUM COMMA NUM RPAREN ;
basictypes : INT
| FLOAT
| CHAR
| STRING
| BOOLEAN ;
BEGIN : 'begin';
END : 'end';
To : 'to';
NEXT : 'next';
REAL : 'real';
BOOLEAN : 'boolean';
CHAR : 'char';
DO : 'do';
DOUBLE : 'double';
ELSE : 'else';
FLOAT : 'float';
FOR : 'for';
FOREACH : 'foreach';
FUNCTION : 'function';
IF : 'if';
INT : 'int';
READ : 'read';
RETURN : 'return';
VOID : 'void';
WHILE : 'while';
WEND : 'wend';
WRITE : 'write';
LPAREN : '(';
RPAREN : ')';
LBRACE : '{';
RBRACE : '}';
LBRACK : '[';
RBRACK : ']';
SEMI : ';';
COMMA : ',';
ASSIGN : ':=';
GT : '>';
LT : '<';
COLON : ':';
EQUAL : '==';
LE : '<=';
GE : '>=';
NOTEQUAL : '!=';
AND : '&&'|'and';
OR : '||'|'or';
INC : '++';
DEC : '--';
ADD : '+';
SUB : '-';
MUL : '*';
DIV : '/'|'div';
MOD : '%'|'mod';
ADD_ASSIGN : '+=';
SUB_ASSIGN : '-=';
MUL_ASSIGN : '*=';
DIV_ASSIGN : '/=';
POW : '^';
BooleanLiteral : 'true'|'false';
STRING : '\"'([a-zA-Z]|NUM)*'\"';
ID : ([a-z]|[A-Z])([a-z]|[A-z]|[0-9])*;
NUM : ('+'|'-')?[0-9]([0-9]*)('.'[0-9][0-9]*)?;
WS : [ \t\r\n\u000C]+ -> skip ;
COMMENT : '/*' .*? '*/' ;
LINE_COMMENT : '//' ~[\r\n]*;
when i run the parser i get the following error message :
no viable alternative at input 'if(a==2)begina:=a*2;a:=a+5;end' mismatched input 'begin' expecting {';', '+', '-', '*', DIV, MOD} no viable alternative at input 'end'
thanks in advance.
The problem is your rule for a list of statements:
statements : statement statements | EOF ;
This rule has two options: a statement
followed by another list of statements
, or EOF
. The only non-recursive option is the EOF
, which becomes a problem when you use this in your rule for a block
:
block : BEGIN statements END ;
You can never encounter EOF
in the middle of a block
, so when the parser reads the line before the end
in your sample input, the next thing that it expects to read is another statement
. The word end
on its own isn't a valid statement
, which is why it throws the error that you are seeing.
One possible fix is to make the recursive part of your statements
rule optional:
statements : statement statements? | EOF ;
This will allow your sample input to parse successfully. In my opinion, a better option is to remove the recursion altogether:
statements : statement* | EOF ;
Finally, you can see that the EOF
is still one of the options for the statements
rule. This doesn't make much sense when you use this rule in the as part of the rule for block
, since you shouldn't ever find an EOF
in the middle of a block
. What I would do would be to move this to a new top level parser rule:
program : statements EOF ;
statements : statement* ;