c# compiler-construction antlr4

ANTLR4 in .NET: "mismatched input 'begin' expecting {';', '+', '-', '*', DIV, MOD}"


I'm using ANTLR4 with C#.

Everything works fine until I use a 'block'; then the parser reports errors.

For example, this is my input code:

a:int;
a:=2;
if(a==2) begin
a:= a * 2;
a:=a + 5;
end

And this is my grammar:

grammar Our;

options{
    language=CSharp;
    TokenLabelType=CommonToken;
    ASTLabelType=CommonTree;
}

statements  :   statement statements
        |EOF;
statement   :
            expression SEMI
        |   ifstmt
        |   whilestmt 
        |   forstmt
        |   readstmt SEMI
        |   writestmt SEMI
        |   vardef SEMI
        |   block
        ;

block       :   BEGIN statements END ;

expression  :   ID ASSIGN expression
        |   boolexp;

boolexp     :   relexp AND boolexp
        |   relexp OR boolexp
        |   relexp;

relexp      :   modexp EQUAL relexp
        |   modexp LE relexp 
        |   modexp GE relexp
        |   modexp NOTEQUAL relexp 
        |   modexp GT relexp 
        |   modexp LT relexp
        |   modexp;

modexp      :   modexp  MOD exp 
        //| exp DIV modexp 
        |   exp;

exp         :   exp  ADD term 
        |   exp  SUB  term 
        |   term;

term        :   term MUL factor 
        |   term DIV factor
        |   factor POW term 
        |   factor;

factor      :   LPAREN expression RPAREN
        |   LPAREN vartype RPAREN  factor
        |   ID
        |   SUB factor
        |   ID LPAREN explist RPAREN 
        |   ID LPAREN RPAREN
        |   ID LPAREN LPAREN NUM RPAREN RPAREN 
        |   ID LPAREN LPAREN NUM COMMA NUM RPAREN RPAREN
        |   const;

explist     :   exp  COMMA  explist
        |exp;

const       :   NUM 
        |   BooleanLiteral          
        |   STRING;

ifstmt      :   IF LPAREN boolexp RPAREN statement
        |   IF LPAREN boolexp  RPAREN statement ELSE statement ;

whilestmt   :   WHILE LPAREN boolexp  RPAREN statement ;

forstmt     :   FOR ID ASSIGN exp  COLON exp statement;

readstmt    :   READ LPAREN  idlist  RPAREN ;

idlist      :   ID COMMA idlist
        |ID;

writestmt   :   WRITE  LPAREN explist RPAREN ;

vardef      :   idlist COLON vartype;


vartype     :   basictypes 
        |   basictypes LPAREN NUM RPAREN 
        |   basictypes LPAREN NUM COMMA NUM RPAREN ;

basictypes  :   INT 
        |   FLOAT 
        |   CHAR 
        |   STRING 
        |   BOOLEAN  ; 


BEGIN         : 'begin';
END           : 'end';
To            : 'to';
NEXT          : 'next';
REAL          : 'real';
BOOLEAN       : 'boolean';
CHAR          : 'char';
DO            : 'do';
DOUBLE        : 'double';
ELSE          : 'else';
FLOAT         : 'float';
FOR           : 'for';
FOREACH       : 'foreach';
FUNCTION      : 'function';
IF            : 'if';
INT           : 'int';
READ          : 'read';
RETURN        : 'return';
VOID          : 'void';
WHILE         : 'while';
WEND          : 'wend';
WRITE         : 'write';

LPAREN          : '(';
RPAREN          : ')';
LBRACE          : '{';
RBRACE          : '}';
LBRACK          : '[';
RBRACK          : ']';
SEMI            : ';';
COMMA           : ',';

ASSIGN          : ':=';
GT              : '>';
LT              : '<';
COLON           : ':';
EQUAL           : '==';
LE              : '<=';
GE              : '>=';
NOTEQUAL        : '!=';
AND             : '&&'|'and';
OR              : '||'|'or';
INC             : '++';
DEC             : '--';
ADD             : '+';
SUB             : '-';
MUL             : '*';
DIV             : '/'|'div';
MOD             : '%'|'mod';
ADD_ASSIGN      : '+=';
SUB_ASSIGN      : '-=';
MUL_ASSIGN      : '*=';
DIV_ASSIGN      : '/=';
POW             : '^';

BooleanLiteral : 'true'|'false';

STRING : '\"'([a-zA-Z]|NUM)*'\"';

ID : ([a-z]|[A-Z])([a-z]|[A-z]|[0-9])*;

NUM : ('+'|'-')?[0-9]([0-9]*)('.'[0-9][0-9]*)?;

WS  :  [ \t\r\n\u000C]+ -> skip ;

COMMENT : '/*' .*? '*/' ;

LINE_COMMENT : '//' ~[\r\n]*;

When I run the parser, I get the following error message:

no viable alternative at input 'if(a==2)begina:=a*2;a:=a+5;end' mismatched input 'begin' expecting {';', '+', '-', '*', DIV, MOD} no viable alternative at input 'end'

Thanks in advance.


Solution

  • The problem is your rule for a list of statements:

    statements : statement statements | EOF ;
    

    This rule has two options: a statement followed by another list of statements, or EOF. The only non-recursive option is the EOF, which becomes a problem when you use this in your rule for a block:

    block : BEGIN statements END ;
    

    You can never encounter EOF in the middle of a block, so when the parser reads the line before the end in your sample input, the next thing that it expects to read is another statement. The word end on its own isn't a valid statement, which is why it throws the error that you are seeing.

    One possible fix is to make the recursive part of your statements rule optional:

    statements : statement statements? | EOF ;
    

    This will allow your sample input to parse successfully. In my opinion, a better option is to remove the recursion altogether:

    statements : statement* | EOF ;
    

    Finally, you can see that EOF is still one of the options for the statements rule. This doesn't make much sense when the rule is used as part of the rule for block, since you should never find an EOF in the middle of a block. I would move the EOF to a new top-level parser rule and start parsing from that rule:

    program : statements EOF ;
    statements : statement* ;
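
    To check the change in isolation, here is a minimal sketch of a cut-down grammar built around the two rules above. The grammar name BlockDemo, the simplified expression rules (equality, sum, product, atom) and the unsigned NUM token are assumptions made for illustration; they are not the asker's full grammar. When parsing starts at program, it should accept the sample input from the question.

    ```antlr
    grammar BlockDemo;   // hypothetical name, used only for this sketch

    // EOF is matched exactly once, at the top level.
    program     : statements EOF ;

    // Zero or more statements, so the list can stop in front of END.
    statements  : statement* ;

    statement   : vardef SEMI
                | expression SEMI
                | ifstmt
                | block
                ;

    block       : BEGIN statements END ;

    ifstmt      : IF LPAREN expression RPAREN statement ;

    vardef      : ID COLON INT ;

    // Simplified expression rules (assumption): just enough for the sample input.
    expression  : ID ASSIGN expression
                | equality
                ;
    equality    : sum (EQUAL sum)* ;
    sum         : product (ADD product)* ;
    product     : atom (MUL atom)* ;
    atom        : ID
                | NUM
                | LPAREN expression RPAREN
                ;

    // Keywords are listed before ID so they win when the match lengths tie.
    BEGIN   : 'begin' ;
    END     : 'end' ;
    IF      : 'if' ;
    INT     : 'int' ;

    LPAREN  : '(' ;
    RPAREN  : ')' ;
    SEMI    : ';' ;
    COLON   : ':' ;
    ASSIGN  : ':=' ;
    EQUAL   : '==' ;
    ADD     : '+' ;
    MUL     : '*' ;

    ID      : [a-zA-Z] [a-zA-Z0-9]* ;
    NUM     : [0-9]+ ;              // unsigned here (assumption), unlike the original NUM
    WS      : [ \t\r\n]+ -> skip ;
    ```

    Because statements no longer mentions EOF, the same rule serves both the top level and the body of a block.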