Search code examples
antlr4left-recursion

Antlr4 left recursive rule contains a left recursive alternative which can be followed by the empty string


So I defined a grammar to parse an C style syntax language:

grammar mygrammar;

program
: (declaration)*
  (statement)*
  EOF
;

declaration
: INT ID '=' expression ';'
;

assignment
: ID '=' expression ';'
;

expression
: expression (op=('*'|'/') expression)*
| expression (op=('+'|'-') expression)*
| relation
| INT
| ID
| '(' expression ')'
;

relation
: expression (op=('<'|'>') expression)*
;

statement
: expression ';'
| ifstatement
| loopstatement
| printstatement
| assignment
;

ifstatement
: IF '(' expression ')' (statement)* FI ';'
;

loopstatement
: LOOP '(' expression ')' (statement)* POOL ';'
;

printstatement
: PRINT '(' expression ')' ';'
;

IF : 'if';
FI : 'fi';
LOOP : 'loop';
POOL : 'pool';
INT : 'int';
PRINT : 'print';
ID : [a-zA-Z][a-zA-Z0-9]*;
INTEGER : [0-9]+;
WS : [ \r\n\t] -> skip;

And I can parse a simple test as this:

int i = (2+3)*3/2*(3+36);
int j = i;
int k = 2*1+i*3;
if (k > 2)
  k = k + 1;
  i = i / 3;
  j = j / 3;
fi;
loop (i < 10)
  i = i + 1 * (i+k);
  j = (j + 1) * (j-k);
  k = i + j;
  print(k);
pool;

However, when I want to generate ANTLR Recogonizers in intelliJ, I got this error:

sCalc.g4:19:0: left recursive rule expression contains a left recursive alternative which can be followed by the empty string

I wonder if this is caused by my ID could be an empty string?


Solution

  • There are a couple of issues with your grammar:

    • you have INT as an alternative inside expression while you probably want INTEGER instead
    • there is no need to do expression (op=('+'|'-') expression)*: this will do: expression op=('+'|'-') expression
    • ANTLR4 does not support indirect left recursive rules: you must include relation inside expression

    Something like this ought to do it:

    grammar mygrammar;
    
    program
    : (declaration)*
      (statement)*
      EOF
    ;
    
    declaration
    : INT ID '=' expression ';'
    ;
    
    assignment
    : ID '=' expression ';'
    ;
    
    expression
    : expression op=('*'|'/') expression
    | expression op=('+'|'-') expression
    | expression op=('<'|'>') expression
    | INTEGER
    | ID
    | '(' expression ')'
    ;
    
    statement
    : expression ';'
    | ifstatement
    | loopstatement
    | printstatement
    | assignment
    ;
    
    ifstatement
    : IF '(' expression ')' (statement)* FI ';'
    ;
    
    loopstatement
    : LOOP '(' expression ')' (statement)* POOL ';'
    ;
    
    printstatement
    : PRINT '(' expression ')' ';'
    ;
    
    IF : 'if';
    FI : 'fi';
    LOOP : 'loop';
    POOL : 'pool';
    INT : 'int';
    PRINT : 'print';
    ID : [a-zA-Z][a-zA-Z0-9]*;
    INTEGER : [0-9]+;
    WS : [ \r\n\t] -> skip;
    

    Also not that this (statement)* can simply be written as statement*