I am trying to create ANTLR grammar for a simple programming language.
It has C-like if statements:
program
: statement* EOF
;
statement
: block # blockStatement
| SEMI # emptyStatement
| assignment # assignmentStatement
| declaration # variableDeclarationStatement
| 'if' parExpression ifBody=statement ('else' elseBody=statement)? # ifStatement
..........
;
block
: '{' statement* '}'
;
expression
: literal # literalExpression
| Identifier # variableReference
..........
;
parExpression : '(' expression ')';
assignment : Identifier assignmentOp expression SEMI;
SEMI : ';';
Identifier : (LETTER | '_') (LETTER | DIGIT | '_')* ;
It seems to work fine but when I run with DiagnosticErrorListener
I get errors
reportAttemptingFullContext d=1 (statement), input='else', Line 3:0
reportContextSensitivity d=1 (statement), input='else', Line 3:0
reportAttemptingFullContext d=1 (statement), input='else', Line 5:0
reportContextSensitivity d=1 (statement), input='else', Line 5:0
for code like this
if (flag1)
x = 42;
else if (flag2)
x = 43;
else
x = 44;
I am not sure that I understand what is wrong here, but as I understand in other cases (such as if (a) if (b) ... else ...
) this grammar can be ambiguous.
How should I fix it?
This is called the dangling else problem. Parsing the text:
if (flag1)
if (flag2) x=2;
else x=3;
can match your grammar two ways:
if (flag1)
if (flag2) x=2;
else x=3; // belongs to if (flag2)
and
if (flag1)
if (flag2) x=2;
else x=3; // belongs to if (flag1)
because you made the else clause an optional match. This means that the grammar rules provide an ambiguous match, which is the complaint you are getting from ANTLR.
What you want is force the else to match the nearest unclosed if statement; this is the interpretation of if ... else in most programming languages.
You have to modify the statement parsing rule:
statement
: non_if_statement
| if_statement
;
if_statement
: 'if' parExpression
ifBody= ( non_if_statement 'else' elseBody=statement
| if_statement )
;
non_if_statement
:block
| SEMI
| assignment
| declaration
..........
;
This is a bit awkward to write but should work.
Many parser generators allow you to "force a shift" on encountering a token. If you force a shift on the else keyword in your original grammar, you'll get the same effect. I don't know how to say that for ANTLR, if indeed you can.
[Lischke says if you ignore the error, you might get the right result anyway with your original grammar. I think he is right; that's because the parser generator is forced to choose one of the two interpretations as what it accepts.]