Why is Antlr going into an infinite loop when processing my grammar

I created an ANTLR grammar for parsing mathematical expressions and a second one for evaluating them. As I thought building an AST and then re-parsing in order to actually evaluate it is sort of one operation too much, I wanted to refactor my grammar to produce a hierarchy of "Term" objects representing the expression including the logic to perform that particular operation. The root Term object can then be simply evaluated to a concrete result.

I had to rewrite quite a lot of my grammar and finally got rid of the last error message. Unfortunately now ANTLR seems to sort of go into an infinite loop.

Could someone here please help me sort out the problem? I think the grammar should be pretty interesting for some, therefore I am posting it. (It is based upon a garmmar I found with google, I should admit, but I have altered it quite a lot to suite my needs).

grammar SecurityRulesNew;

options {
language = Java;
    output=AST;
    backtrack = true;
    ASTLabelType=CommonTree;
    k=2;
}

tokens {
    POS;
    NEG;
    CALL;
}

@header {package de.cware.cweb.services.evaluator.parser;}
@lexer::header{package de.cware.cweb.services.evaluator.parser;}

formula returns [Term term]
: a=expression EOF { $term = a; }
;
expression returns [Term term]
: a=boolExpr { $term = a; }
;
boolExpr returns [Term term]
: a=sumExpr { $term = a; }
| a=sumExpr AND b=boolExpr { $term = new AndTerm(a, b); }
| a=sumExpr OR b=boolExpr { $term = new OrTerm(a, b); }
| a=sumExpr LT b=boolExpr { $term = new LessThanTerm(a, b); }
| a=sumExpr LTEQ b=boolExpr { $term = new LessThanOrEqualTerm(a, b); }
| a=sumExpr GT b=boolExpr { $term = new GreaterThanTerm(a, b); }
| a=sumExpr GTEQ b=boolExpr { $term = new GreaterThanTermOrEqual(a, b); }
| a=sumExpr EQ b=boolExpr { $term = new EqualsTerm(a, b); }
| a=sumExpr NOTEQ b=boolExpr { $term = new NotEqualsTerm(a, b); }
;
sumExpr returns [Term term]
: a=productExpr { $term = a; }
| a=productExpr SUB b=sumExpr { $term = new SubTerm(a, b); }
| a=productExpr ADD b=sumExpr { $term = new AddTerm(a, b); }
;
productExpr returns [Term term]
: a=expExpr { $term = a; }
| a=expExpr DIV productExpr { $term = new DivTerm(a, b); }
| a=expExpr MULT productExpr { $term = new MultTerm(a, b); }
;
expExpr returns [Term term]
: a=unaryOperation { $term = a; }
| a=unaryOperation EXP expExpr { $term = new ExpTerm(a, b); }
;
unaryOperation returns [Term term]
: a=operand { $term = a; }
| NOT a=operand { $term = new NotTerm(a); }
| SUB a=operand { $term = new NegateTerm(a); }
;
operand returns [Term term]
: l=literal { $term = l; }
| f=functionExpr { $term = f; }
| v=VARIABLE { $term = new VariableTerm(v); }
| LPAREN e=expression RPAREN { $term = e; }
;
functionExpr returns [Term term]
: f=FUNCNAME LPAREN! RPAREN! { $term = new CallFunctionTerm(f, null); }
| f=FUNCNAME LPAREN! a=arguments RPAREN! { $term = new CallFunctionTerm(f, a); }
;
arguments returns [List<Term> terms]
: a=expression 
    { 
        $terms = new ArrayList<Term>(); 
        $terms.add(a);
    }
| a=expression COMMA b=arguments
    { 
        $terms = new ArrayList<Term>(); 
        $terms.add(a);
        $terms.addAll(b);
    }
;
literal returns [Term term]
: n=NUMBER { $term = new NumberLiteral(n); }
| s=STRING { $term = new StringLiteral(s); }
| t=TRUE { $term = new TrueLiteral(t); }
| f=FALSE { $term = new FalseLiteral(f); }
;

STRING
:
'\"'
    ( options {greedy=false;}
    : ESCAPE_SEQUENCE
    | ~'\\'
    )*
'\"'
|
'\''
    ( options {greedy=false;}
    : ESCAPE_SEQUENCE
    | ~'\\'
    )*
'\''
;
WHITESPACE
: (' ' | '\n' | '\t' | '\r')+ {skip();};
TRUE
: ('t'|'T')('r'|'R')('u'|'U')('e'|'E')
;
FALSE
: ('f'|'F')('a'|'A')('l'|'L')('s'|'S')('e'|'E')
;

NOTEQ           : '!=';
LTEQ            : '<=';
GTEQ            : '>=';
AND             : '&&';
OR              : '||';
NOT             : '!';
EQ              : '=';
LT              : '<';
GT              : '>';

EXP             : '^';
MULT            : '*';
DIV             : '/';
ADD             : '+';
SUB             : '-';

LPAREN          : '(';
RPAREN          : ')';
COMMA           : ',';
PERCENT         : '%';

VARIABLE
: '[' ~('[' | ']')+ ']'
;
FUNCNAME
: (LETTER)+
;
NUMBER
: (DIGIT)+ ('.' (DIGIT)+)?
;

fragment
LETTER 
: ('a'..'z') | ('A'..'Z')
;
fragment
DIGIT
: ('0'..'9')
;
fragment
ESCAPE_SEQUENCE
: '\\' 't'
| '\\' 'n'
| '\\' '\"'
| '\\' '\''
| '\\' '\\'
;

Help is greatly appreciated.

Chris

Solution

Because your grammar is so incredibly ambiguous, ANTLR has a problem creating a parser. Apparently ANTLR 3.3+ chokes on it, but ANTLR 3.2 (with less time than 3.3+) produces the following error:

error(10): internal error: org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:1279): could not even do k=1 for decision 1; reason: timed out (>1000ms)

For a simple expression parser, you really shouldn't use backtrack=true.

Besides the fact your grammar is ambiguous, much of your embedded code contains errors.

Let's have a look at your formula rule:

formula returns [Term term]
: a=expression EOF { $term = $a; }
;

Also, the return type of a rule should be explicitly defined. The a in { $term = a; } should have a $ in front of it:

formula returns [Term term]
: a=expression EOF { $term = $a; }
;

but then $a refers to the entire "thing" expression returns. You then have to "tell" ANTLR you want the Term this expression creates. This can be done like this:

formula returns [Term term]
: a=expression EOF { $term = $a.term; }
;
expression returns [Term term]
: a=boolExpr { $term = $a.term; }
;

It looks like you've converted some LR grammar into an ANTLR grammar (note that although ANTLR ends with LR, ANTLR 3.x is an LL parser generator) and without testing in between, you had hoped it should all work: unfortunately, it doesn't. There's too much wrong with it to produce a small working example based on your grammar: I'd have a look at an existing expression parser based on an ANTLR grammar and try again. Have a look at these Q&A's: