Search code examples
javaantlr4operator-precedence

ANTLR: minus expression precedence and different results with Grun


I have a grammar like this:

/* entry point */
parse: expr EOF;


expr
    : value                                 # argumentArithmeticExpr
    | l=expr operator=(MULT|DIV) r=expr     # multdivArithmeticExpr
    | l=expr operator=(PLUS|MINUS) r=expr   # addsubtArithmeticExpr
    | operator=('-'|'+') r=expr             # minusPlusArithmeticExpr
    | IDENTIFIER '(' (expr ( COMMA  expr )* ) ? ')'# functionExpr
    | LPAREN expr RPAREN                    # parensArithmeticExpr
    ;

value
    : number
    | variable
    | string // contains date
    | bool
    | null_value
    ;


/* Atomes */

bool
    : BOOL
    ;

variable
    : VARIABLE
    ;

string
    : STRING_LITERAL
    ;


number
    : ('+'|'-')? NUMERIC_LITERAL
    ;

null_value
    : NULL // TODO: test this
    ;

IDENTIFIER
    : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
    ;

NUMERIC_LITERAL
    : DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )? // ex: 0.05e3
    | '.' DIGIT+ ( E [-+]? DIGIT+ )? // ex: .05e3
    ;

INT: DIGIT+;

STRING_LITERAL
    :  '\'' ( ~'\'' | '\'\'' )* '\''
    |  '"' ( ~'"' | '""' )* '"'
    ;

VARIABLE
    : LBRACKET ( ~']' | ' ')* RBRACKET
    ;

Now, I want to parse this:

-1.3 * 5 + -2 * 7

With Grun, I get this:

 antlr4 formula.g4 && javac *.java && time  grun formula parse -gui
 -1.3*5 + -2*7
 ^D

enter image description here

Which looks OK and I would be happy with that.

But in my Java code, I get called like this using the Visitor pattern:

visitMinusPlusArithmeticExpr -1.3*5+-2*7 // ugh ?? sees "- (1.3 * 5 + - 2 * 7 )" instead of "(-1.3*5) + (-2*7)"
visitAddsubtArithmeticExpr 1.3*5+-2*7
visitMultdivArithmeticExpr 1.3*5
visitArgumentArithmeticExpr 1.3
visitNumber 1.3
visitArgumentArithmeticExpr 5
visitValue 5
visitNumber 5
visitMinusPlusArithmeticExpr -2*7 // UHG? should see a MultDiv with -2 and 7
visitMultdivArithmeticExpr 2*7
visitArgumentArithmeticExpr 2
visitValue 2
visitNumber 2
visitArgumentArithmeticExpr 7
visitValue 7
visitNumber 7

Which means that I don't get my negative number (-1.3), but rather the 'minus expression', which I should not get.

Why is my Java result different from Grun ? I have verified that the grammar is recompiled and I use my parser like this:

    formulaLexer   lexer = new formulaLexer(new ANTLRInputStream(s));
    formulaParser parser = new formulaParser(new CommonTokenStream(lexer));

    parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
    parser.setErrorHandler(new BailErrorStrategy()); // will throw exceptions on failure

    formula = tryParse(parser);
    if( formula == null && errors.isEmpty() ){
        // the parsing failed, retry in LL mode
        parser.getInterpreter().setPredictionMode(PredictionMode.LL);
        parser.reset();
        tryParse(parser);
    }

I have disabled the SLL mode to verify if this was not the problem, and the result was the same. I thought this could be a problem of precedence, but in my expr I have specified to match a value first, and then only a minusPlusArithmeticExpr.

I can't understand how I will detect this 'minus' expression instead of my 'negative value'. Can you check this?

Also, why does Grun show the correct behavior and not my Java code?

EDIT

Following the comments advice, I modified the grammar to look like this:

expr
    : value                                 # argumentArithmeticExpr
    | (PLUS|MINUS) expr                     # plusMinusExpr
    | l=expr operator=(MULT|DIV) r=expr     # multdivArithmeticExpr
    | l=expr operator=(PLUS|MINUS) r=expr   # addsubtArithmeticExpr
    | function=IDENTIFIER '(' (expr ( COMMA  expr )* ) ? ')'# functionExpr
    | '(' expr ')'           # parensArithmeticExpr
    ;

But now, I want to optimize the case where I have a single "-1.3" somewhere. I don't know how to do it correctly, since when I land in the visitMinusPlusAritmeticExpr, I have to check if the terminal node is a number.

Here is what I get while debugging:

ctx = {formulaParser$PlusMinusExprContext@929} "[16]"
children = {ArrayList@955}  size = 2
    0 = {TerminalNodeImpl@962} "-"
    1 = {formulaParser$ArgumentArithmeticExprContext@963} "[21 16]" 
        children = {ArrayList@967}  size = 1
        0 = {formulaParser$ValueContext@990} "[22 21 16]"
            children = {ArrayList@992}  size = 1
            0 = {formulaParser$NumberContext@997} "[53 22 21 16]"
                children = {ArrayList@999}  size = 1
                0 = {TerminalNodeImpl@1004} "1.3"

I suspect I should walk down the tree and tell if the terminal node is a number, but it seems cumbersome. Do you have any idea on how to do that without compromising legibility of my code?


Solution

  • Ok, for those interested, Lucas and Bart got the answer, and my implementation is like this:

    expr
        : value                                 # argumentArithmeticExpr
        | (PLUS|MINUS) expr                     # plusMinusExpr
        | l=expr operator=(MULT|DIV) r=expr     # multdivArithmeticExpr
        | l=expr operator=(PLUS|MINUS) r=expr   # addsubtArithmeticExpr
        | function=IDENTIFIER '(' (expr ( COMMA  expr )* ) ? ')'# functionExpr
        | '(' expr ')'           # parensArithmeticExpr
        ;
    

    And in the visitor of plusMinusExpr:

    @Override
    public Formula visitPlusMinusExpr(formulaParser.PlusMinusExprContext ctx) {
        if( debug ) LOG.log(Level.INFO, "visitPlusMinusExpr " + ctx.getText());
        Formula formulaExpr = visit(ctx.expr());
        if( ctx.MINUS() == null ) return formulaExpr;
        else {
            if(formulaExpr instanceof DoubleFormula){
                // optimization for numeric values: we don't return "(0.0 MINUS THEVALUE)" but directly "-THEVALUE"
                Double v = - ((DoubleFormula) formulaExpr).getValue();
                return new DoubleFormula( v );
            } else {
                return ArithmeticOperator.MINUS( 0, formulaExpr);
            }
        }
    }