I have a grammar like this:
/* entry point */
parse: expr EOF;
expr
: value # argumentArithmeticExpr
| l=expr operator=(MULT|DIV) r=expr # multdivArithmeticExpr
| l=expr operator=(PLUS|MINUS) r=expr # addsubtArithmeticExpr
| operator=('-'|'+') r=expr # minusPlusArithmeticExpr
| IDENTIFIER '(' (expr ( COMMA expr )* ) ? ')'# functionExpr
| LPAREN expr RPAREN # parensArithmeticExpr
;
value
: number
| variable
| string // contains date
| bool
| null_value
;
/* Atomes */
bool
: BOOL
;
variable
: VARIABLE
;
string
: STRING_LITERAL
;
number
: ('+'|'-')? NUMERIC_LITERAL
;
null_value
: NULL // TODO: test this
;
IDENTIFIER
: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )? // ex: 0.05e3
| '.' DIGIT+ ( E [-+]? DIGIT+ )? // ex: .05e3
;
INT: DIGIT+;
STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
| '"' ( ~'"' | '""' )* '"'
;
VARIABLE
: LBRACKET ( ~']' | ' ')* RBRACKET
;
Now, I want to parse this:
-1.3 * 5 + -2 * 7
With Grun, I get this:
antlr4 formula.g4 && javac *.java && time grun formula parse -gui
-1.3*5 + -2*7
^D
Which looks OK and I would be happy with that.
But in my Java code, I get called like this using the Visitor pattern:
visitMinusPlusArithmeticExpr -1.3*5+-2*7 // ugh ?? sees "- (1.3 * 5 + - 2 * 7 )" instead of "(-1.3*5) + (-2*7)"
visitAddsubtArithmeticExpr 1.3*5+-2*7
visitMultdivArithmeticExpr 1.3*5
visitArgumentArithmeticExpr 1.3
visitNumber 1.3
visitArgumentArithmeticExpr 5
visitValue 5
visitNumber 5
visitMinusPlusArithmeticExpr -2*7 // UHG? should see a MultDiv with -2 and 7
visitMultdivArithmeticExpr 2*7
visitArgumentArithmeticExpr 2
visitValue 2
visitNumber 2
visitArgumentArithmeticExpr 7
visitValue 7
visitNumber 7
Which means that I don't get my negative number (-1.3), but rather the 'minus expression', which I should not get.
Why is my Java result different from Grun ? I have verified that the grammar is recompiled and I use my parser like this:
formulaLexer lexer = new formulaLexer(new ANTLRInputStream(s));
formulaParser parser = new formulaParser(new CommonTokenStream(lexer));
parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
parser.setErrorHandler(new BailErrorStrategy()); // will throw exceptions on failure
formula = tryParse(parser);
if( formula == null && errors.isEmpty() ){
// the parsing failed, retry in LL mode
parser.getInterpreter().setPredictionMode(PredictionMode.LL);
parser.reset();
tryParse(parser);
}
I have disabled the SLL mode to verify if this was not the problem, and the result was the same.
I thought this could be a problem of precedence, but in my expr
I have specified to match a value
first, and then only a minusPlusArithmeticExpr
.
I can't understand how I will detect this 'minus' expression instead of my 'negative value'. Can you check this?
Also, why does Grun show the correct behavior and not my Java code?
Following the comments advice, I modified the grammar to look like this:
expr
: value # argumentArithmeticExpr
| (PLUS|MINUS) expr # plusMinusExpr
| l=expr operator=(MULT|DIV) r=expr # multdivArithmeticExpr
| l=expr operator=(PLUS|MINUS) r=expr # addsubtArithmeticExpr
| function=IDENTIFIER '(' (expr ( COMMA expr )* ) ? ')'# functionExpr
| '(' expr ')' # parensArithmeticExpr
;
But now, I want to optimize the case where I have a single "-1.3" somewhere.
I don't know how to do it correctly, since when I land in the visitMinusPlusAritmeticExpr
, I have to check if the terminal node is a number.
Here is what I get while debugging:
ctx = {formulaParser$PlusMinusExprContext@929} "[16]"
children = {ArrayList@955} size = 2
0 = {TerminalNodeImpl@962} "-"
1 = {formulaParser$ArgumentArithmeticExprContext@963} "[21 16]"
children = {ArrayList@967} size = 1
0 = {formulaParser$ValueContext@990} "[22 21 16]"
children = {ArrayList@992} size = 1
0 = {formulaParser$NumberContext@997} "[53 22 21 16]"
children = {ArrayList@999} size = 1
0 = {TerminalNodeImpl@1004} "1.3"
I suspect I should walk down the tree and tell if the terminal node is a number, but it seems cumbersome. Do you have any idea on how to do that without compromising legibility of my code?
Ok, for those interested, Lucas and Bart got the answer, and my implementation is like this:
expr
: value # argumentArithmeticExpr
| (PLUS|MINUS) expr # plusMinusExpr
| l=expr operator=(MULT|DIV) r=expr # multdivArithmeticExpr
| l=expr operator=(PLUS|MINUS) r=expr # addsubtArithmeticExpr
| function=IDENTIFIER '(' (expr ( COMMA expr )* ) ? ')'# functionExpr
| '(' expr ')' # parensArithmeticExpr
;
And in the visitor of plusMinusExpr:
@Override
public Formula visitPlusMinusExpr(formulaParser.PlusMinusExprContext ctx) {
if( debug ) LOG.log(Level.INFO, "visitPlusMinusExpr " + ctx.getText());
Formula formulaExpr = visit(ctx.expr());
if( ctx.MINUS() == null ) return formulaExpr;
else {
if(formulaExpr instanceof DoubleFormula){
// optimization for numeric values: we don't return "(0.0 MINUS THEVALUE)" but directly "-THEVALUE"
Double v = - ((DoubleFormula) formulaExpr).getValue();
return new DoubleFormula( v );
} else {
return ArithmeticOperator.MINUS( 0, formulaExpr);
}
}
}