I'm writing an extended mathematical expression evaluator. It is supposed to allow defining macros in the form:
f(x):=2*x
I'm using Irony to break the expression into the parse tree. Grammar looks like following:
using Irony.Parsing;
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ProCalc.NET.Grammar
{
class MathExpressionGrammar : Irony.Parsing.Grammar
{
public MathExpressionGrammar()
{
// Construct elements
var unaryOperator = new NonTerminal("UnaryOperator");
var unaryOperation = new NonTerminal("UnaryOperation");
var binaryOperator = new NonTerminal("BinaryOperator");
var binaryOperation = new NonTerminal("BinaryOperation");
var variable = new NonTerminal("Variable");
var externalVariable = new NonTerminal("ExternalVariable");
var arguments = new NonTerminal("Arguments");
var functionCall = new NonTerminal("FunctionCall");
var matrixEntries = new NonTerminal("MatrixEntries");
var matrixRows = new NonTerminal("MatrixRows");
var matrix = new NonTerminal("Matrix");
var listEntries = new NonTerminal("ListEntries");
var list = new NonTerminal("List");
var term = new NonTerminal("Term");
var scalar = new NonTerminal("Scalar");
var parenthesis = new NonTerminal("Parenthesis");
var expression = new NonTerminal("Expression");
var root = new NonTerminal("Root");
var variableAssignment = new NonTerminal("VariableAssignment");
var functionAssignment = new NonTerminal("FunctionAssignment");
var parameters = new NonTerminal("Parameters");
var indexer = new NonTerminal("Indexer");
var indices = new NonTerminal("Indices");
var integer = new RegexBasedTerminal("Integer", "[0-9]+");
var real = new RegexBasedTerminal("Real", @"([0-9]+\.[0-9]+)|([0-9]+(\.[0-9]+)?e[\+\-]?[0-9]+)");
var complex = new RegexBasedTerminal("Complex", @"(([0-9]+(\.[0-9]+)?)|([0-9]+(\.[0-9]+)?e[\+\-]?[0-9]+))i");
var sysInteger = new RegexBasedTerminal("SysInteger", "0((b[0-1]+)|(o[0-7]+)|(d[0-9]+)|(h[0-9a-fA-F]+))");
var dms = new RegexBasedTerminal("Dms", @"[0-9]+:[0-5]?[0-9](:[0-5]?[0-9](\.[0-9]+)?)?");
var @string = new RegexBasedTerminal("String", @"""(([^""])|(""""))*""");
var bignum = new RegexBasedTerminal("BigNum", @"[0-9]+(\.[0-9]+)?[lL]");
var comma = ToTerm(",", "Comma");
var semicolon = ToTerm(";", "Semicolon");
var identifier = new RegexBasedTerminal("Identifier", @"[a-zA-Z_@][a-zA-Z0-9\._@]*");
var callparam = new RegexBasedTerminal("CallParam", @"_[0-9]+");
var assign = ToTerm(":=");
// Configure non-terminals
root.Rule = variableAssignment | functionAssignment | expression;
variableAssignment.Rule = identifier + assign + expression;
functionAssignment.Rule = identifier + "(" + parameters + ")" + assign + expression;
parameters.Rule = MakePlusRule(parameters, comma, identifier);
expression.Rule = term | unaryOperation | binaryOperation | indexer;
term.Rule = scalar | @string | matrix | list | functionCall | variable | callparam | externalVariable | parenthesis;
scalar.Rule = integer | real | complex | sysInteger | dms | bignum;
matrix.Rule = ToTerm("[") + matrixRows + "]";
matrixRows.Rule = MakePlusRule(matrixRows, semicolon, matrixEntries);
matrixEntries.Rule = MakePlusRule(matrixEntries, comma, term);
list.Rule = ToTerm("{") + listEntries + "}";
listEntries.Rule = MakePlusRule(listEntries, comma, term);
functionCall.Rule = identifier + "(" + arguments + ")";
arguments.Rule = MakePlusRule(arguments, comma, expression);
variable.Rule = identifier;
externalVariable.Rule = ToTerm("$") + identifier;
parenthesis.Rule = ToTerm("(") + expression + ")";
unaryOperation.Rule = unaryOperator + term + ReduceHere();
unaryOperator.Rule = ToTerm("-") | "!";
binaryOperation.Rule = expression + binaryOperator + expression;
binaryOperator.Rule = ToTerm("+") | "-" | "*" | "/" | @"\" | "%" | "^" | "<<" | ">>" | "<" | ">" | "<=" | ">=" | "==" | "!=" | "&" | "|" | "#";
indexer.Rule = term + "[" + indices + "]";
indices.Rule = MakePlusRule(indices, comma, integer);
// Configure operators
RegisterOperators(50, "&", "|", "#", "<<", ">>");
RegisterOperators(30, "^");
RegisterOperators(20, "/", @"\", "%");
RegisterOperators(15, "*");
RegisterOperators(10, "+", "-");
RegisterOperators(5, "<", ">", "<=", ">=", "==", "!=");
// Clean up unnecessary terms
MarkPunctuation(",", ";", "(", ")", "[", "]", "{", "}", "=");
this.Root = root;
}
}
}
Upon compiling, I'm getting exactly one reduce-reduce conflict:
State S66 (Inadequate)
Reduce-reduce conflicts on inputs: ) Comma
Shift items:
FunctionCall -> Identifier ·( Arguments )
Reduce items:
Parameters -> Identifier · [) Comma]
Variable -> Identifier · [[ + - * / \ % ^ << >> < > <= >= == != & | # ) Comma]
Transitions: (->S75
Irony complains about the function assignment rule. And indeed, attempt to parse (perfectly valid) expression:
f(x)
yields an error (":= expected").
It is weird for me though, because without :=, it is a perfect expression:
Root ->
Expression ->
Term ->
FunctionCall ->
Identifier ( Arguments ) ->
"f" ( Expression ) ->
"f" ( Term ) ->
"f" ( Variable ) ->
"f" ( Identifier ) ->
"f" ( "x" )
I guess that ambiguity comes from the fact, that Parameters
are actually a subset of Arguments
- for instance "x, y, z" may be passed both inside function declaration and function invocation.
I could fix that by allowing arguments
inside function declaration:
functionAssignment.Rule = identifier + "(" + arguments + ")" + assign + expression;
Then, in "runtime", I'd check, if all arguments are variables. This seems like a workaround though, because in reality there is no way you could ambiguously resolve function declaration and call (first requires :=, second forbids existence of :=).
How can I fix this reduce-reduce conflict?
This seems like a workaround though, because in reality there is no way you could ambiguously resolve function declaration and call (first requires
:=
, second forbids existence of:=
).
That's true; the grammar is not ambiguous. But unambiguous is not enough to build a parser.
The problem is that the parsing decision needs to be made before the )
is shifted, and at that point the (possible) :=
is not yet visible. The decision that needs to be made is precisely whether to reduce the x
to an expression
; that reduction would be needed if the eventual result will be a function call and incorrect if the eventual result will be a function assignment. And remember that the action needs to be decided on the basis of just one lookahead token, so the fact that the :=
(if it exists) is two tokens a way is critical.
Indeed, the simplest solution is the one you suggest; accept arguments
instead of parameters
and then check in the reduction for functionAssignment
(or earlier, if you prefer, by splitting functionAssignment
into something like templatePrototype + assign + expression
. It's reasonable to call that a workaround; it's a workaround for the fact that the simplest grammar is LALR(2), not LALR(1).
There are other alternatives which can produce a more precise grammar. An LALR(2) grammar can always be used to produce a (much larger) LALR(1) grammar. But the resulting grammar is much less readable (and harder to maintain), and the check in the semantic action is quite simple.
Another workaround which is sometimes recommended is to modify the lexer so that a )
followed by :=
produces a different token than )
not followed by :=
. That keeps the grammar simple, but complicates the lexer. (And remember that there might be a comment between the two tokens, so it's not just a simple regular expression.)
If you really want to see how this could be resolved, you can take a look at these answers to somewhat similar questions:
Searching on this site for LALR(2)
will probably find more instances.