I am developing a parser for a language similar to SQL and I have the problem of creating some of the rules of language, such as: expression IS NULL
and expression IN (expression1, expression2, ...)
with priority between logical and mathematical operators.
I uploaded a GitHub test project https://github.com/anpv/SpracheTest/ but this variant is not good.
I tried to use the following rules:
private static readonly Parser<AstNode> InOperator =
from expr in Parse.Ref(() => Expression)
from inKeyword in Parse.IgnoreCase("in").Token()
from values in Parse
.Ref(() => Expression)
.DelimitedBy(Comma)
.Contained(OpenParenthesis, CloseParenthesis)
select new InOperator(expr, values);
private static readonly Parser<AstNode> IsNullOperator =
from expr in Parse.Ref(() => Expression)
from isNullKeyword in Parse
.IgnoreCase("is")
.Then(_ => Parse.WhiteSpace.AtLeastOnce())
.Then(_ => Parse.IgnoreCase("null"))
select new IsNullOperator(expr);
private static readonly Parser<AstNode> Equality =
Parse
.ChainOperator(Eq, IsNullOperator.Or(InOperator).Or(Additive), MakeBinary);
which throws ParseException
in code like ScriptParser.ParseExpression("1 is null")
or ScriptParser.ParseExpression("1 in (1, 2, 3)"): "Parsing failure: Left recursion in the grammar."
.
How can I look-ahead for Expression, or do other variants exist to solve this problem?
The answer is, unfortunately, the Sprache cannot parse a left-recursive grammar. I stumbled on comments in the source code talking about how buggy support for left-recursive grammars had been removed when researching this question (which was also how I found your question) - see the source code.
In order to deal with this problem you need to reorganize how you do your parsing. If you are writing a simple expression parser, for example, this is a common problem you have to deal with. Searching the web there is lots of discussion of how to remove left-recursion from a grammar, in particular, for expressions.
In your case, I expect you'll need to do something like:
term := everything simple in an expression (like "1", "2", "3", etc.)
expression := term [ IN ( expression*) | IS NULL | "+" expression | "-" expression | etc.]
or similar - basically - you have to unwind the recursion yourself. By doing that I was able to fix my issues with expressions. I suspect any basic compiler book probably has a section on how to "normalize" a grammar.
It makes building whatever object you are returning from the parser a bit more of a pain, but in the select statement instead of doing "select new Expression(arg1, arg2)" I changed it to be a function call, and the function decides on the specific object being returned depending on what the arguments were.