Search code examples
c#parsingsprache

Sprache: left recursion in grammar


I am developing a parser for a language similar to SQL and I have the problem of creating some of the rules of language, such as: expression IS NULL and expression IN (expression1, expression2, ...) with priority between logical and mathematical operators.

I uploaded a GitHub test project https://github.com/anpv/SpracheTest/ but this variant is not good.
I tried to use the following rules:

private static readonly Parser<AstNode> InOperator =
    from expr in Parse.Ref(() => Expression)
    from inKeyword in Parse.IgnoreCase("in").Token()
    from values in Parse
        .Ref(() => Expression)
        .DelimitedBy(Comma)
        .Contained(OpenParenthesis, CloseParenthesis)
    select new InOperator(expr, values);

private static readonly Parser<AstNode> IsNullOperator =
    from expr in Parse.Ref(() => Expression)
    from isNullKeyword in Parse
         .IgnoreCase("is")
         .Then(_ => Parse.WhiteSpace.AtLeastOnce())
         .Then(_ => Parse.IgnoreCase("null"))
    select new IsNullOperator(expr);

private static readonly Parser<AstNode> Equality =
    Parse
        .ChainOperator(Eq, IsNullOperator.Or(InOperator).Or(Additive), MakeBinary);

which throws ParseException in code like ScriptParser.ParseExpression("1 is null") or ScriptParser.ParseExpression("1 in (1, 2, 3)"): "Parsing failure: Left recursion in the grammar.".

How can I look-ahead for Expression, or do other variants exist to solve this problem?


Solution

  • The answer is, unfortunately, the Sprache cannot parse a left-recursive grammar. I stumbled on comments in the source code talking about how buggy support for left-recursive grammars had been removed when researching this question (which was also how I found your question) - see the source code.

    In order to deal with this problem you need to reorganize how you do your parsing. If you are writing a simple expression parser, for example, this is a common problem you have to deal with. Searching the web there is lots of discussion of how to remove left-recursion from a grammar, in particular, for expressions.

    In your case, I expect you'll need to do something like:

    term := everything simple in an expression (like "1", "2", "3", etc.)
    expression := term [ IN ( expression*) | IS NULL | "+" expression | "-" expression | etc.]
    

    or similar - basically - you have to unwind the recursion yourself. By doing that I was able to fix my issues with expressions. I suspect any basic compiler book probably has a section on how to "normalize" a grammar.

    It makes building whatever object you are returning from the parser a bit more of a pain, but in the select statement instead of doing "select new Expression(arg1, arg2)" I changed it to be a function call, and the function decides on the specific object being returned depending on what the arguments were.