Search code examples
c#parsingsprache

Recursive expression parsing in Sprache


I am building a Sprache parser to parse expressions similar to SQL search conditions. e.g Property = 123 or Property > AnotherProperty

So far both of those examples work, however I am struggling to figure out what I need to do to allow ANDing/ORing conditions and parenthesis.

Basically I have this so far:

private static readonly Parser<string> Operators =
    Parse.String("+").Or(Parse.String("-")).Or(Parse.String("="))
        .Or(Parse.String("<")).Or(Parse.String(">"))
        .Or(Parse.String("<=")).Or(Parse.String(">=")).Or(Parse.String("<>"))
        .Text();

private static readonly Parser<IdentifierExpression> Identifier = 
    from first in Parse.Letter.Once()
    from rest in Parse.LetterOrDigit.Many()
    select new IdentifierExpression(first.Concat(rest).ToArray());

public static readonly Parser<Expression> Integer =
    Parse.Number.Select(n => new IntegerExpression {Value = int.Parse(n)});

public static readonly Parser<SearchCondition> SearchCondition = 
    from left in Identifier.Or(Number)
    from op in Operators.Token()
    from right in Identifier.Or(Number)
    select new SearchCondition { Left = left, Right = right, Operator = op};

This works for the simple cases above, but now I need a pointer on how to implement conditions like:

  • PropertyX = PropertyY OR PropertyX = PropertyZ
  • PropertyA > PropertyB AND (OtherAnotherProperty = 72 OR OtherAnotherProperty = 150)

Can anyone give me an idea on how to structure the parsers for this sort of thing?


Solution

  • What you have so far is a basic comparison expression parser. It looks like you want to wrap that in a parser that handles logical expressions (and, or, etc.) with sub-expression support.

    The code I posted at first was ripped from poorly-tested code I was still working on, which didn't handle statements with multiple terms. My understanding of the ChainOperator method was clearly incomplete.

    Parse.ChainOperator is the method that lets you specify your operators and have them appear 0-to-many times in the expression. I was making assumptions about how it worked that turned out to be just wrong.

    I've rewritten the code and added a few bits to make it easier to use:

    // Helpers to make access simpler
    public static class Condition
    {
        // For testing, will fail all variable references
        public static Expression<Func<object, bool>> Parse(string text)
            => ConditionParser<object>.ParseCondition(text);
    
        public static Expression<Func<T, bool>> Parse<T>(string text)
            => ConditionParser<T>.ParseCondition(text);
    
        public static Expression<Func<T, bool>> Parse<T>(string text, T instance)
            => ConditionParser<T>.ParseCondition(text);
    }
    
    public static class ConditionParser<T>
    {
        static ParameterExpression Parm = Expression.Parameter(typeof(T), "_");
    
        public static Expression<Func<T, bool>> ParseCondition(string text)
            => Lambda.Parse(text);
    
        static Parser<Expression<Func<T, bool>>> Lambda =>
            OrTerm.End().Select(body => Expression.Lambda<Func<T, bool>>(body, Parm));
    
        // lowest priority first
        static Parser<Expression> OrTerm =>
            Parse.ChainOperator(OpOr, AndTerm, Expression.MakeBinary);
    
        static Parser<ExpressionType> OpOr = MakeOperator("or", ExpressionType.OrElse);
    
        static Parser<Expression> AndTerm =>
            Parse.ChainOperator(OpAnd, NegateTerm, Expression.MakeBinary);
    
        static Parser<ExpressionType> OpAnd = MakeOperator("and", ExpressionType.AndAlso);
    
        static Parser<Expression> NegateTerm =>
            NegatedFactor
            .Or(Factor);
    
        static Parser<Expression> NegatedFactor =>
            from negate in Parse.IgnoreCase("not").Token()
            from expr in Factor
            select Expression.Not(expr);
    
        static Parser<Expression> Factor =>
            SubExpression
            .Or(BooleanLiteral)
            .Or(BooleanVariable);
    
        static Parser<Expression> SubExpression =>
            from lparen in Parse.Char('(').Token()
            from expr in OrTerm
            from rparen in Parse.Char(')').Token()
            select expr;
    
        static Parser<Expression> BooleanValue =>
            BooleanLiteral
            .Or(BooleanVariable);
    
        static Parser<Expression> BooleanLiteral =>
            Parse.IgnoreCase("true").Or(Parse.IgnoreCase("false"))
            .Text().Token()
            .Select(value => Expression.Constant(bool.Parse(value)));
    
        static Parser<Expression> BooleanVariable =>
            Parse.Regex(@"[A-Za-z_][A-Za-z_\d]*").Token()
            .Select(name => VariableAccess<bool>(name));
    
        static Expression VariableAccess<TTarget>(string name)
        {
            MemberInfo mi = typeof(T).GetMember(name, MemberTypes.Field | MemberTypes.Property, BindingFlags.Instance | BindingFlags.Public).FirstOrDefault();
            var targetType = typeof(TTarget);
            var type = 
                (mi is FieldInfo fi) ? fi.FieldType : 
                (mi is PropertyInfo pi) ? pi.PropertyType : 
                throw new ParseException($"Variable '{name}' not found.");
    
            if (type != targetType)
                throw new ParseException($"Variable '{name}' is type '{type.Name}', expected '{targetType.Name}'");
    
            return Expression.MakeMemberAccess(Parm, mi);
        }
    
        // Helper: define an operator parser
        static Parser<ExpressionType> MakeOperator(string token, ExpressionType type)
            => Parse.IgnoreCase(token).Token().Return(type);
    }
    

    And some examples:

    static class Program
    {
        static void Main()
        {
            // Parser with no input
            var condition1 = Condition.Parse("true and false or true");
            Console.WriteLine(condition1.ToString());
            var fn1 = condition1.Compile();
            Console.WriteLine("\t={0}", fn1(null));
    
            // Parser with record input
            var record1 = new { a = true, b = false };
            var record2 = new { a = false, b = true };
            var condition2 = Condition.Parse("a and b or not a", record);
            Console.WriteLine(condition2.ToString());
            var fn2 = condition2.Compile();
            Console.WriteLine("\t{0} => {1}", record1.ToString(), fn2(record1));
            Console.WriteLine("\t{0} => {1}", record2.ToString(), fn2(record2));
        }
    }
    

    You will still need to add your own parsers for handling comparison expressions and so on. Plug those into the BooleanValue parser after the existing terms:

    static Parser<Expression> BooleanValue => 
        BooleanLiteral
        .Or(BooleanVariable)
        .Or(SearchCondition);
    

    I'm doing something similar with a more C#-style filter specification with type checking during the parse phase and separate parsers for strings vs numbers.