Search code examples
c#sql-serverlinqt-sqlexpression

Generic expression to find partial string matches against dynamic list of fields in LINQ query


As I understand it, in particular with SQL Server:

  • LINQ queries convert the query code into T-SQL to execute against the SQL Server.
  • LINQ code logic doesn't always seamlessly convert to T-SQL, which usually needs additional steps to make code logic to work. Example: Adding .ToList() to a context table call will pull all records into memory that the rest of the code generating T-SQL can query against.

This last point is where I am needing help with. Let me start with the goal I am trying to reach. I want to be able to create a generic method that can be called from a LINQ query, passing in a list of table fields, and a list of string values, that will return true or false if any of the string values are partially found in any of the table fields. I have the following method that I use that currently works:

    public static bool SearchTermMatch(List<string> t, List<string> f)
    {
        return
        (
            (t.Count() == 0) ||
            (t.Count(_t => f.Any(_f => _f != null && _f.ToLower().Contains(_t.ToLower())) || _t == "") == t.Count())
        );
    }

I can call it like this, for example:

var terms = "Tom Jones";    
var termlist = (string.IsNullOrEmpty(terms)) ? new List<string>() : terms.Split(' ').ToList();
var results = (from tbl in context.MyTable.ToList() where SearchTermMatch(termlist, new List<string>() { tbl.Field1, tbl.Field2, tbl.Field3 }) select new { tbl.Field1 }).Take(10).ToList();

As I've mentioned above, this works, but only because I have to apply .ToList() to the context table. However, there is a cost that I don't want to absorb the way this is working. Basically all the records in MyTable are pulled down into memory, and the rest of the SearchTermMatch method is being applied to the results in memory. This becomes very expensive when MyTable has hundreds of thousands of records, because for every record in MyTable, SearchTermMatch is comparing the VALUES of Field1, Field2, and Field3 against the list of strings I'm looking for partial matches from. So in essence, T-SQL is performing quickly by passing back ALL the records in MyTable to the server's memory, where the rest of LINQ is processing the resulting field values for each record against the list of strings from the server's memory. This is inefficient.

I believe the answer is to convert the SearchTermMatch method into an Expression that will create the table fields being compared to the list of strings into T-SQL that can be executed with the MyTable query. Basically the goal is to have SearchTermMatch method pass back T-SQL to the LINQ query so I can get rid of the memory-inducing .ToList() addendum. I just want my LINQ to be fully executed on the SQL Server.

Does anyone have any suggestions on how I should approach this? I have been trying to wrap my brain around the concept of Expressions and feel that's the direction I need to go in, but I'm having a hard time figuring out how to put together an alternative SearchTermMatch method as that.

Thank you for any help you can provide.

UPDATE:

The answer provided by Servy is amazing and does indeed solve my particular situation. I originally posted a scaled down version of my overall query so I can get the fundamentals of an answer to further apply to the more complex query I need to apply those fundamentals to. Not only does Servy's answer solve my basic query issue, but it also cleanly works when I apply more complexity to the query. I'll illustrate some of this.

Servy posts his Expression-based SearchTermMatch method in an example that is directly applied to the context database table directly. In my particular need I can't use LINQ in that way (if someone can explain to me how to use the correct terminology in describing LINQ code [from t in context.MyTable where t.Field1 == ""] as oppose to [context.MyTable.Where(t => t.Field1 == "")] I'd appreciate it).

In Servy's example usage he posts:

var query = context.MyTable.SearchTermMatch(termlist, f => f.Field1, f => f.Field2, f => f.Field3)
    .Select(table => table.Field1)
    .Take(10);

This alteratively works as:

from t in context.MyTable
select new MyTableModel()
{
    Prop_Field1 = t.Field1,
    Prop_Field2 = t.Field2,
    Prop_Field3 = t.Field3
}).SearchTermMatch(termlist, t => t.Prop_Field1, t => t.Prop_Field2, t => t.Prop_Field3).Take(10).ToList();

Where "MyTableModel" is a C# class model to contain these records.

The reason I need it in this format is my query end result needs to be transitioned into a model list.

Another complexity I need to account for is passing in fields from a joined table into the SearchTermMatch Expresssion, and that is also accounted for:

from t in context.MyTable
join t2 in context.MyTable2 on t.Field1 equals t2.Field1
select new MyTableModel()
{
    Prop_Field1 = t.Field1,
    Prop_Field2 = t.Field2,
    Prop_Field3 = t.Field3,
    Prop_Field4 = t2.Field4
}).SearchTermMatch(termlist, t => t.Prop_Field1, t => t.Prop_Field2, t => t.Prop_Field3, t => t.Prop_Field4).Take(10).ToList();

I imagine this would also work in Servy's example usage by adding a joined table via .Join().

Thank you Servy for providing a very concise answer, and I hope this helps others that have the same kind of question.


Solution

  • So to start out with, there are a few building blocks we'll want to build up to give ourselves some better tools for manipulating expressions before we get to the business logic.

    First we'll want a method to replace the parameter of one expression with another:

    public static Expression ReplaceParameter(this Expression expression,
        ParameterExpression toReplace,
        Expression newExpression)
    {
        return new ParameterReplaceVisitor(toReplace, newExpression)
            .Visit(expression);
    }
    

    This will require an expression visitor to replace a particular parameter with another:

    public class ParameterReplaceVisitor : ExpressionVisitor
    {
        private ParameterExpression from;
        private Expression to;
        public ParameterReplaceVisitor(ParameterExpression from, Expression to)
        {
            this.from = from;
            this.to = to;
        }
        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == from ? to : base.Visit(node);
        }
    }
    

    Next we want a method to Compose expressions so that we can take an expression, and then another that uses it's result, and combine them into a single expression that performs both operations together:

    public static Expression<Func<TSource, TResult>> Compose<TSource, TIntermediate, TResult>(
        this Expression<Func<TSource, TIntermediate>> first,
        Expression<Func<TIntermediate, TResult>> second)
    {
        var param = Expression.Parameter(typeof(TSource));
        var intermediateValue = first.Body.ReplaceParameter(first.Parameters[0], param);
        var body = second.Body.ReplaceParameter(second.Parameters[0], intermediateValue);
        return Expression.Lambda<Func<TSource, TResult>>(body, param);
    }
    

    Next we want a method to take a bunch of expressions and produce one that tells us if they are all true:

    public static Expression<Func<T, bool>> All<T>(this IEnumerable<Expression<Func<T, bool>>> predicates)
    {
        var param = Expression.Parameter(typeof(T));
        var body = predicates.Select(prediate => prediate.Body.ReplaceParameter(prediate.Parameters.First(), param))
            .Aggregate(Expression.AndAlso);
        return Expression.Lambda<Func<T, bool>>(body, param);
    }
    

    We could have used Compose for that, but that would result in a lot more work than replacing each parameter just once and only building one new lambda.

    We also need a method to tell us if Any of a sequence of predicates is true, and that's just All, but using OR instead of AND to combine them:

    public static Expression<Func<T, bool>> Any<T>(this IEnumerable<Expression<Func<T, bool>>> predicates)
    {
        var param = Expression.Parameter(typeof(T));
        var body = predicates.Select(prediate => prediate.Body.ReplaceParameter(prediate.Parameters.First(), param))
            .Aggregate(Expression.OrElse);
        return Expression.Lambda<Func<T, bool>>(body, param);
    }
    

    Okay, so that's it with the generic tools. Next, we'll need to adjust your actual method. An IQueryable isn't going to be able to manipulate a custom C# method of yours, you need to accept your terms as expressions and return a new expression that represents your own logic instead, so the signature will need to be notably different.

    Next, it's best to clean up the data and do the parameter validation at the start, and avoid trying to do any of it within the expression manipulation code at all.

    After that, to determine if any field contains a single term we can project each field into an expression telling us if that field contains the term, then combine those with our Any function from earlier.

    Then we just need to see if all of the terms are contained in any field.

    public static Expression<Func<T, bool>> SearchTermMatch<T>(IEnumerable<string> terms, params Expression<Func<T, string>>[] fields)
    {
        terms = terms.Where(term => term != "").ToList();
        if (!terms.Any()) return _ => true;
        if (!fields.Any()) return _ => false;
    
        Expression<Func<T, bool>> AnyFieldContainsTerm(string term) =>
            fields.Select(field => field.Compose(value => value.Contains(term)))
            .Any();
    
        return terms.Select(AnyFieldContainsTerm)
                    .All();
    }
    

    To pull it all together we just need to adjust how the method is called so that you call the method itself, rather than using it within another expression:

    var terms = "Tom Jones";
    var termlist = (string.IsNullOrEmpty(terms)) ? new List<string>() : terms.Split(' ').ToList();
    
    var query = context.MyTable.Where(SearchTermMatch<Foo>(termlist, f => f.Field1, f => f.Field2, f => f.Field3))
        .Select(table => table.Field1)
        .Take(10);
    

    You might also find it more convenient to make a version of the method acting on the IQueryable:

    public static IQueryable<T> SearchTermMatch<T>(this IQueryable<T> query, IEnumerable<string> terms, params Expression<Func<T, string>>[] fields)
    {
        return query.Where(SearchTermMatch(terms, fields));
    }
    

    As that now turns the query into:

    var query = context.MyTable.SearchTermMatch(termlist, f => f.Field1, f => f.Field2, f => f.Field3)
        .Select(table => table.Field1)
        .Take(10);