Search code examples
c#sql-serverlinqlinq-to-entitiessql-server-2014

forcing LINQ-to-Entities + SQL Server 2014 to conditionally run a subquery


Is there a way to convince LINQ-to-Entities (against SQL Server 2014) to run a subquery only if a particular condition is true? In our current code, the SQL emitted by LINQ will run the subquery in all cases, even if only a small number of rows satisfy the condition.

What I want is SQL that looks like this pseudo-SQL:

SELECT CASE WHEN condition THEN (subquery SQL) ELSE NULL END

But what LINQ generates is this pseudo-SQL:

CASE WHEN condition THEN sub.results ELSE NULL END
.... (more SQL here)
OUTER APPLY (
    subquery SQL
) sub

Here's a simplified version of the subquery in C#:

let lastSale = storeReportSettings.ShowRunningTotal
                    ? salesTable
                        .Where(arg =>
                            arg.StoreId == Stores.StoreId &&
                            arg.SaleDate <= endDate)
                        .OrderByDescending(arg => arg.SaleDate)
                        .Select(arg => arg.RunningTotal)
                        .FirstOrDefault()
                    : (int?)null

In the example above, reportSettings.ShowRunningTotal is false for most stores and true for a few of them. The subquery to fetch the running total is expensive.

So the goal is to avoid running that subquery except for rows that need it.

But the LINQ above generates SQL that looks like this:

SELECT
... (lots of SQL here) ...

    CASE WHEN ([Filter1].[ShowRunningTotal] = 1) THEN [Limit1].[RunningTotal] END AS [C1], 

... (lots more SQL here) ...

    OUTER APPLY  (SELECT TOP (1) [Project1].[RunningTotal] AS [RunningTotal]
        FROM ( SELECT 
            [Extent13].[RunningTotal] AS [RunningTotal], 
            [Extent13].[SaleDate] AS [SaleDate]
            FROM [dbo].[Sales] AS [Extent13]
            WHERE ([Extent13].[StoreID] = [Filter1].[StoreId1]) 
                AND ([Extent13].[SaleDate] <= @p__linq__1) 
        )  AS [Project1]
        ORDER BY [Project1].[SaleDate] DESC ) AS [Limit1]

Instead of preventing the subquery from running for rows where the condition is false, this SQL runs the subquery for every row. Then later after the I/O damage is done, it filters out rows that don't need a running total.

What I wanted was SQL like this:

OUTER APPLY  (
     SELECT CASE WHEN ([Filter1].[ShowRunningTotal] = 0) THEN NULL ELSE
     (
         SELECT TOP (1) [Project1].[RunningTotal] AS [RunningTotal]
            FROM ( SELECT 
                [Extent13].[RunningTotal] AS [RunningTotal], 
                [Extent13].[SaleDate] AS [SaleDate]
                FROM [dbo].[Sales] AS [Extent13]
                WHERE ([Extent13].[StoreID] = [Filter1].[StoreId1])
                    AND ([Extent13].[SaleDate] <= @p__linq__1) 
            )  AS [Project1]
            ORDER BY [Project1].[SaleDate] DESC 
    ) END AS [RunningTotal]
) AS [Limit1]

How can I change my LINQ query to emit SQL like above, or any other SQL that will always avoid running the subquery if the condition is false?

Obviously I could split this into two LINQ queries-- one for rows that need running totals and another for the rest-- and union the results. But this involves a lot of refactoring that I'd prefer to avoid.

I tried removing the conditional operator and pushing the test into the Where() instead. But although this approach might be helpful in other cases, it wasn't helpful here because SQL Server still runs an inefficient index seek on the covering index that this subquery relies on. (It's ON (SaleDate, StoreId) INCLUDE (RunningTotal) which makes looking for a specific store very expensive because SQL has to read years of rows.) This seems to be a SQL Server issue, where predicate pushdown isn't smart enough to avoid the index seek. Unfortunately I can't add a better covering index-- the underlying table's indexes are not easy to modify for various reasons.

I could convert the subquery from a let (aka OUTER APPLY) into a left outer join, but that would involve a fairly complex refactor, and I'd be worried that that might end up with a less predictable query plan in some cases that could yield worse performance.

So what I'd really prefer is a way to convince LINQ to emit the subquery inside of a CASE (or a functionally equivalent result), rather than outside the CASE. Any suggestions?


Solution

  • Just embed the condition in the subquery to make it short-circuit for the cases where ShowRunningTotal = false. eg

    using System;
    using System.Collections.Generic;
    using System.ComponentModel.DataAnnotations.Schema;
    using System.Data.Entity;
    using System.Data.SqlClient;
    using System.Linq;
    using System.Threading;
    
    namespace ConsoleApp6
    {
    
    
        [Table("Customers")]
        public class Customer
        {
            public int CustomerID { get; set; }
    
            public string Name { get; set; }
    
            public bool ShowRunningTotal { get; set; }
    
            public ICollection<SalesOrders> Orders { get; } = new HashSet<SalesOrders>();
        }
    
        public class SalesOrders
        {
            public int Id { get; set; }
            public float Amount { get; set; }
    
            public DateTime SaleDate { get; set; }
    
            public int CustomerId { get; set; }
            virtual public Customer Customer { get; set; }
        }
        class Db : DbContext
        {
    
            public DbSet<Customer> Customers { get; set; }
    
            public DbSet<SalesOrders> SalesOrders { get; set; }
        }
        class Program
        {
            static void Main()
            {
    
                Database.SetInitializer(new DropCreateDatabaseAlways<Db>());
    
                using (var db = new Db())
                {
                    var q = from c in db.Customers
                            select new
                            {
                                c.CustomerID,
                                LastSale = c.Orders
                                            .Where(o => c.ShowRunningTotal)
                                            .OrderByDescending(o => o.SaleDate)
                                            .FirstOrDefault()
                            };
    
                    Console.WriteLine(q.ToString());
                }
    
                Console.ReadKey();
            }
    
    
        }
    }
    

    Translates to

    SELECT
        [Extent1].[CustomerID] AS [CustomerID],
        [Limit1].[Id] AS [Id],
        [Limit1].[Amount] AS [Amount],
        [Limit1].[SaleDate] AS [SaleDate],
        [Limit1].[CustomerId] AS [CustomerID1]
        FROM  [dbo].[Customers] AS [Extent1]
        OUTER APPLY  (SELECT TOP (1) [Project1].[Id] AS [Id], [Project1].[Amount] AS [Amount], [Project1].[SaleDate] AS [SaleDate], [Project1].[CustomerId] AS [CustomerId]
            FROM ( SELECT
                [Extent2].[Id] AS [Id],
                [Extent2].[Amount] AS [Amount],
                [Extent2].[SaleDate] AS [SaleDate],
                [Extent2].[CustomerId] AS [CustomerId]
                FROM [dbo].[SalesOrders] AS [Extent2]
                WHERE ([Extent1].[CustomerID] = [Extent2].[CustomerId]) AND ([Extent1].[ShowRunningTotal] = 1)
            )  AS [Project1]
            ORDER BY [Project1].[SaleDate] DESC ) AS [Limit1]