Search code examples
sql-serverperformancesql-execution-plantempdb

Why is Query Optimizer Underestimating and Causing Data Spill to tempdb?


I have a very simple query that I am testing by running ad-hoc. The execution plan indicates that data is spilling into tempdb because of poor estimates.

I have been unable to resolve any of the three occurrences of the warning:

Operator used tempdb to spill data during execution with spill level 1 and 1 spilled thread(s)

All tables have covering indexes and up-to-date statistics.

I have tried using temp tables, hard coding instead of variables, re-ordering the WHERE clause, and even eliminating the WHERE clause by adding the filtering conditions to the JOINs.

I cleared the specific query plan after each execution by using the plan handle in DBCC FREEPROCCACHE, but each iteration generates the identical plan.

The query returns 10,567 distinct records, so it is not terribly large.

Is there any advice on how I can avoid the data spill?

Here is a link to the query plan brentozar.com/pastetheplan/?id=HJ1v8G03C

DECLARE
    @StartDate  datetime = '01/01/2023',
    @EndDate    datetime = '12/31/2023 23:59:59';

SELECT
    c.DonorTracCaseID,
    optn.Number AS OPTN,
    ti.OfferDateTime
FROM
    dbo.[Case] c
    INNER JOIN dbo.TriageReferral tr ON c.ID = tr.CaseID 
    INNER JOIN dbo.TriageImport ti ON tr.ID = ti.ID
    INNER JOIN PotentialDonor.DonorNumber optn ON c.DonorTracCaseID = optn.DonorTracCaseId
    INNER JOIN [Admin].ConfigureDonorNumber cdn ON optn.ConfigureDonorNumberId = cdn.ConfigureDonorNumberId
WHERE
    tr.IsDuplicate = 0
AND ti.OfferDateTime >= @StartDate
AND ti.OfferDateTime <= @EndDate
AND cdn.NumberDescription = 'OPTN';

Solution

  • There doesn't seem to be a lot you can do. The actual incorrect cardinality estimate is coming from DonorNumber, probably due to skewed data.

    • It may be worth adding the following indexes, or modifying existing ones. Note that they are multi-column indexes, as you appear to have a number of single-column indexes, which aren't that useful.

      PotentialDonor.DonorNumber (ConfigureDonorNumberId, DonorTracCaseId) INCLUDE (Number)
      dbo.TriageReferral (IsDuplicate, CaseID)
      
    • A filtered index or statistic would normally be the solution for the bad estimate on DonorNumber, but is not an option because the actual lookup value comes from a unique join on Admin.ConfigureDonorNumber. So an indexed view may be the only way around that.

      CREATE OR ALTER VIEW [Admin].v_Potential_ConfigureDonorNumber
      WITH SCHEMABINDING
      AS
      SELECT
        optn.DonorTracCaseId,
        optn.Number
      FROM PotentialDonor.DonorNumber optn
      INNER JOIN [Admin].ConfigureDonorNumber cdn ON optn.ConfigureDonorNumberId = cdn.ConfigureDonorNumberId
      WHERE cdn.NumberDescription = 'OPTN'
      
      CREATE UNIQUE CLUSTERED INDEX CX ON [Admin].v_Potential_ConfigureDonorNumber (DonorTracCaseId)
      

      Then change your query's final two joins to just (note the use of NOEXPAND)

      INNER JOIN [Admin].v_Potential_ConfigureDonorNumber optn WITH (NOEXPAND) ON c.DonorTracCaseID = optn.DonorTracCaseId
      
    • Your one final option is to just throw in the towel and force the server to allocate more memory, with this at the end of your query:

      OPTION (MIN_GRANT_PERCENT = 1);
      

      Increase the percentage until the problem goes away.