Search code examples
sqlsql-serverquery-optimization

How to improve performance of CAST IIF EXISTS in large data sets


I'm currently working on a project with a large dataset in Microsoft SQL Server:

    Table_A ±250k rows;
    Table_B ±1M rows;
    Table_C ±6M rows;
    Table_D ±10M rows;

Currently I run into performance issues to get data out of these tables with a Table Valued Function.

On average, my query runs for 1,5s, which is too slow as we need to run 4 of them to run a web page. It get's worse when the database is under heavy load by multiple users, then the speed can drop to 6 second, causing total load times of 25s or longer of a webpage which is too slow.

I checked the execution plan, and found out that 70% of calculation is going into the CAST(IFF(EXISTS(...))) part:

    WITH cte1 AS (
        SELECT B.ID, B.NAME, B.A_Id
        FROM Table_B
        INNER JOIN (
            SELECT B.A_Id, MAX(B.ID) AS Max_ID
            FROM Table_B B
            GROUP BY B.A_Id
        ) Q ON Q.A_Id = B.A_ID AND Q.Max_ID = B.ID
    ),
    SELECT  
        A.Id,
        A.<columns>,
    
        IsBusy= CAST(IIF(       
        EXISTS(
            SELECT C.ID
            FROM Table_C C
            INNER JOIN Table_D D 
                ON D.ID = C.D_ID AND C.B_ID = cte1.ID
           WHERE D.FinishDate IS NULL 
        ) , 1, 0) AS BIT),

        IsError= CAST(IIF (     
        EXISTS(
            SELECT C.ID
            FROM Table_C C
            INNER JOIN Table_D D 
                ON D.ID = C.D_ID AND C.B_ID = cte1.ID
            WHERE D.FinishDate IS NOT NULL AND LEN(D.ErrorMessage) > 0
        ) , 1, 0) AS BIT)

    FROM Table_A
    LEFT OUTER JOIN cte cte1 ON Table_A.Id = cte1.Id
    -- Other LEFT OUTER JOINS + WHERE statements are omitted for readability.

My question: Why is so much time wasted on this CAST(IFF(EXISTS(..))) construction? And what direction can I think of to improve performance?

I've never bumped really in such big tables that caused these kind of queries to perform bad.


Solution

  • It's hard to help without better info on your query, such as tables, indexes and a query plan.

    But you can probably combine these two subqueries using conditional aggregation, inside an OUTER APPLY or a grouped LEFT JOIN.

    You can also improve the CTE by using ROW_NUMBER or another window function.

    WITH cte1 AS (
        SELECT
          B.ID,
          B.NAME,
          B.A_Id,
        FROM (
            SELECT *,
              ROW_NUMBER() OVER (PARTITION BY B.A_Id ORDER BY B.ID DESC) AS rn
            FROM Table_B
        ) B
        WHERE B.rn = 1
    )
    SELECT  
      A.Id,
      A.<columns>,
        
      d.IsBusy,
      d.IsError
    FROM Table_A a
    LEFT OUTER JOIN cte cte1 ON a.Id = cte1.Id
    OUTER APPLY (
        SELECT
          ISNULL(MAX(CASE WHEN D.FinishDate IS NULL                                 THEN CAST(1 AS bit) END), 0) AS IsBusy,
          ISNULL(MAX(CASE WHEN D.FinishDate IS NOT NULL AND LEN(D.ErrorMessage) > 0 THEN CAST(1 AS bit) END), 0) AS IsError
        FROM Table_C C
        JOIN Table_D D 
          ON D.ID = C.D_ID
         AND C.B_ID = cte1.ID
    ) d
    
    -- Other LEFT OUTER JOINS + WHERE statements are omitted for readability.
    

    Or using a grouped LEFT JOIN

    LEFT JOIN (
        SELECT
          C.B_ID,
          ISNULL(MAX(CASE WHEN D.FinishDate IS NULL                                 THEN CAST(1 AS bit) END), 0) AS IsBusy,
          ISNULL(MAX(CASE WHEN D.FinishDate IS NOT NULL AND LEN(D.ErrorMessage) > 0 THEN CAST(1 AS bit) END), 0) AS IsError
        FROM Table_C C
        JOIN Table_D D 
          ON D.ID = C.D_ID
        GROUP BY
          C.B_ID
    ) d
      ON d.B_ID = cte1.ID
    

    I would expect something like the following indexes

    Table_A (ID) INCLUDE (otherColumnsHere)
    
    Table_B (A_Id, ID DESC) INCLUDE (NAME)
    
    Table_C (B_ID, D_ID)
    
    Table_D (ID) INCLUDE (FinishDate, ErrorMessage)
    

    Obviously these can be clustered indexes, especially if they are primary keys. In which case you don't need INCLUDE.