How to improve performance of CAST IIF EXISTS in large data sets

I'm currently working on a project with a large dataset in Microsoft SQL Server:

    Table_A ±250k rows;
    Table_B ±1M rows;
    Table_C ±6M rows;
    Table_D ±10M rows;

Currently I run into performance issues to get data out of these tables with a Table Valued Function.

On average, my query runs for 1,5s, which is too slow as we need to run 4 of them to run a web page. It get's worse when the database is under heavy load by multiple users, then the speed can drop to 6 second, causing total load times of 25s or longer of a webpage which is too slow.

I checked the execution plan, and found out that 70% of calculation is going into the CAST(IFF(EXISTS(...))) part:

    WITH cte1 AS (
        SELECT B.ID, B.NAME, B.A_Id
        FROM Table_B
        INNER JOIN (
            SELECT B.A_Id, MAX(B.ID) AS Max_ID
            FROM Table_B B
            GROUP BY B.A_Id
        ) Q ON Q.A_Id = B.A_ID AND Q.Max_ID = B.ID
    ),
    SELECT  
        A.Id,
        A.<columns>,
    
        IsBusy= CAST(IIF(       
        EXISTS(
            SELECT C.ID
            FROM Table_C C
            INNER JOIN Table_D D 
                ON D.ID = C.D_ID AND C.B_ID = cte1.ID
           WHERE D.FinishDate IS NULL 
        ) , 1, 0) AS BIT),

        IsError= CAST(IIF (     
        EXISTS(
            SELECT C.ID
            FROM Table_C C
            INNER JOIN Table_D D 
                ON D.ID = C.D_ID AND C.B_ID = cte1.ID
            WHERE D.FinishDate IS NOT NULL AND LEN(D.ErrorMessage) > 0
        ) , 1, 0) AS BIT)

    FROM Table_A
    LEFT OUTER JOIN cte cte1 ON Table_A.Id = cte1.Id
    -- Other LEFT OUTER JOINS + WHERE statements are omitted for readability.

My question: Why is so much time wasted on this CAST(IFF(EXISTS(..))) construction? And what direction can I think of to improve performance?

I've never bumped really in such big tables that caused these kind of queries to perform bad.

Solution

It's hard to help without better info on your query, such as tables, indexes and a query plan.

But you can probably combine these two subqueries using conditional aggregation, inside an OUTER APPLY or a grouped LEFT JOIN.

You can also improve the CTE by using ROW_NUMBER or another window function.

WITH cte1 AS (
    SELECT
      B.ID,
      B.NAME,
      B.A_Id,
    FROM (
        SELECT *,
          ROW_NUMBER() OVER (PARTITION BY B.A_Id ORDER BY B.ID DESC) AS rn
        FROM Table_B
    ) B
    WHERE B.rn = 1
)
SELECT  
  A.Id,
  A.<columns>,
    
  d.IsBusy,
  d.IsError
FROM Table_A a
LEFT OUTER JOIN cte cte1 ON a.Id = cte1.Id
OUTER APPLY (
    SELECT
      ISNULL(MAX(CASE WHEN D.FinishDate IS NULL                                 THEN CAST(1 AS bit) END), 0) AS IsBusy,
      ISNULL(MAX(CASE WHEN D.FinishDate IS NOT NULL AND LEN(D.ErrorMessage) > 0 THEN CAST(1 AS bit) END), 0) AS IsError
    FROM Table_C C
    JOIN Table_D D 
      ON D.ID = C.D_ID
     AND C.B_ID = cte1.ID
) d

-- Other LEFT OUTER JOINS + WHERE statements are omitted for readability.

Or using a grouped LEFT JOIN

LEFT JOIN (
    SELECT
      C.B_ID,
      ISNULL(MAX(CASE WHEN D.FinishDate IS NULL                                 THEN CAST(1 AS bit) END), 0) AS IsBusy,
      ISNULL(MAX(CASE WHEN D.FinishDate IS NOT NULL AND LEN(D.ErrorMessage) > 0 THEN CAST(1 AS bit) END), 0) AS IsError
    FROM Table_C C
    JOIN Table_D D 
      ON D.ID = C.D_ID
    GROUP BY
      C.B_ID
) d
  ON d.B_ID = cte1.ID

I would expect something like the following indexes

Table_A (ID) INCLUDE (otherColumnsHere)

Table_B (A_Id, ID DESC) INCLUDE (NAME)

Table_C (B_ID, D_ID)

Table_D (ID) INCLUDE (FinishDate, ErrorMessage)

Obviously these can be clustered indexes, especially if they are primary keys. In which case you don't need INCLUDE.