I'm currently working on a project with a large dataset in Microsoft SQL Server:
Table_A ±250k rows;
Table_B ±1M rows;
Table_C ±6M rows;
Table_D ±10M rows;
Currently I run into performance issues to get data out of these tables with a Table Valued Function.
On average, my query runs for 1,5s, which is too slow as we need to run 4 of them to run a web page. It get's worse when the database is under heavy load by multiple users, then the speed can drop to 6 second, causing total load times of 25s or longer of a webpage which is too slow.
I checked the execution plan, and found out that 70% of calculation is going into the CAST(IFF(EXISTS(...)))
part:
WITH cte1 AS (
SELECT B.ID, B.NAME, B.A_Id
FROM Table_B
INNER JOIN (
SELECT B.A_Id, MAX(B.ID) AS Max_ID
FROM Table_B B
GROUP BY B.A_Id
) Q ON Q.A_Id = B.A_ID AND Q.Max_ID = B.ID
),
SELECT
A.Id,
A.<columns>,
IsBusy= CAST(IIF(
EXISTS(
SELECT C.ID
FROM Table_C C
INNER JOIN Table_D D
ON D.ID = C.D_ID AND C.B_ID = cte1.ID
WHERE D.FinishDate IS NULL
) , 1, 0) AS BIT),
IsError= CAST(IIF (
EXISTS(
SELECT C.ID
FROM Table_C C
INNER JOIN Table_D D
ON D.ID = C.D_ID AND C.B_ID = cte1.ID
WHERE D.FinishDate IS NOT NULL AND LEN(D.ErrorMessage) > 0
) , 1, 0) AS BIT)
FROM Table_A
LEFT OUTER JOIN cte cte1 ON Table_A.Id = cte1.Id
-- Other LEFT OUTER JOINS + WHERE statements are omitted for readability.
My question:
Why is so much time wasted on this CAST(IFF(EXISTS(..)))
construction? And what direction can I think of to improve performance?
I've never bumped really in such big tables that caused these kind of queries to perform bad.
It's hard to help without better info on your query, such as tables, indexes and a query plan.
But you can probably combine these two subqueries using conditional aggregation, inside an OUTER APPLY
or a grouped LEFT JOIN
.
You can also improve the CTE by using ROW_NUMBER
or another window function.
WITH cte1 AS (
SELECT
B.ID,
B.NAME,
B.A_Id,
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY B.A_Id ORDER BY B.ID DESC) AS rn
FROM Table_B
) B
WHERE B.rn = 1
)
SELECT
A.Id,
A.<columns>,
d.IsBusy,
d.IsError
FROM Table_A a
LEFT OUTER JOIN cte cte1 ON a.Id = cte1.Id
OUTER APPLY (
SELECT
ISNULL(MAX(CASE WHEN D.FinishDate IS NULL THEN CAST(1 AS bit) END), 0) AS IsBusy,
ISNULL(MAX(CASE WHEN D.FinishDate IS NOT NULL AND LEN(D.ErrorMessage) > 0 THEN CAST(1 AS bit) END), 0) AS IsError
FROM Table_C C
JOIN Table_D D
ON D.ID = C.D_ID
AND C.B_ID = cte1.ID
) d
-- Other LEFT OUTER JOINS + WHERE statements are omitted for readability.
Or using a grouped LEFT JOIN
LEFT JOIN (
SELECT
C.B_ID,
ISNULL(MAX(CASE WHEN D.FinishDate IS NULL THEN CAST(1 AS bit) END), 0) AS IsBusy,
ISNULL(MAX(CASE WHEN D.FinishDate IS NOT NULL AND LEN(D.ErrorMessage) > 0 THEN CAST(1 AS bit) END), 0) AS IsError
FROM Table_C C
JOIN Table_D D
ON D.ID = C.D_ID
GROUP BY
C.B_ID
) d
ON d.B_ID = cte1.ID
I would expect something like the following indexes
Table_A (ID) INCLUDE (otherColumnsHere)
Table_B (A_Id, ID DESC) INCLUDE (NAME)
Table_C (B_ID, D_ID)
Table_D (ID) INCLUDE (FinishDate, ErrorMessage)
Obviously these can be clustered indexes, especially if they are primary keys. In which case you don't need INCLUDE
.