I've been trying to improve a SQL query which uses multiple sub queries over the same table but with different conditions and only retrieves the first result from each sub queries.
I will try to simplify the use-case :
I have a table Products
like this:
Product_id | reference | field3 | field 4 |
---|---|---|---|
1 | ref1 | val1 | val3 |
2 | ref2 | val2 | val4 |
And another table History
:
History_id | reference | utilcode | physicalcode | issue | media | datetime |
---|---|---|---|---|---|---|
1 | ref1 | 'test' | 'TST' | '0' | '&audio' | 'a_date' |
2 | ref2 | 'phone' | 'CALLER' | '1' | '&video' | 'a_date' |
3 | ref2 | 'test' | 'CALLER' | '2' | '&test' | 'a_date' |
History
is a log table and therefore contains a lot of values.
Now I have a query like this
SELECT
p.reference,
p.field3, p.field4,
(SELECT TOP 1 a_date
FROM history h
WHERE h.reference = p.reference
AND physicalcode = 'TST'
AND issue = 0
ORDER BY a_date DESC) AS latest_date_issue_0,
(SELECT TOP 1 a_date
FROM history h
WHERE h.reference = p.reference
AND physicalcode = 'TST'
AND issue = 1
ORDER BY a_date DESC) AS latest_date_issue_1
(SELECT TOP 1 a_date
FROM history h
WHERE h.reference = p.reference
AND utilcode = 'phone'
ORDER BY a_date DESC) AS latest_date_phone,
(SELECT TOP 1 media
FROM history h
WHERE h.reference = p.reference
AND utilcode = 'phone'
ORDER BY a_date DESC) AS latest_media,
-- and so on with many possible combinations
-- Note that there are more than this few fields on the tables I work on.
WHERE
p.field3 = 'valX',
p.field4 = 'valY'
FROM
products p
How could I merge every sub selects ? Or even a few that are alike to improve the performance ?
History being a very big table, selecting over it multiple times drastically slows down the query.
The main problem being that I only need the first value every time.
Thank you for your time and I hope to find a better way to deal with this issue!
I tried to use ROW_NUMBER()
but I could not find a suitable way to use it.
I also tried to create a tmp table using WITH
to group every possibility from history but it was worse.
EDIT : Execution plan https://www.brentozar.com/pastetheplan/?id=Sy1AKIsUs
Thanks to @O.Jones I've been able to find a way to improve this query.
What I did to merge a few requests was to use a CTE like so :
From
SELECT
(SELECT TOP 1 a_date
FROM history h
WHERE h.reference = p.reference
AND physicalcode = 'TST'
AND issue = 0
ORDER BY a_date DESC) AS latest_date_issue_0,
(SELECT TOP 1 a_date
FROM history h
WHERE h.reference = p.reference
AND physicalcode = 'TST'
AND issue = 1
ORDER BY a_date DESC) AS latest_date_issue_1
(SELECT top 1 a_date
FROM history h
WHERE h.reference = p.reference
AND h.physicalcode = 'TSTKO'
ORDER BY h.d_systeme DESC ) AS d_tst_ko,
(SELECT top 1 a_date
FROM history h
WHERE h.reference = p.reference
AND h.physicalcode = 'CALLERID'
ORDER BY h.d_systeme DESC ) AS d_wrong_number
FROM products p
To
WITH physicalcode_cte (reference, physicalcode, issue, a_date) as
(
SELECT reference, physicalcode, issue, max(a_date)
from historique
where codephysique in ('TST','TSTKO','CALLERID')
and a_date > dateadd(month, -4, getdate()) -- filter on date range to reduce number of rows
group by reference, physicalcode, issue
)
SELECT
date_issue_0.a_date,
date_issue_1.a_date,
tst_ko.a_date,
wrong_number.a_date
FROM products p
LEFT JOIN physicalcode_cte date_issue_0 on p.reference = date_issue_0.reference
AND date_issue_0.codephysique = 'TST'
AND date_issue_0.anomalie = 0
LEFT JOIN physicalcode_cte date_issue_1 on p.reference = date_issue_1.reference
AND date_issue_1.codephysique = 'TST'
AND date_issue_1.anomalie = 1
LEFT JOIN physicalcode_cte tst_ko on p.reference = tst_ko.reference
AND tst_ko.codephysique = 'TST'
LEFT JOIN physicalcode_cte wrong_number on p.reference = wrong_number.reference AND
AND wrong_number.codephysique = 'TST'
I've applied this idea for different scenarii and made 2 CTE.
I couldn't merge everything, sometime merging caused cost increase. But after several tests I've been able to go from 7100 total cost to 2100.
It is still a lot but 3 times less anyway. Takes 5 seconds instead of a timeout.
It's a query used for monthly reports so I don't need it to be super fast, I will keep it that way.
Thanks you!