I have to use SQL for my internship and while I know the gist of it, I do not really have a background in programming nor do I know what makes codes efficient etc.
Query #1
SELECT DISTINCT
c.[STAT], c.[EVENT], f.[STAT], f.[EVENT]
FROM
(SELECT *
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS a
FROM
TABLE) AS b
) AS c
LEFT JOIN
(SELECT
*
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS d
FROM
TABLE) AS e
) AS f ON c.[ID] = f.[ID] AND a = d - 1
ORDER BY
c.[STAT], c.[EVENT], f.[STAT], f.[EVENT]
Query #2
SELECT DISTINCT
b.[STAT], b.[EVENT], d.[STAT], d.[EVENT]
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS a
FROM TABLE) AS b
LEFT JOIN
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS c
FROM TABLE) AS d ON b.[ID] = f.[ID] AND a = c - 1
ORDER BY
b.[STAT], b.[EVENT], d.[STAT], d.[EVENT]
Queries #1 and #2 return the same result, which is expected, but query #1 has a runtime of roughly 5 seconds while query #2 has a runtime of roughly 1 minute and 35 seconds. In other words, the second query takes a good 1.5 minutes longer to run than the first and I am really curious to know why.
The correct way to write this query uses lead()
. I'm pretty sure the select distinct
is not needed, so this does what you want:
SELECT stat, event,
LEAD(stat) OVER (PARTITION BY ID, ORDER BY PROCDT, PROCTIME) as next_stat,
LEAD(event) OVER (PARTITION BY ID, ORDER BY PROCDT, PROCTIME) as next_event
FROM TABLE t
ORDER BY stat, event;
The two queries you have written should be the same in SQL Server. Apparently, the extra subqueries are confusing the optimizer. You would need to learn about execution plans to understand this better.