I am trying to see if there is a way to optimize bottom T-SQL code.
This is how previous co-worker wrote, and I am not sure if there is a better way to write because it has Select
within the Where
clause (subquery?).
select
Employee_Number,
DateLoad
from dbo.table1 as a
where DateLoad = (
select min(DateLoad)
from dbo.table2 as b
where a.Employee_Number = b.Employee_Number
)
This still uses a nested SELECT, but not in the WHERE clause and should run faster:
SELECT Employee_Number, DateLoad
FROM (
select a.Employee_number, a.DateLoad
, row_number() over (partition by a.Employee_Number
order by b.DateLoad) rn
from Table1 A
inner join Table2 B on B.Employee_Number = A.Employee_Number
and a.DateLoad = b.DateLoad
) t
WHERE rn = 1
Depending on the data there might also be a way to do this using GROUP BY, without any nesting:
SELECT A.Employee_Number, Min(B.DateLoad) DateLoad
FROM Table1 A
INNER JOIN Table2 B ON B.Employee_Number = A.Employee_Number
AND a.DateLoad = b.DateLoad
GROUP BY A.Employee_Number
But I'm not 100% certain this will be functionally equivalent to the original. You'll want to check carefully to be sure it works reliably on your data.
The main thing to understand is a JOIN is almost always faster than a correlated nested SELECT ("correlated" mean "per row" in this context), and not by just a little. The only exception I'm aware of is using an EXISTS() predicate instead of an exclusion join.