Search code examples
sqlsql-servert-sqlquery-optimization

Optimize subquery (without using Select inside Where clause)?


I am trying to see if there is a way to optimize bottom T-SQL code.

This is how previous co-worker wrote, and I am not sure if there is a better way to write because it has Select within the Where clause (subquery?).

select 
    Employee_Number,    
    DateLoad 
from dbo.table1 as a
where DateLoad = (
    select min(DateLoad)
    from dbo.table2 as b 
    where a.Employee_Number = b.Employee_Number
)

Solution

  • This still uses a nested SELECT, but not in the WHERE clause and should run faster:

    SELECT Employee_Number, DateLoad
    FROM (
        select a.Employee_number, a.DateLoad
           , row_number() over (partition by a.Employee_Number
                                order by b.DateLoad) rn
        from Table1 A
        inner join Table2 B on B.Employee_Number = A.Employee_Number 
            and a.DateLoad = b.DateLoad
    ) t
    WHERE rn = 1
    

    Depending on the data there might also be a way to do this using GROUP BY, without any nesting:

    SELECT A.Employee_Number, Min(B.DateLoad) DateLoad
    FROM Table1 A
    INNER JOIN Table2 B ON B.Employee_Number = A.Employee_Number 
        AND a.DateLoad = b.DateLoad
    GROUP BY A.Employee_Number
    

    But I'm not 100% certain this will be functionally equivalent to the original. You'll want to check carefully to be sure it works reliably on your data.

    The main thing to understand is a JOIN is almost always faster than a correlated nested SELECT ("correlated" mean "per row" in this context), and not by just a little. The only exception I'm aware of is using an EXISTS() predicate instead of an exclusion join.