sql sql-server t-sql datetime greatest-n-per-group

Correlated subquery - Group by in inner query and rewrite to window function

I am looking at this query:

select ID, date1 
from  table1 as t1
where date2 = (select max(date2) 
                from table1 as t2
                where t1.ID = t2.ID
                group by t2.ID)

First of all I don't think the Group by is necessary. Am I right? Second is it generally more efficint to rewrite this as a window function?

Does this look right?

select ID, date1
from (select ID, date1, row_number() over(partition by ID) as row_num,
                  max(date2) over(partition by ID) as max_date 
      from  table1)
where row_num = 1;

Solution

First of all I don't think the Group by is necessary. Am I right?

You are correct. That's a scalar subquery anyway: group by doesn't change the result since we are filtering on a single ID. Not using group by makes the intent clearer in my opinion.

The window function solution does not need max() - the filtering is sufficient, but the window function needs an order by clause (and the derived table needs an alias). So:

select ID, date1
from (
    select ID, date1, row_number() over(partition by ID order by date1 desc) as row_num
    from table1
) t1
where row_num = 1;

That's not exactly the same thing as the first query, because it does not allow ties, while the first query does. Use rank() if you want the exact same logic.

Which query performs better highly depends on your data and structure. The correlated subquery would take advantage of an index on (id, date2). Both solutions are canonical approaches to the problem, and you would probably need to test both solutions against your data to see which one is better in your context.