I am looking at this query:
select ID, date1
from table1 as t1
where date2 = (select max(date2)
from table1 as t2
where t1.ID = t2.ID
group by t2.ID)
First of all I don't think the Group by
is necessary. Am I right? Second is it generally more efficint to rewrite this as a window function?
Does this look right?
select ID, date1
from (select ID, date1, row_number() over(partition by ID) as row_num,
max(date2) over(partition by ID) as max_date
from table1)
where row_num = 1;
First of all I don't think the Group by is necessary. Am I right?
You are correct. That's a scalar subquery anyway: group by
doesn't change the result since we are filtering on a single ID
. Not using group by
makes the intent clearer in my opinion.
The window function solution does not need max()
- the filtering is sufficient, but the window function needs an order by
clause (and the derived table needs an alias). So:
select ID, date1
from (
select ID, date1, row_number() over(partition by ID order by date1 desc) as row_num
from table1
) t1
where row_num = 1;
That's not exactly the same thing as the first query, because it does not allow ties, while the first query does. Use rank()
if you want the exact same logic.
Which query performs better highly depends on your data and structure. The correlated subquery would take advantage of an index on (id, date2)
. Both solutions are canonical approaches to the problem, and you would probably need to test both solutions against your data to see which one is better in your context.