I have a simple SQLite database that looks a bit like this:
ID TID LASTUPDATE UPDATE
============================================
1 213 2020-09-09 ok
2 416 2019-12-25 ok
3 213 2020-11-10 meh
...
999999 899 2020-12-11 bad
There are around a thousand DISTINCT TIDs and hundreds of thousands of updates for each TID.
I would like to get the last UPDATE for each distinct TID, irrespective of when it was made, and I'd like to do it in one query. I don't want to fetch all the distinct TIDs and then, for each TID, fetch its UPDATE and LASTUPDATE.
Something like "SELECT * FROM updates GROUP BY TID" won't work, because the GROUP BY gives me the first row it finds for each TID and I want the most recent one. So I don't want this:
1 213 2020-09-09 ok
2 416 2019-12-25 ok
...
but rather this (the order of the rows doesn't matter):
3 213 2020-11-10 meh
2 416 2019-12-25 ok
...
Unfortunately my knowledge of SQL ends with simple CRUD stuff.
Any help appreciated.
(NEXT DAY)
As an update to this: for me, anyway, the basic approach (get all the distinct TIDs, then for each TID get its last update) was faster (0.5 s) while the table had fewer than 200,000 rows. Once it went above that, the runtime rocketed to many seconds, and the 3rd way became much more useful: it took 2.5 s to run, but that runtime appeared to stay roughly constant.
Also, I couldn't get the 2nd version to work as written, but with some editing:
select t.*
from updates t join
(select id, tid, max(t.lastupdate) as max_lastupdate from updates t group by tid) tt
on t.id = tt.id and (t.lastupdate = tt.max_lastupdate);
I'm finding that its runtime is always around 250 ms, which easily outperforms both of the other versions.
So thanks Gordon Linoff, I appreciate your time on this.
One method uses a correlated subquery:
select t.*
from t
where t.lastupdate = (select max(t2.lastupdate) from t t2 where t2.tid = t.tid);
For performance, you want an index on (tid, lastupdate).
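As a rough illustration, here is the correlated subquery together with that index, run through Python's built-in sqlite3 module against the sample rows from the question. The table name updates comes from the question; the column name status (standing in for UPDATE, which is a reserved word) and the index name are assumptions made for this sketch.

```python
import sqlite3

# In-memory database with a few rows from the question.
# "status" is a hypothetical name for the question's UPDATE column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE updates ("
    "id INTEGER PRIMARY KEY, tid INTEGER, lastupdate TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO updates VALUES (?, ?, ?, ?)",
    [
        (1, 213, "2020-09-09", "ok"),
        (2, 416, "2019-12-25", "ok"),
        (3, 213, "2020-11-10", "meh"),
        (4, 899, "2020-12-11", "bad"),
    ],
)

# Composite index so the inner max() lookup per tid can be served by the index.
# The index name is an assumption.
conn.execute(
    "CREATE INDEX idx_updates_tid_lastupdate ON updates (tid, lastupdate)"
)

query = """
    SELECT t.*
    FROM updates t
    WHERE t.lastupdate = (SELECT max(t2.lastupdate)
                          FROM updates t2
                          WHERE t2.tid = t.tid)
    ORDER BY t.id
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)

# Print the query plan to check whether the index is actually picked up.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```

ISO-formatted dates (YYYY-MM-DD) stored as text compare correctly with max(), which is why the sketch can keep lastupdate as a TEXT column.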
You can also try phrasing this as:
select t.*
from t join
(select tid, max(lastupdate) as max_lastupdate
from t
group by tid
) tt
on t.tid = tt.tid and t.lastupdate = tt.max_lastupdate;
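A minimal sketch of the join form, again via Python's sqlite3 module. It joins on tid, since the derived table only exposes tid and max_lastupdate. The column name status and the extra tied row (id 4) are assumptions added to show one property of this approach: if two rows for a tid share the same latest lastupdate, both come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "status" is a hypothetical name for the question's UPDATE column.
conn.execute(
    "CREATE TABLE updates ("
    "id INTEGER PRIMARY KEY, tid INTEGER, lastupdate TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO updates VALUES (?, ?, ?, ?)",
    [
        (1, 213, "2020-09-09", "ok"),
        (2, 416, "2019-12-25", "ok"),
        (3, 213, "2020-11-10", "meh"),
        (4, 213, "2020-11-10", "bad"),  # made-up row: ties with id 3
    ],
)

# Find the latest lastupdate per tid, then join back to fetch the full rows.
joined = conn.execute(
    """
    SELECT t.id, t.tid, t.lastupdate, t.status
    FROM updates t
    JOIN (SELECT tid, max(lastupdate) AS max_lastupdate
          FROM updates
          GROUP BY tid) tt
      ON t.tid = tt.tid AND t.lastupdate = tt.max_lastupdate
    ORDER BY t.id
    """
).fetchall()

for row in joined:
    print(row)
```

If exactly one row per tid is required, a tie-breaker (for example the highest id) is needed on top of this.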
Or:
select t.*
from (select t.*,
row_number() over (partition by tid order by lastupdate desc) as seqnum
from t
) t
where seqnum = 1;
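The window-function form can be sketched the same way. This assumes the SQLite library behind Python's sqlite3 module is version 3.25 or later, which is when window functions were added. The column name status, the tied row, and the extra id DESC tie-breaker are assumptions for illustration; with the tie-breaker, exactly one row per tid is returned even when dates collide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "status" is a hypothetical name for the question's UPDATE column.
conn.execute(
    "CREATE TABLE updates ("
    "id INTEGER PRIMARY KEY, tid INTEGER, lastupdate TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO updates VALUES (?, ?, ?, ?)",
    [
        (1, 213, "2020-09-09", "ok"),
        (2, 416, "2019-12-25", "ok"),
        (3, 213, "2020-11-10", "meh"),
        (4, 213, "2020-11-10", "bad"),  # made-up row: ties with id 3
    ],
)

# Number the rows within each tid from newest to oldest and keep number 1.
# Ordering by id DESC as well breaks ties on lastupdate deterministically.
ranked = conn.execute(
    """
    SELECT id, tid, lastupdate, status
    FROM (SELECT u.*,
                 row_number() OVER (PARTITION BY tid
                                    ORDER BY lastupdate DESC, id DESC) AS seqnum
          FROM updates u) ranked
    WHERE seqnum = 1
    ORDER BY id
    """
).fetchall()

for row in ranked:
    print(row)
```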