sql sqlite window-functions partitioning

SQLITE: get last component update for each distinct component


I have a simple SQLite db with a table that looks a bit like this:

ID      TID      LASTUPDATE     UPDATE
============================================
1       213      2020-09-09         ok
2       416      2019-12-25         ok
3       213      2020-11-10         meh
...
999999  899      2020-12-11         bad

There are around a thousand DISTINCT TIDs and hundreds of thousands of updates for each TID.

I would like to get the last UPDATE for each distinct TID, irrespective of when it was made, and I'd like to do it in a single query. I don't want to fetch all the distinct TIDs and then, for each TID, look up its UPDATE and LASTUPDATE.

Something like "SELECT * FROM updates GROUP BY TID" won't work, because the GROUP BY seems to give me the first row it finds for each TID and I want the most recent one. So I don't want this:

1       213      2020-09-09         ok
2       416      2019-12-25         ok
...

What I want is this (the ordering of the result doesn't matter):

3       213      2020-11-10         meh
2       416      2019-12-25         ok
...

Unfortunately my knowledge of SQL ends with simple CRUD stuff.

Any help appreciated.

(NEXT DAY)

As an update to this: for me, at least, the basic approach of getting all the distinct TIDs and then fetching the last update for each TID was faster (0.5 s) while the table had fewer than 200,000 rows. Once it went above that, the runtime rocketed to many seconds and the 3rd version became much more useful: although it took 2.5 s to run, that runtime appeared to be roughly constant.

Also, I couldn't get the 2nd version to work as posted, but with some editing it became:

select t.*
from updates t join
     (select id, tid, max(t.lastupdate) as max_lastupdate from updates t group by tid) tt
      on t.id = tt.id and (t.lastupdate = tt.max_lastupdate);

I'm finding that its runtime is consistently around 250 ms, which easily outperforms both of the other versions.
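
A note on why joining on id works here: SQLite documents that when max() or min() is used in an aggregate query, the bare (non-aggregated, non-grouped) columns take their values from the row that holds the maximum or minimum. Other databases generally reject or handle such a query differently, so treat this as SQLite-specific. The sketch below is a minimal illustration using Python's standard sqlite3 module; the fifth sample row is hypothetical, added only so that the newest row for TID 213 is neither the first nor the last one inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE updates (
        id         INTEGER PRIMARY KEY,
        tid        INTEGER NOT NULL,
        lastupdate TEXT    NOT NULL,  -- ISO dates compare correctly as text
        "update"   TEXT               -- quoted because UPDATE is a keyword
    );
    INSERT INTO updates VALUES
        (1, 213, '2020-09-09', 'ok'),
        (2, 416, '2019-12-25', 'ok'),
        (3, 213, '2020-11-10', 'meh'),
        (5, 213, '2020-10-01', 'ok'),   -- hypothetical extra row
        (999999, 899, '2020-12-11', 'bad');
""")

# id and "update" are bare columns: SQLite fills them from the row
# that supplied max(lastupdate) within each tid group.
latest = conn.execute("""
    SELECT id, tid, max(lastupdate) AS lastupdate, "update"
    FROM updates
    GROUP BY tid
    ORDER BY tid
""").fetchall()

for row in latest:
    print(row)
```

Because the grouped subquery already carries the id of the newest row, the outer join on id only has to do primary-key lookups, which may be part of why this version runs quickly.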

So thanks Gordon Linoff, I appreciate your time on this.


Solution

  • One method uses a correlated subquery:

    select t.*
    from t
    where t.lastupdate = (select max(t2.lastupdate) from t t2 where t2.tid = t.tid);
    

    For performance, you want an index on (tid, lastupdate).

    You can also try phrasing this as:

    select t.*
    from t join
         (select tid, max(lastupdate) as max_lastupdate
          from t
          group by tid
         ) tt
         on t.tid = tt.tid and t.lastupdate = tt.max_lastupdate;
    

    Or:

    select t.*
    from (select t.*, 
                 row_number() over (partition by tid order by lastupdate desc) as seqnum
          from t
         ) t
    where seqnum = 1;
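
To compare the three phrasings side by side, here is a minimal sketch that runs them against the sample rows from the question using Python's standard sqlite3 module. The table name updates comes from the question; the index name idx_updates_tid_lastupdate is an assumption chosen for illustration. The join version is written with tid as the join key. Window functions need SQLite 3.25 or later, so the sketch skips that query on older builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE updates (
        id         INTEGER PRIMARY KEY,
        tid        INTEGER NOT NULL,
        lastupdate TEXT    NOT NULL,  -- ISO dates compare correctly as text
        "update"   TEXT               -- quoted because UPDATE is a keyword
    );
    INSERT INTO updates VALUES
        (1, 213, '2020-09-09', 'ok'),
        (2, 416, '2019-12-25', 'ok'),
        (3, 213, '2020-11-10', 'meh'),
        (999999, 899, '2020-12-11', 'bad');
    -- The index suggested above; the name is hypothetical.
    CREATE INDEX idx_updates_tid_lastupdate ON updates (tid, lastupdate);
""")

QUERIES = {
    "correlated": """
        SELECT u.id, u.tid, u.lastupdate, u."update"
        FROM updates u
        WHERE u.lastupdate = (SELECT max(u2.lastupdate)
                              FROM updates u2
                              WHERE u2.tid = u.tid)
        ORDER BY u.tid
    """,
    "join": """
        SELECT u.id, u.tid, u.lastupdate, u."update"
        FROM updates u
        JOIN (SELECT tid, max(lastupdate) AS max_lastupdate
              FROM updates
              GROUP BY tid) m
          ON u.tid = m.tid AND u.lastupdate = m.max_lastupdate
        ORDER BY u.tid
    """,
}

# row_number() is only available from SQLite 3.25 onwards.
if sqlite3.sqlite_version_info >= (3, 25, 0):
    QUERIES["window"] = """
        SELECT id, tid, lastupdate, "update"
        FROM (SELECT u.*,
                     row_number() OVER (PARTITION BY tid
                                        ORDER BY lastupdate DESC) AS seqnum
              FROM updates u)
        WHERE seqnum = 1
        ORDER BY tid
    """

results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

One difference worth keeping in mind: if two rows for the same tid share the same latest lastupdate, the correlated-subquery and join versions return both rows, while the row_number() version returns exactly one of them.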