Search code examples
sqlpython-3.xsqlitesql-order-bygreatest-n-per-group

Python sqlite3 SQL query Get all entries with newest date but limit per single unique column


I have a table called 'fileEvents'. It has four columns (there are more but not relevant to the question): id, fileId, action and time.

The same fileId, action and time values can appear in multiple rows.

The query I want is simple but I can't think of a working one: Get the latest entry since a specific time for every fileId.

I tried the following.

First I will try to just get all entries sorted by time since a specific time:

SELECT * FROM `fileEvents` ORDER BY `time` DESC WHERE `time` < 1000

The result is of course fine (id, action, fileId, time):

[(6, 0, 3, 810), (5, 0, 3, 410), (2, 0, 1, 210), (3, 0, 2, 210), (4, 0, 3, 210), (1, 0, 1, 200)]

So it is all sorted. But now I only want unique fileIds. So I add a GROUP BYfileId`:

SELECT * FROM `fileEvents` GROUP BY `fileId` ORDER BY `time` DESC WHERE `time` < 1000

Which of course is wrong. Because first it will group the results and then sort them, but they are already grouped so there is no sorting:

[(3, 0, 2, 210), (4, 0, 3, 210), (1, 0, 1, 200)]

When I try to reverse the GROUP BY and ORDER BY, I get a OperationalError: near "GROUP": syntax error

Also when I try to do a sub query where I first get the sorted list and then group them the result is wrong:

SELECT * FROM `fileEvents` WHERE `id` IN (
SELECT `id` FROM `fileEvents` ORDER BY `time` DESC WHERE `time` < 1000
) GROUP BY `fileId`

With the (wrong) result:

[(1, 0, 1, 200), (3, 0, 2, 210), (4, 0, 3, 210)]

The result I am looking for is:

[(6, 0, 3, 810), (2, 0, 1, 210), (3, 0, 2, 210)]

Does anyone have an idea how I could get the result I want? What am I missing? Thanks a lot!


Solution

  • With ROW_NUMBER() window function:

    select * -- replace * with the columns that you want in the result
    from (
      select *, row_number() over (partition by fileid order by time desc) rn
      from fileevents 
      where time < 1000
    ) t
    where rn = 1