I have a table called 'fileEvents'. It has four columns (there are more but not relevant to the question): id, fileId, action and time.
The same fileId, action and time values can appear in multiple rows.
The query I want is simple but I can't think of a working one: Get the latest entry since a specific time for every fileId.
I tried the following.
First I will try to just get all entries sorted by time since a specific time:
SELECT * FROM `fileEvents` ORDER BY `time` DESC WHERE `time` < 1000
The result is of course fine (id
, action
, fileId
, time
):
[(6, 0, 3, 810), (5, 0, 3, 410), (2, 0, 1, 210), (3, 0, 2, 210), (4, 0, 3, 210), (1, 0, 1, 200)]
So it is all sorted. But now I only want unique fileIds. So I add a GROUP BY
fileId`:
SELECT * FROM `fileEvents` GROUP BY `fileId` ORDER BY `time` DESC WHERE `time` < 1000
Which of course is wrong. Because first it will group the results and then sort them, but they are already grouped so there is no sorting:
[(3, 0, 2, 210), (4, 0, 3, 210), (1, 0, 1, 200)]
When I try to reverse the GROUP BY and ORDER BY, I get a OperationalError: near "GROUP": syntax error
Also when I try to do a sub query where I first get the sorted list and then group them the result is wrong:
SELECT * FROM `fileEvents` WHERE `id` IN (
SELECT `id` FROM `fileEvents` ORDER BY `time` DESC WHERE `time` < 1000
) GROUP BY `fileId`
With the (wrong) result:
[(1, 0, 1, 200), (3, 0, 2, 210), (4, 0, 3, 210)]
The result I am looking for is:
[(6, 0, 3, 810), (2, 0, 1, 210), (3, 0, 2, 210)]
Does anyone have an idea how I could get the result I want? What am I missing? Thanks a lot!
With ROW_NUMBER()
window function:
select * -- replace * with the columns that you want in the result
from (
select *, row_number() over (partition by fileid order by time desc) rn
from fileevents
where time < 1000
) t
where rn = 1