Assume I have a table called "Diary" like this:
| id | user_id | recorded_at | record |
|----|---------|--------------------------|--------|
| 20 | 50245 |2017-10-01 23:00:14.765366| 89 |
| 21 | 50245 |2017-12-05 10:00:33.135331| 97 |
| 22 | 50245 |2017-12-31 11:50:23.965134| 80 |
| 23 | 76766 |2015-10-06 11:00:14.902452| 70 |
| 24 | 76766 |2015-10-07 22:40:59.124553| 81 |
For each user I want to retrieve the latest row and all rows within one month prior to that.
In other words, for user_id 50245, I want the his/her data from "2017-12-01 11:50:23.965134" to "2017-12-31 11:50:23.965134"; for user_id 76766, I want his/her data from "2015-09-07 22:40:59.124553" to "2015-10-07 22:40:59.124553".
Hence the desired result looks like this:
| id | user_id | recorded_at | record |
|----|---------|--------------------------|--------|
| 21 | 50245 |2017-12-05 10:00:33.135331| 97 |
| 22 | 50245 |2017-12-31 11:50:23.965134| 80 |
| 23 | 76766 |2015-10-06 11:00:14.902452| 70 |
| 24 | 76766 |2015-10-07 22:40:59.124553| 81 |
Please note that the record of id 20 is not included because it is more than one month prior to user_id 50245's last record.
Is there any way I can write an SQL query to achieve this?
For small tables, any (valid) query technique is good.
For big tables, details matter. Assuming:
There is also a users
table with user_id
as PK containing all relevant users (or possibly a few more). This is the typical setup.
You have (or can create) an index on diary (user_id, recorded_at DESC NULLS LAST)
. NULLS LAST
is optional if recorded_at
is defined NOT NULL
. But make sure the query matches the index.
More than a few rows per user - the typical use case.
This should be among the fastest options:
SELECT d.*
FROM users u
CROSS JOIN LATERAL (
SELECT recorded_at
FROM diary
WHERE user_id = u.user_id
ORDER BY recorded_at DESC NULLS LAST
LIMIT 1
) d1
JOIN diary d ON d.user_id = u.user_id
AND d.recorded_at >= d1.recorded_at - interval '1 month'
ORDER BY d.user_id, d.recorded_at;
Produces your desired result exactly.
For only few rows per user, max()
or DISTINCT ON ()
in a subquery are typically faster.
Related (with detailed explanation):
About the FROM
clause: