I know that this is a common problem, and I've seen here and here but neither of these solutions are working for me.
I have a table with fields session_time
, user
and email
(these fields are just dummy placeholders, my real schema is different so please don't look too much into their names)
supposing each user
can have various emails
related to it, I need to output, per row:
1 - a user
2 - an array consisting of every email
related to the user
However, some users have WAY too many emails, so I need to limit them to the most recent entries, lets say 2 most recent per user.
WITH
sample_data AS (
SELECT
'2023-06-12 10:00:00' AS session_time,
'1' AS user,
'example@example.com' AS email
UNION ALL
SELECT
'2023-06-12 11:00:00' AS session_time,
'2' AS user,
'example@example.com' AS email
UNION ALL
SELECT
'2023-06-12 12:00:00' AS session_time,
'3' AS user,
'example@example.com' AS email,
UNION ALL
SELECT
'2023-06-12 13:00:00' AS session_time,
'3' AS user,
'example2@example.com' AS email
UNION ALL
SELECT
'2023-06-12 14:00:00' AS session_time,
'3' AS user,
'example3@example.com' AS email
)
, filtered_user_email AS (
SELECT
user,
email,
ROW_NUMBER() OVER(PARTITION BY user, email ORDER BY MAX(session_time) DESC) row_num
FROM
sample_data
GROUP BY
uuid,
user
)
running SELECT user, array_agg(distinct email) as emails FROM filtered_user_email WHERE row_num <= 2 group by user
outputs every row, because every row gets a row_num of 1, so for user 3, I get 3 entries instead of just the two most recent ones.
How can I then just select the 2 most recent emails per user?
Presumably, you just need to change the partition
of your row_number
:
select usr, array_agg(email) emails
from (
select usr, email,
row_number() over(partition by usr order by max(session_time))rn
from sample_data s
group by usr, email
) t
where rn <= 2
group by usr