We have the following table
userid transaction_dates
A {'LEGO': '2022-05-03', 'XBOX': '2023-01-02'}
A {'LEGO': '2022-06-01', 'PS': '2023-04-01', 'XBOX': '2022-01-01'}
B {'SWITCH': '2021-05-01', 'APPLE': '2023-05-01'}
and so on.
What I want to do is that for per each userid, for each transaction item, get the most recent transaction date.
In the above, it should be
userid transaction_dates
A {'LEGO': '2022-06-01', 'XBOX': '2023-01-02', 'PS': '2023-04-01'}
B {'SWITCH': '2021-05-01', 'APPLE': '2023-05-01'}
Is there a way to do it in Presto SQL? If it's just a sum, I can use map_union_sum, but I couldn't find a presto sql function that does it.
The only thing I could think about is to explode the map column, and compute the max value grouping by the userid, but wasn't sure if there are more efficient ways.
Any help would be greatly appreciated.
Thanks.
Assuming that data is a map (otherwise you will need to do some json parsing and casting to map) then you can use unnest
to flatten the data and then perform several groupings:
-- sample data
WITH dataset (userid, transaction_dates) AS (
values ('A', map(array['LEGO', 'XBOX'], array['2022-05-03','2023-01-02'])),
('A', map(array['LEGO', 'XBOX','PS'], array['2022-06-01', '2023-04-01','2022-01-01'])),
('B', map(array['SWITCH', 'APPLE'], array['2021-05-01', '2023-05-01']))
)
-- query
select userid, map_agg(k, v) transaction_dates
from (select userid, k, max(v) v
from dataset,
unnest(transaction_dates) as t (k, v)
group by userid, k)
group by userid;
Output:
userid | transaction_dates |
---|---|
B | {SWITCH=2021-05-01, APPLE=2023-05-01} |
A | {LEGO=2022-06-01, XBOX=2023-04-01, PS=2022-01-01} |