Search code examples
sqlgroup-byprestotrino

Getting the max date per each Key per user


We have the following table

   userid        transaction_dates
    A            {'LEGO': '2022-05-03', 'XBOX': '2023-01-02'}
    A            {'LEGO': '2022-06-01', 'PS': '2023-04-01', 'XBOX': '2022-01-01'}
    B            {'SWITCH': '2021-05-01', 'APPLE': '2023-05-01'}

and so on.

What I want to do is that for per each userid, for each transaction item, get the most recent transaction date.

In the above, it should be

   userid        transaction_dates
    A            {'LEGO': '2022-06-01', 'XBOX': '2023-01-02', 'PS': '2023-04-01'}
    B            {'SWITCH': '2021-05-01', 'APPLE': '2023-05-01'}

Is there a way to do it in Presto SQL? If it's just a sum, I can use map_union_sum, but I couldn't find a presto sql function that does it.

The only thing I could think about is to explode the map column, and compute the max value grouping by the userid, but wasn't sure if there are more efficient ways.

Any help would be greatly appreciated.

Thanks.


Solution

  • Assuming that data is a map (otherwise you will need to do some json parsing and casting to map) then you can use unnest to flatten the data and then perform several groupings:

    -- sample data
    WITH dataset (userid, transaction_dates) AS (
        values ('A', map(array['LEGO', 'XBOX'], array['2022-05-03','2023-01-02'])),
        ('A', map(array['LEGO', 'XBOX','PS'], array['2022-06-01', '2023-04-01','2022-01-01'])),
        ('B', map(array['SWITCH', 'APPLE'], array['2021-05-01', '2023-05-01']))
    )
    
    -- query
    select userid, map_agg(k, v) transaction_dates
    from (select userid, k, max(v) v
          from dataset,
               unnest(transaction_dates) as t (k, v)
          group by userid, k)
    group by userid;
    

    Output:

    userid transaction_dates
    B {SWITCH=2021-05-01, APPLE=2023-05-01}
    A {LEGO=2022-06-01, XBOX=2023-04-01, PS=2022-01-01}