Search code examples
sqlpresto

Add array for column value for duplicates [SQL]


I would like to add for duplicate values an array that contains all last_action values. sample_table

client_id  action_timestamp  last_action
201        2020-01-01 09:00  click
201        2020-01-01 09:00  browse
303        2020-01-01 11:00  action


desired_output_table

client_id  action_timestamp  agg_action
201        2020-01-01 09:00  {click,browse}
303        2020-01-01 11:00  {action}

Solution

  • You can use group by and array_agg:

    -- sample data
    WITH dataset (client_id,  action_timestamp,  last_action) AS (
        VALUES  ('201',        '2020-01-01 09:00',  'click'),
                ('201',        '2020-01-01 09:00',  'browse'),
                ('303',        '2020-01-01 11:00',  'action')
    ) 
    
    --query
    select client_id,
        action_timestamp,
        array_agg(last_action)
    from dataset
    group by client_id, action_timestamp
    

    Output:

    client_id action_timestamp _col2
    201 2020-01-01 09:00 [click, browse]
    303 2020-01-01 11:00 [action]