Search code examples
sqlamazon-web-servicesamazon-athenapresto

AWS athena (presto SQL): How to take the (set-like) union of arrays in a group by statement


I am trying to combine arrays of unique userids into one single array of unique userids. AWS athena does not have the set_union function, so I cannot use

, set_union(userids)

And reduce_agg seems to not allow for arrays

, reduce_agg(userids, ARRAY[], (a, b) -> array_union(a, b), (a, b) -> array_union(a, b))

Is there any other trick I can use to combine arrays into one array (distinct items)


Solution

  • One option is to use array_agg and process the resulting array of arrays via flatten and array_distinct:

    -- sample data
    WITH dataset (id, userids) AS (
        VALUES (1, array [ 1, 2, 3 ]),
            (1, array [ 3, 4, 5 ])
    ) 
    
    --query
    SELECT array_distinct(flatten(array_agg(userids)))
    FROM dataset
    GROUP BY id
    

    Output:

    _col0
    [1, 2, 3, 4, 5]