Search code examples
sqlamazon-athenaprestotrino

HIVE (Athena) - Separate array of PKs into column with corresponding value from FK table


I have a table with column containing array of PKs in another table.

I want to make a query that will take a certain column from the second table and for each of the PKs in the array set that value in a new column.

For example:

Table survey has a column answers with array value:

|  pk | answers
|-----|------------
|  1  |[1,2,3,4]
|-----|------------

Table answers has a column text, and the following rows:

|  pk |         text
|-----|--------------------------
|  1  |'This is an answer'
|  2  |'Another answer'
|  3  |'Yet another answer'
|  4  |'Last answer'
|-----|--------------------------

I want to make a query that will return the following:

|survey_pk|      answer1      |      answer2   |      answer3       |   answer4   |
|---------|-------------------|----------------|--------------------|-------------|
|    1    |'This is an answer'|'Another answer'|'Yet another answer'|'Last answer'|
|---------|-------------------|----------------|--------------------|-------------|

(I don't mind writing the new column names manually).

*** UPDATE *** @Guru Stron wrote a great answer. But now I am not able to add another columns of table survey to the output.

|  pk | answers  | date_created
+-----+----------+---------------
|  1  |[1,2,3,4] | 2023-02-17...
+-----+----------+---------------

And now the output will be:

|survey_pk|      answer1      |answer2|...| date_created 
|---------|-------------------|-------|---|--------------
|    1    |'This is an answer'|  ...  |...| 2023-02-17...
|---------|-------------------|-------|---|--------------

Is it possible to get the output that way?


Solution

  • Assuming number of answers is limited to 4 and the array contains answers ids from the second table you can do something like the following:

    -- sample data
    WITH survey(pk, answers) AS (
        values (1, array[1,2,3,4])
    ),
    answers(pk, text) as (
        values (1, 'This is an answer'),
        (2, 'Another answer'),
        (3, 'Yet another answer'),
        (4, 'Last answer')
    )
    
    -- query
    select pk,
        map[1] answer1,
        map[2] answer2,
        map[3] answer3,
        map[4] answer4
    from (
        select s.pk, map_agg(t.index, a.text) map
        from survey s
        cross join unnest(s.answers) with ordinality as t(a, index)
        join answers a on t.a = a.pk
        group by s.pk)
    ;
    

    Output

    pk answer1 answer2 answer3 answer4
    1 This is an answer Another answer Yet another answer Last answer

    There is no easy pivoting in Athena/Presto/Trino ATM so this is as far as you can go, potentially you can consider just aggregating back to map (as done in subquery before) or array.

    Array aggregation can look like this:

    -- query
    select s.pk, array_agg(a.text order by t.index) map
    from survey s
    cross join unnest(s.answers) with ordinality as t(a, index)
    join answers a on t.a = a.pk
    group by s.pk;
    

    With result:

    pk map
    1 [This is an answer, Another answer, Yet another answer, Last answer]