Search code examples
sqlpivotwindow-functionsprestoamazon-athena

athena presto - multiple columns from long to wide


I am new to Athena and I am trying to understand how to turn multiple columns from long to wide format. It seems like presto is what is needed, but I've only successfully been able to apply map_agg to one variable. I think my below final outcome can be achieved with multimap_agg but cannot quite get it to work.

Below I walk through my steps and data. If you have some suggestions or questions, please let me know!

First, the data starts like this:

id  | letter    | number   | value
------------------------------------
123 | a         | 1        | 62
123 | a         | 2        | 38
123 | a         | 3        | 44
123 | b         | 1        | 74
123 | b         | 2        | 91
123 | b         | 3        | 97
123 | c         | 1        | 38
123 | c         | 2        | 98
123 | c         | 3        | 22
456 | a         | 1        | 99
456 | a         | 2        | 33
456 | a         | 3        | 81
456 | b         | 1        | 34
456 | b         | 2        | 79
456 | b         | 3        | 43
456 | c         | 1        | 86
456 | c         | 2        | 60
456 | c         | 3        | 59

Then I transform the data into the below using filtering with the where clause and then joining:

id  | letter  | 1  | 2  | 3
----------------------------
123 | a       | 62 | 38 | 44
123 | b       | 74 | 91 | 97
123 | c       | 38 | 98 | 22
456 | a       | 99 | 33 | 81
456 | b       | 34 | 79 | 43
456 | c       | 86 | 60 | 59

For the final outcome, I would like to transform it into the below:

id  | a_1   | a_2   | a_3   | b_1   | b_2   | b_3   | c_1   | c_2   | c_3
--------------------------------------------------------------------------
123 | 62    | 38    | 44    | 74    | 91    | 97    | 38    | 98    | 22
456 | 99    | 33    | 81    | 34    | 79    | 43    | 86    | 60    | 59

Solution

  • You can use window functions and conditional aggregation. This requires that you know in advance the possible letters, and the maximum rows per id/letter tuple:

    select
        id,
        max(case when letter = 'a' and rn = 1 then value end) a_1,
        max(case when letter = 'a' and rn = 2 then value end) a_2,
        max(case when letter = 'a' and rn = 3 then value end) a_3,
        max(case when letter = 'b' and rn = 1 then value end) b_1,
        max(case when letter = 'b' and rn = 2 then value end) b_2,
        max(case when letter = 'b' and rn = 3 then value end) b_3,
        max(case when letter = 'c' and rn = 1 then value end) c_1,
        max(case when letter = 'c' and rn = 2 then value end) c_2,
        max(case when letter = 'c' and rn = 3 then value end) c_3
    from (
        select 
            t.*, 
            row_number() over(partition by id, letter order by number) rn
        from mytable t
    ) t
    group by id
    

    Actually, if the numbers are always 1, 2, 3, then you don't even need the window function:

    select
        id,
        max(case when letter = 'a' and number = 1 then value end) a_1,
        max(case when letter = 'a' and number = 2 then value end) a_2,
        max(case when letter = 'a' and number = 3 then value end) a_3,
        max(case when letter = 'b' and number = 1 then value end) b_1,
        max(case when letter = 'b' and number = 2 then value end) b_2,
        max(case when letter = 'b' and number = 3 then value end) b_3,
        max(case when letter = 'c' and number = 1 then value end) c_1,
        max(case when letter = 'c' and number = 2 then value end) c_2,
        max(case when letter = 'c' and number = 3 then value end) c_3
    from mytable t
    group by id