Search code examples
sqlpivotunionimpalaunpivot

Need to create a view in Impala which Un-Pivots, Pivots and Union's it altogether


I'm doing my first foray into something proper in SQL, except having trouble replicating something which I've been easily able to do via Alteryx.

Essentially, some basic data wrangling needed to create a summary of a Table which I've already created in Impala/Hive. The basic table needs to be broken into smaller tables (unpivoted and pivoted), which are then unioned together to create an aggregated table.

Table is as following:

Run_Code | ID | ColB | ColC | ColD | ColE | ColF | ColG | TaxExpense | RetainedExpense | IncomeExpense | Year
-----------------------------------------------------------------------------------------------------------------
run1     | 21 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234.56789 | 1234.56789      |  1234.56789   | Year1 
run1     | 22 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234.56789 | 1234.56789      |  1234.56789   | Year2
run1     | 23 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234.56789 | 1234.56789      |  1234.56789   | Year3
run1     | 24 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234.56789 | 1234.56789      |  1234.56789   | Year4

Currently the following is done in Alteryx; whereby only TaxExpense is selected and then un-pivoted for Year, and then Pivoted back to as a column.

Run_Code | ID | ColB | ColC | ColD |     Name    | Year1 | Year2 | Year3 | Year4
-----------------------------------------------------------------------------------------------------------------
run1     | 21 | 1234 | 1234 | 1234 | Tax Expense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | Tax Expense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | Tax Expense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | Tax Expense | (sum) | (sum) | (sum) | (sum)

The same is done for Retained Expense and then to IncomeExpense as well.

Run_Code | ID | ColB | ColC | ColD |      Name      | Year1 | Year2 | Year3 | Year4
-----------------------------------------------------------------------------------------------------------------
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)

End Result is the below and the desired is the below:

Run_Code | ID | ColB | ColC | ColD |      Name       | Year1 | Year2 | Year3 | Year4
-----------------------------------------------------------------------------------------------------------------
run1     | 21 | 1234 | 1234 | 1234 | TaxExpense      | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | TaxExpense      | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | TaxExpense      | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | TaxExpense      | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | IncomeExpense   | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | IncomeExpense   | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | IncomeExpense   | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | IncomeExpense   | (sum) | (sum) | (sum) | (sum)

Appreciate any help in creating an SQL which solves the above.


Solution

  • Hmmm . . . If I understand correctly, ou can unpivot and reaggregate:

    select Run_Code, ID, ColB, ColC, ColD,
           sum(case when year = 'year1' then expense end) as year_1,
           sum(case when year = 'year2' then expense end) as year_2,
           sum(case when year = 'year3' then expense end) as year_3,
           sum(case when year = 'year4' then expense end) as year_4
    from ((select Run_Code, ID, ColB, ColC, ColD, 'TaxExpense' as name, TaxExpense as expense, year
           from t
          ) union all
          (select Run_Code, ID, ColB, ColC, ColD, 'RetainedExpense' as name, RetainedExpense, year
           from t
          ) union all
          (select Run_Code, ID, ColB, ColC, ColD, 'IncomeExpense' as name, IncomeExpense, year
           from t
          )
         ) t
    group by Run_Code, ID, ColB, ColC, ColD, name