I'm doing my first foray into something proper in SQL, except having trouble replicating something which I've been easily able to do via Alteryx.
Essentially, some basic data wrangling needed to create a summary of a Table which I've already created in Impala/Hive. The basic table needs to be broken into smaller tables (unpivoted and pivoted), which are then unioned together to create an aggregated table.
Table is as following:
Run_Code | ID | ColB | ColC | ColD | ColE | ColF | ColG | TaxExpense | RetainedExpense | IncomeExpense | Year
-----------------------------------------------------------------------------------------------------------------
run1 | 21 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234.56789 | 1234.56789 | 1234.56789 | Year1
run1 | 22 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234.56789 | 1234.56789 | 1234.56789 | Year2
run1 | 23 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234.56789 | 1234.56789 | 1234.56789 | Year3
run1 | 24 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234.56789 | 1234.56789 | 1234.56789 | Year4
Currently the following is done in Alteryx; whereby only TaxExpense is selected and then un-pivoted for Year, and then Pivoted back to as a column.
Run_Code | ID | ColB | ColC | ColD | Name | Year1 | Year2 | Year3 | Year4
-----------------------------------------------------------------------------------------------------------------
run1 | 21 | 1234 | 1234 | 1234 | Tax Expense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | Tax Expense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | Tax Expense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | Tax Expense | (sum) | (sum) | (sum) | (sum)
The same is done for Retained Expense and then to IncomeExpense as well.
Run_Code | ID | ColB | ColC | ColD | Name | Year1 | Year2 | Year3 | Year4
-----------------------------------------------------------------------------------------------------------------
run1 | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
End Result is the below and the desired is the below:
Run_Code | ID | ColB | ColC | ColD | Name | Year1 | Year2 | Year3 | Year4
-----------------------------------------------------------------------------------------------------------------
run1 | 21 | 1234 | 1234 | 1234 | TaxExpense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | TaxExpense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | TaxExpense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | TaxExpense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | IncomeExpense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | IncomeExpense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | IncomeExpense | (sum) | (sum) | (sum) | (sum)
run1 | 21 | 1234 | 1234 | 1234 | IncomeExpense | (sum) | (sum) | (sum) | (sum)
Appreciate any help in creating an SQL which solves the above.
Hmmm . . . If I understand correctly, ou can unpivot and reaggregate:
select Run_Code, ID, ColB, ColC, ColD,
sum(case when year = 'year1' then expense end) as year_1,
sum(case when year = 'year2' then expense end) as year_2,
sum(case when year = 'year3' then expense end) as year_3,
sum(case when year = 'year4' then expense end) as year_4
from ((select Run_Code, ID, ColB, ColC, ColD, 'TaxExpense' as name, TaxExpense as expense, year
from t
) union all
(select Run_Code, ID, ColB, ColC, ColD, 'RetainedExpense' as name, RetainedExpense, year
from t
) union all
(select Run_Code, ID, ColB, ColC, ColD, 'IncomeExpense' as name, IncomeExpense, year
from t
)
) t
group by Run_Code, ID, ColB, ColC, ColD, name