first of all, I have been searching for this for some time.
I have a table that looks kinda like this :
ID Expenditure MonthYear
1A 1,000 122019
1A 1,500 012020
1B 1,900 122019
1C 2,400 122019
1B 2,400 012020
1C 900 012020
1A 800 022020
Since the rows can reach to thousands, and some IDs are repeated tens of times, I want to combine those with distinct ID into a single row and add columns which retain all the information in it. I want to make the table to looks something like this :
ID Expenditure_1 MonthYear_1 Expenditure_2 MonthYear_2 Expenditure_3 MonthYear_3
1A 1,000 122019 1,500 012020 800 022020
1B 1,900 122019 2,400 012020 Null Null
1C 2,400 122019 900 012020 Null Null
What is the best way in approaching this problem using SQL on Impala? Thank you.
You can use conditional aggregation and row_number():
select id,
max(case when seqnum = 1 then expenditure end) as expenditure_1,
max(case when seqnum = 1 then monthyear end) as monthyear_1,
max(case when seqnum = 2 then expenditure end) as expenditure_2,
max(case when seqnum = 2 then monthyear end) as monthyear_2,
max(case when seqnum = 3 then expenditure end) as expenditure_3,
max(case when seqnum = 3 then monthyear end) as monthyear_3
from (select t.*,
row_number() over (partition by id order by right(monthyear, 4), left(monthyear, 2)) as seqnum
from t
) t
group by id;