I have a dataframe like that:
df = pd.DataFrame({'year': [2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022],
'month': [1,2,3,1,2,3,4,5,6,7,8,9,10,11,1,2,3,4,5],
'client':[1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3],
'total':[10,20,30,55,4,64,88,5,64,32,84,24,69,70,54,11,37,98,52]})
df
year month client total
0 2022 1 1 10
1 2022 2 1 20
2 2022 3 1 30
3 2022 1 2 55
4 2022 2 2 4
5 2022 3 2 64
6 2022 4 2 88
7 2022 5 2 5
8 2022 6 2 64
9 2022 7 2 32
10 2022 8 2 84
11 2022 9 2 24
12 2022 10 2 69
13 2022 11 2 70
14 2022 1 3 54
15 2022 2 3 11
16 2022 3 3 37
17 2022 4 3 98
18 2022 5 3 52
I would like to have for all the clients 12 months, so I need to add this rows and assign the value 0 to the column total. For this new rows, the value of the year column is a copy of the rows that have filled the month up.
Desire output:
df = pd.DataFrame({'year': [2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,20
22,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022],
'month': [1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12],
'client':[1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3],
'total':
[10,20,30,0,0,0,0,0,0,0,0,0,55,4,64,88,5,64,32,84,24,69,70,0,54,11,37,98,52,0,0,0,0,0,0,0]})
df
year month client total
0 2022 1 1 10
1 2022 2 1 20
2 2022 3 1 30
3 2022 4 1 0
4 2022 5 1 0
5 2022 6 1 0
6 2022 7 1 0
7 2022 8 1 0
8 2022 9 1 0
9 2022 10 1 0
10 2022 11 1 0
11 2022 12 1 0
12 2022 1 2 55
13 2022 2 2 4
14 2022 3 2 64
15 2022 4 2 88
16 2022 5 2 5
17 2022 6 2 64
18 2022 7 2 32
19 2022 8 2 84
20 2022 9 2 24
21 2022 10 2 69
22 2022 11 2 70
23 2022 12 2 0
24 2022 1 3 54
25 2022 2 3 11
26 2022 3 3 37
27 2022 4 3 98
28 2022 5 3 52
29 2022 6 3 0
30 2022 7 3 0
31 2022 8 3 0
32 2022 9 3 0
33 2022 10 3 0
34 2022 11 3 0
35 2022 12 3 0
Maybe by the merge option there is a solution? I've tried but without a good result
It's not the easiest way but it works with pivot_table
:
out = (df.pivot_table(index=['year', 'client'], columns='month',
values='total', fill_value=0.)
.reindex(columns=range(1, 13), fill_value=0.)
.stack().swaplevel('month', 'client')
.rename('total').reset_index())
Output:
>>> out
year month client total
0 2022 1 1 10.0
1 2022 2 1 20.0
2 2022 3 1 30.0
3 2022 4 1 0.0
4 2022 5 1 0.0
5 2022 6 1 0.0
6 2022 7 1 0.0
7 2022 8 1 0.0
8 2022 9 1 0.0
9 2022 10 1 0.0
10 2022 11 1 0.0
11 2022 12 1 0.0
12 2022 1 2 55.0
13 2022 2 2 4.0
14 2022 3 2 64.0
15 2022 4 2 88.0
16 2022 5 2 5.0
17 2022 6 2 64.0
18 2022 7 2 32.0
19 2022 8 2 84.0
20 2022 9 2 24.0
21 2022 10 2 69.0
22 2022 11 2 70.0
23 2022 12 2 0.0
24 2022 1 3 54.0
25 2022 2 3 11.0
26 2022 3 3 37.0
27 2022 4 3 98.0
28 2022 5 3 52.0
29 2022 6 3 0.0
30 2022 7 3 0.0
31 2022 8 3 0.0
32 2022 9 3 0.0
33 2022 10 3 0.0
34 2022 11 3 0.0
35 2022 12 3 0.0
After the pivot your dataframe looks like:
>>> (df.pivot_table(index=['year', 'client'], columns='month',
values='total', fill_value=0)
.reindex(columns=range(1, 13), fill_value=0))
month 1 2 3 4 5 6 7 8 9 10 11 12
year client
2022 1 10.0 20.0 30.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0
2 55.0 4.0 64.0 88.0 5.0 64.0 32.0 84.0 24.0 69.0 70.0 0
3 54.0 11.0 37.0 98.0 52.0 0.0 0.0 0.0 0.0 0.0 0.0 0