I am struggling with the following.
This is my data frame. The last two columns, are calculations which I am able to create in my query.
Day Month Result Vol FirstCol Second Col
0 26 5 Good 123 1% 0%
1 26 5 Bad 716 0% 2%
2 26 5 Other 36 0% 0%
3 26 6 Good 4721 26% 11%
4 26 6 Bad 7148 0% 16%
5 26 6 Other 1387 0% 3%
6 27 5 Good 196 1% 0%
7 27 5 Bad 627 0% 1%
8 27 5 Other 60 0% 0%
9 27 6 Good 6188 34% 14%
10 27 6 Bad 6688 0% 15%
11 27 6 Other 1068 0% 2%
12 28 5 Good 339 2% 1%
13 28 5 Bad 1114 0% 3%
14 28 5 Other 72 0% 0%
15 28 6 Good 6524 36% 15%
16 28 6 Bad 6103 0% 14%
17 28 6 Other 820 0% 2%
First column to calculate the percentage for all Good Result
df['FirstCol'] = np.where(df['Result'].isin(['Good']),df['Vol']/df[df['Result']=='Good']['Vol'].sum(),0)
Second column to calculate the percentage for all Result
df['SecondCol'] = df['Vol']/df['Vol'].sum()
For the other the code has to be more dynamic, which I am struggling with. The Third Column should get the % based on each Month. So the percentage for row 0-8 should be 100% and the same for row 9-17. The Forth Column should get the % based on each Day and Month. So the percentage for row 0-2 should be 100% and the same for row 3-5 and so on. I want to have a dynamic query. Because I don't want to change each month.
Desired Output
Day Month Result Vol FirstCol Second Col 0-17 Third Col 1-9 (Month) Forth Col 1-3 (Day)
0 26 5 Good 123 1% 0% 1% 14%
1 26 5 Bad 716 0% 2% 5% 82%
2 26 5 Other 36 0% 0% 0% 4%
3 26 6 Good 4721 26% 11% 31% 36%
4 26 6 Bad 7148 0% 16% 48% 54%
5 26 6 Other 1387 0% 3% 9% 10%
6 27 5 Good 196 1% 0% 1% 22%
7 27 5 Bad 627 0% 1% 4% 71%
8 27 5 Other 60 0% 0% 0% 7%
9 27 6 Good 6188 34% 14% 21% 44%
10 27 6 Bad 6688 0% 15% 23% 48%
11 27 6 Other 1068 0% 2% 4% 8%
12 28 5 Good 339 2% 1% 1% 22%
13 28 5 Bad 1114 0% 3% 4% 73%
14 28 5 Other 72 0% 0% 0% 5%
15 28 6 Good 6524 36% 15% 23% 49%
16 28 6 Bad 6103 0% 14% 21% 45%
17 28 6 Other 820 0% 2% 3% 6%
If I understand you correctly, you want to groupby by multiple columns:
# sort the dataframe to have nicer output:
df = df.sort_values(by=['Month', 'Day'])
df['Third Col'] = df.groupby('Month')['Vol'].transform(lambda x: (x / x.sum()) *100)
df['Fourth Col'] = df.groupby(['Day', 'Month'])['Vol'].transform(lambda x: (x / x.sum())*100)
print(df)
Prints:
Day Month Result Vol FirstCol Second Col Third Col Fourth Col
0 26 5 Good 123 1% 0% 3.746573 14.057143
1 26 5 Bad 716 0% 2% 21.809321 81.828571
2 26 5 Other 36 0% 0% 1.096558 4.114286
6 27 5 Good 196 1% 0% 5.970149 22.197055
7 27 5 Bad 627 0% 1% 19.098386 71.007928
8 27 5 Other 60 0% 0% 1.827597 6.795017
12 28 5 Good 339 2% 1% 10.325921 22.229508
13 28 5 Bad 1114 0% 3% 33.932379 73.049180
14 28 5 Other 72 0% 0% 2.193116 4.721311
3 26 6 Good 4721 26% 11% 11.614633 35.614062
4 26 6 Bad 7148 0% 16% 17.585554 53.922752
5 26 6 Other 1387 0% 3% 3.412306 10.463186
9 27 6 Good 6188 34% 14% 15.223756 44.377510
10 27 6 Bad 6688 0% 15% 16.453859 47.963282
11 27 6 Other 1068 0% 2% 2.627500 7.659208
15 28 6 Good 6524 36% 15% 16.050385 48.516398
16 28 6 Bad 6103 0% 14% 15.014638 45.385588
17 28 6 Other 820 0% 2% 2.017369 6.098014