I have following table:
+-----+----------+---+
| Grp | Category | X |
+-----+----------+---+
| 1 | A | 1 |
| 1 | B | 3 |
| 1 | B | 2 |
| 1 | C | 2 |
| 2 | A | 2 |
| 2 | A | 4 |
| 2 | B | 4 |
| 3 | A | 3 |
| 3 | C | 7 |
+-----+----------+---+
And trying to get following:
+-----+----------+---------+
| Grp | Category | X_ratio |
+-----+----------+---------+
| 1 | A | 1/8 |
| 1 | B | 5/8 |
| 1 | C | 2/8 |
| 2 | A | 6/10 |
| 2 | B | 4/10 |
| 3 | A | 3/10 |
| 3 | C | 7/10 |
+-----+----------+---------+
And a little bit stuck. Coud anyone suggest efficient solution?
My current code - its working, but seems not efficient:
grp_Cat = df.groupby(['Grp', 'Category ']).agg({'X': 'sum'})
grp_total = df.groupby(['Grp']).agg({'X': 'sum'})
grp_Cat.div(grp_total, level='Grp') * 100
Because performance is important first aggregate sum
to MultiIndex Series
and then divide by Series.div
summed values per first Grp
level:
s = df.groupby(['Grp','Category'])['X'].sum()
df = s.div(s.sum(level=0), level=0).reset_index(name='X_ratio')
print (df)
Grp Category X_ratio
0 1 A 0.125
1 1 B 0.625
2 1 C 0.250
3 2 A 0.600
4 2 B 0.400
5 3 A 0.300
6 3 C 0.700
Slowier alternative:
df = (df.groupby(['Grp','Category'])['X'].sum()
.groupby(level=0)
.apply(lambda x: x / x.sum())
.reset_index(name='X_ratio'))
print (df)
Grp Category X_ratio
0 1 A 0.125
1 1 B 0.625
2 1 C 0.250
3 2 A 0.600
4 2 B 0.400
5 3 A 0.300
6 3 C 0.700