Let's say I have the following dataframe. The last column is what I need and rest columns I have.Percentile within category is calculated as the weighted percentile of price with weights as the number of items sold within the category
| Category | Price | Items Sold | Percentile within category|
|:-----------|------------:|:------------:|:--------------------------:|
| A | 560 | 5 | 92.56 |
| A | 360 | 2 | 12.56 |
| B | 510 | 3 | 42.56 |
| A | 520 | 4 | 72.36 |
| B | 960 | 6 | 91.56 |
| C | 130 | 2 | 100.00 |
The function I need to use is stats.percentileofscore. But I am not sure how to use it.
Edit: Inserted image of the dataframe as not sure how to show a table
Edit2: I haven't exactly calculated the output values for all rows. For A-560 it should be 81.81% as
stats.percentileofscore([560,560,560,560,560,360,360,520,520,520,520], 560)
gives 81.81%
You can do this with a simple groupby and applying a function to get the weighted values
data = {'Category' : ['A', 'A', 'B', 'A', 'B', 'C'],
'Price' : [560, 360, 510, 520, 960, 130],
'Items' : [5, 2, 3, 4, 6, 2]}
df = pd.DataFrame(data).sort_values('Category')
def fun(x):
t = (x['Price'] * x['Items']).sum()
return (x['Price'] * x['Items'])/t
df['weighted'] = df.groupby('Category').apply(fun).values
There must be a better way, as this depends on the sort order being correct. Maybe someone will jump in and give a better solution.
Result:
Category Price Items weighted
0 A 560 5 0.500000
1 A 360 2 0.128571
3 A 520 4 0.371429
2 B 510 3 0.209877
4 B 960 6 0.790123
5 C 130 2 1.000000