Search code examples
pythonpandasgroup-bypercentile

How do I find weighted percentiles for each row within a group in Python?


Let's say I have the following dataframe. The last column is what I need and rest columns I have.Percentile within category is calculated as the weighted percentile of price with weights as the number of items sold within the category

| Category   |    Price    |  Items Sold  |  Percentile within category|
|:-----------|------------:|:------------:|:--------------------------:|
|     A      |     560     |      5       |      92.56                 |
|     A      |     360     |      2       |      12.56                 |
|     B      |     510     |      3       |      42.56                 |
|     A      |     520     |      4       |      72.36                 |
|     B      |     960     |      6       |      91.56                 |
|     C      |     130     |      2       |      100.00                |

The function I need to use is stats.percentileofscore. But I am not sure how to use it.

Edit: Inserted image of the dataframe as not sure how to show a table

Edit2: I haven't exactly calculated the output values for all rows. For A-560 it should be 81.81% as

stats.percentileofscore([560,560,560,560,560,360,360,520,520,520,520], 560)

gives 81.81%

The dataframe


Solution

  • You can do this with a simple groupby and applying a function to get the weighted values

    data = {'Category' : ['A', 'A', 'B', 'A', 'B', 'C'],
        'Price' : [560, 360, 510, 520, 960, 130],
        'Items' : [5, 2, 3, 4, 6, 2]}
    
    df = pd.DataFrame(data).sort_values('Category')
    def fun(x):
        t = (x['Price'] * x['Items']).sum()
        return (x['Price'] * x['Items'])/t
    df['weighted'] = df.groupby('Category').apply(fun).values
    

    There must be a better way, as this depends on the sort order being correct. Maybe someone will jump in and give a better solution.

    Result:

      Category  Price  Items  weighted
    0        A    560      5  0.500000
    1        A    360      2  0.128571
    3        A    520      4  0.371429
    2        B    510      3  0.209877
    4        B    960      6  0.790123
    5        C    130      2  1.000000