Search code examples
pythonpandasnumpyweighted-average

Weighted mean dataframe with pandas


I've come across a bunch of other weighted mean pandas questions but none of them seem to do what I'm trying to do. I have the following df:

      Primary_Key     Team     Quantity   Value 1    Value 2
0         A           Blue      10          20          10
1         B           Red       5           19          30
2         C          Green      8           13          29
3         D           Blue      12          24          18
4         E           Red       15          25          19
5         F          Green      12          18          23

I'm trying to calculate the weighted average of each of the values for each team, so I'd get the following result_df:

        Team     Quantity   Value 1    Value 2
0       Blue      10        20*10/22    10*10/22
1        Red       5        19*5/20     30*5/20
2       Green      8        13*8/20     29*8/20
3        Blue      12       24*12/22    18*12/22
4        Red       15       25*15/20    19*15/20
5       Green      12       18*12/20    23*12/20

where each entry under the Value columns have had the following calculation done on them:

weighted_mean = value * (quantity/team's total quantity)

I'm thinking I'd have to use the .apply(lambda x:...) function somehow but I don't know how I would easily get the values for the team's total quantity. I also came across the numpy.average function but I don't think it would be useful here.

Any help would be much appreciated!


Solution

  • Breaking down into steps:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(data={
        'Primary_Key': list('ABCDEF'),
        'Team': ['Blue', 'Red', 'Green', 'Blue', 'Red', 'Green'],
        'Quantity': [10,5,8,12,15,12],
        'v1': [20,19,13, 24,25,18],
        'v2': [10,30,29,18,19,23]})
    
    df['GroupQuantity'] = df.groupby('Team')['Quantity'].transform(np.sum)
    
    df['v1'] = df['Quantity'] * df['v1'] / df['GroupQuantity']
    df['v2'] = df['Quantity'] * df['v2'] / df['GroupQuantity']
    
    df
    Primary_Key   Team  Quantity         v1         v2
    0           A   Blue        10   9.090909   4.545455
    1           B    Red         5   4.750000   7.500000
    2           C  Green         8   5.200000  11.600000
    3           D   Blue        12  13.090909   9.818182
    4           E    Red        15  18.750000  14.250000
    5           F  Green        12  10.800000  13.800000
    

    Now if you are looking for a one liner, you can do:

    df[['v1', 'v2']] = df[['v1', 'v2']] * df['Quantity'].to_numpy()[:,None] /  df.groupby('Team')['Quantity'].transform(np.sum).to_numpy()[:,None]
    
    df
    Primary_Key   Team  Quantity         v1         v2
    0           A   Blue        10   9.090909   4.545455
    1           B    Red         5   4.750000   7.500000
    2           C  Green         8   5.200000  11.600000
    3           D   Blue        12  13.090909   9.818182
    4           E    Red        15  18.750000  14.250000
    5           F  Green        12  10.800000  13.800000