python pandas dataframe aggregation

Compute aggregated mean with group by


I have a DataFrame named data that looks like this:

  Cluster  VolumePred  ConversionPred
0     0-3         8.0             7.0
1     0-3       175.0            85.0
2     0-3        17.0             4.0
3     4-6        14.0             4.0
4     7-9        29.0            19.0

I need to add a column "meanKPI" equal to the sum of "ConversionPred" divided by the sum of "VolumePred", grouped by "Cluster".

I tried with this:

def KPI_Pred_mean(x, y):
    #print (x)
    return (x.sum()/y.sum())
    
    #data.ConversionPred.sum()/sum_vol_pred
    
df3=data.groupby(['Cluster'])['ConversionPred', 'VolumePred'].apply(KPI_Pred_mean).reset_index() 

But I got an error:

TypeError: KPI_Pred_mean() missing 1 required positional argument: 'y'

How can I fix this?


Solution

  • KPI_Pred_mean expects two arguments, but apply calls it with only one: the sub-DataFrame for each group. Your call is equivalent to .apply(lambda x: KPI_Pred_mean(x)), so the y argument is missing. You can fix this in two ways:

    1 - rewrite lambda

    # Select columns with a list (double brackets) and pass each one to the function explicitly
    df3 = data.groupby('Cluster')[['ConversionPred', 'VolumePred']].apply(lambda x: KPI_Pred_mean(x["ConversionPred"], x["VolumePred"])).reset_index(name='KPI_Pred_mean')
    

    2 - rewrite your function

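    def KPI_Pred_mean(row):
        # row is the sub-DataFrame for one cluster, not a single row
        return (row["ConversionPred"].sum()/row["VolumePred"].sum())

    df3 = data.groupby('Cluster')[['ConversionPred', 'VolumePred']].apply(KPI_Pred_mean).reset_index(name='KPI_Pred_mean')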
    

    Number 1 is probably better since it keeps your function nice and generic.
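
For reference, here is a minimal end-to-end sketch of option 1 using the sample data from the question. The last step is an addition that is not part of the answer above: since the question asks for a new column on the original rows, it uses transform("sum") to broadcast each cluster's totals back to its rows. The column name meanKPI is taken from the question. Treat this as one possible approach rather than the only one.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "Cluster": ["0-3", "0-3", "0-3", "4-6", "7-9"],
    "VolumePred": [8.0, 175.0, 17.0, 14.0, 29.0],
    "ConversionPred": [7.0, 85.0, 4.0, 4.0, 19.0],
})


def KPI_Pred_mean(x, y):
    # Ratio of the two column sums.
    return x.sum() / y.sum()


# Option 1: one row per cluster. apply passes each group's sub-DataFrame
# to the lambda, which forwards the two columns to the generic function.
df3 = (
    data.groupby("Cluster")[["ConversionPred", "VolumePred"]]
    .apply(lambda g: KPI_Pred_mean(g["ConversionPred"], g["VolumePred"]))
    .reset_index(name="meanKPI")
)

# Assumed extension: add the ratio as a column on the original rows.
# transform("sum") returns a Series aligned with data's index, so every
# row receives its own cluster's total.
grouped = data.groupby("Cluster")
data["meanKPI"] = (
    grouped["ConversionPred"].transform("sum")
    / grouped["VolumePred"].transform("sum")
)

print(df3)
print(data)
```

For the "0-3" cluster the ratio is 96 / 200 = 0.48, so all three rows of that cluster get 0.48 in the new column.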