I have a dataframe data like this:
Cluster VolumePred ConversionPred
0 0-3 8.0 7.0
1 0-3 175.0 85.0
2 0-3 17 4.0
3 4-6 14 4.0
4 7-9 29.0 19.0
And I need to add a column "meanKPI" which is equal to the sum of "ConversionPred" divided by the sum of "VolumePred", grouped by "Cluster".
I tried with this:
def KPI_Pred_mean(x, y):
    #print (x)
    return (x.sum()/y.sum())
    #data.ConversionPred.sum()/sum_vol_pred
df3=data.groupby(['Cluster'])['ConversionPred', 'VolumePred'].apply(KPI_Pred_mean).reset_index()
But I got an error:
TypeError: KPI_Pred_mean() missing 1 required positional argument: 'y'
How can I fix this?
KPI_Pred_mean expects two arguments, but apply calls the function with a single argument: the sub-DataFrame for each group. Passing the function directly is equivalent to writing .apply(lambda x: KPI_Pred_mean(x)), so the y argument is missing. You can rewrite your code in two ways:
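1. Keep the function as it is and pass it the two columns from a lambda (note the list of column names in double brackets; recent pandas versions no longer accept a tuple of columns here):
df3 = data.groupby(['Cluster'])[['ConversionPred', 'VolumePred']].apply(lambda x: KPI_Pred_mean(x["ConversionPred"], x["VolumePred"])).reset_index(name='KPI_Pred_mean')
2. Change the function to take the whole group, so it can be passed to apply directly:
def KPI_Pred_mean(group):
    return group["ConversionPred"].sum() / group["VolumePred"].sum()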
Number 1 is probably better since it keeps your function nice and generic.
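For reference, here is a minimal runnable sketch of the first approach on the sample data from the question. The last two lines are an addition that was not part of the original answer: since the question asks for a "meanKPI" column on the original rows, one possible way (an assumption about what is wanted) is to broadcast the per-cluster sums back to every row with transform. For cluster "0-3" the ratio is 96 / 200 = 0.48.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "Cluster": ["0-3", "0-3", "0-3", "4-6", "7-9"],
    "VolumePred": [8.0, 175.0, 17.0, 14.0, 29.0],
    "ConversionPred": [7.0, 85.0, 4.0, 4.0, 19.0],
})


def KPI_Pred_mean(x, y):
    # Ratio of the two column sums within one group.
    return x.sum() / y.sum()


# Approach 1: apply receives one sub-DataFrame per cluster, and the
# lambda forwards the two columns to the generic two-argument function.
df3 = (
    data.groupby("Cluster")[["ConversionPred", "VolumePred"]]
    .apply(lambda g: KPI_Pred_mean(g["ConversionPred"], g["VolumePred"]))
    .reset_index(name="KPI_Pred_mean")
)

# Assumed follow-up: put the per-cluster ratio on every original row.
# transform("sum") returns the group sums aligned to the original index.
sums = data.groupby("Cluster")[["ConversionPred", "VolumePred"]].transform("sum")
data["meanKPI"] = sums["ConversionPred"] / sums["VolumePred"]
```

df3 then holds one row per cluster, while data keeps its five rows and gains a meanKPI column that repeats the cluster's ratio on each row.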