python pandas aggregate apply

Pandas: Aggregate and/or apply does not work with user defined function


I am trying to get a weighted average from a DataFrame, but neither apply nor agg seems to work; my code returns the following error: 'numpy.float64' object is not callable

I have the following df

import numpy as np
import pandas as pd

df = pd.DataFrame([['RETIRO', 65, 1, 10.7],
                   ['SAN NICOLAS', 116, 1, 23.2],
                   ['RETIRO', 101, 2, 28.7],
                   ['FLORES', 136, 2, 23.5]],
                  columns=['BARRIO', 'HOGARES', 'COMUNA', 'NSE'])

I define the function

def avg_w(dt):
    return np.average(a = dt.NSE, weights = dt.HOGARES)

and now apply it to my df,

df.loc[:,['COMUNA','NSE','HOGARES']].groupby(['COMUNA']).apply(avg_w(df))

and it returns 'numpy.float64' object is not callable

I also tried something similar to the suggestions found here and here.

I changed the function,

def avg_w2(dt):
    return pd.Series({'avg_w2': np.average(a = dt.NSE, weights = dt.HOGARES)})

and the apply

df.loc[:,['COMUNA','NSE','HOGARES']].groupby(['COMUNA']).apply({'avgw': [avg_w2(dt)]})

But it didn't work either. The code returns TypeError: unhashable type: 'dict'

The function works on its own, but something goes wrong when I pass it to apply (or aggregate; I tried both).

I am expecting to obtain for each COMUNA the NSE average weighted by HOGARES.
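
For reference, the expected value can be checked by hand: a weighted average is the sum of NSE times HOGARES divided by the sum of HOGARES. A minimal sketch for COMUNA 1, using the two rows from the sample data above (the variable names nse, hogares, manual and via_numpy are only illustrative):

```python
import numpy as np

# The two COMUNA 1 rows from the sample data (RETIRO and SAN NICOLAS).
nse = np.array([10.7, 23.2])
hogares = np.array([65, 116])

# Weighted average written out: sum(value * weight) / sum(weight).
manual = (nse * hogares).sum() / hogares.sum()

# The same quantity computed by np.average with the weights argument.
via_numpy = np.average(nse, weights=hogares)

print(manual, via_numpy)
```

Both values should agree, and this is the number I would expect to see next to COMUNA 1 in the grouped result.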


Solution

  • Seems like what you want is the following:

    df = df.iloc[:, 1:].groupby(by="COMUNA").apply(
            lambda grp : np.average(a=grp['NSE'], weights=grp["HOGARES"])
        )
    

    Which results in the following Series, indexed by COMUNA:

    COMUNA
    1    18.711050
    2    25.716034
    

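    Note: you may use a named function instead of the lambda expression, but you need to pass the function object itself, i.e. df.groupby('COMUNA').apply(avg_w), NOT df.groupby('COMUNA').apply(avg_w(df)). Writing avg_w(df) calls the function immediately on the whole DataFrame and hands the resulting numpy.float64 to apply, which then tries to call that number for each group. That is where the 'numpy.float64' object is not callable error comes from.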
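
To make the note concrete, here is a minimal sketch that reuses the question's df and avg_w and passes the function by name. Selecting only the NSE and HOGARES columns before apply is a choice made for this sketch (it keeps the grouping column out of what the function receives), not something the original answer requires:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['RETIRO', 65, 1, 10.7],
                   ['SAN NICOLAS', 116, 1, 23.2],
                   ['RETIRO', 101, 2, 28.7],
                   ['FLORES', 136, 2, 23.5]],
                  columns=['BARRIO', 'HOGARES', 'COMUNA', 'NSE'])

def avg_w(dt):
    # dt is the sub-DataFrame holding the rows of one group.
    return np.average(a=dt.NSE, weights=dt.HOGARES)

# avg_w(df) runs right away and yields a plain number, which is not callable;
# avg_w without parentheses is the function object that apply expects.
print(callable(avg_w(df)))  # False
print(callable(avg_w))      # True

# Pass the function object; pandas calls it once per COMUNA group.
result = df.groupby('COMUNA')[['NSE', 'HOGARES']].apply(avg_w)
print(result)
```

The printed result should match the Series shown above, with one weighted average per COMUNA.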