python pandas-groupby apply pandas-apply

Pandas apply on groupby-frames and on full dataframe


Given a pandas DataFrame, I compute several statistics per group by applying a custom function via groupby. This works fine (ignoring the extra inner index level of 0s for the moment), but I would also like to apply the same function to the full DataFrame.

import pandas as pd

xxx = pd.DataFrame([['A',1],['A',2],['B',3]],columns=['cls','val'])
xxx

    cls val
0   A   1
1   A   2
2   B   3

def myagg(dat):
    vmax=dat.val.max()
    vmean=dat.val.mean()
    return pd.DataFrame([[vmax,vmean]],columns=(['MaxV','MeanV']))

xxx.groupby('cls').apply(myagg)

yields

        MaxV    MeanV
cls         
A   0   2   1.5
B   0   3   3.0

But xxx.apply(myagg) raises:

AttributeError: ("'Series' object has no attribute 'val'", 'occurred at index cls')

I can create a constant dummy variable and group by it to get the result I want, but surely there is a simpler way. Why does pandas treat the frame as a Series when there is no groupby, even though type(xxx) returns pandas.core.frame.DataFrame? I'm on pandas 0.23.4 and Python 3.6.

xxx['dummy']='test'
xxx.groupby('dummy').apply(myagg)


         MaxV   MeanV
dummy           
test    0   3   2.0

Solution

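  • Grouping by a dummy function does the trick. When groupby is given a function, it calls that function on each index label and uses the return value as the group key; since the function below always returns 1, every row ends up in a single group.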

    def dummy(dat):
        return 1
    
    xxx.groupby(dummy).apply(myagg)
    

    The result has the same values as the dummy-column output in the question (with 1 as the group label instead of 'test'), and there is no need to modify the DataFrame.
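
As for why the error occurs: DataFrame.apply does not pass the whole frame to the function. By default (axis=0) it calls the function once per column, so each argument is a Series, which is why the message says 'occurred at index cls'. A minimal sketch of an alternative, assuming myagg only needs the val column as in the question, is to call the function on the frame directly or through DataFrame.pipe. The names seen_types, overall and same are only illustrative.

```python
import pandas as pd

xxx = pd.DataFrame([['A', 1], ['A', 2], ['B', 3]], columns=['cls', 'val'])

def myagg(dat):
    # Same aggregation as in the question: max and mean of 'val'.
    vmax = dat.val.max()
    vmean = dat.val.mean()
    return pd.DataFrame([[vmax, vmean]], columns=['MaxV', 'MeanV'])

# DataFrame.apply hands the function one column at a time (axis=0),
# so the argument is a Series, not the DataFrame.
seen_types = xxx.apply(lambda col: type(col).__name__)

# Calling the function directly passes the whole DataFrame.
overall = myagg(xxx)

# DataFrame.pipe does the same and fits into a method chain.
same = xxx.pipe(myagg)
```

Here seen_types holds 'Series' for both cls and val, while overall and same each contain a single row with MaxV 3 and MeanV 2.0, without a dummy column or a dummy grouping function.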