python pandas aggregate

Pandas: apply different functions to different columns


When using df.mean() I get a result where the mean for each column is given. Now let's say I want the mean of the first column, and the sum of the second. Is there a way to do this? I don't want to have to disassemble and reassemble the DataFrame.

My initial idea was to do something along the lines of pandas.groupby.agg() like so:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((10,2)), columns=['A','B'])
df.apply({'A':np.mean, 'B':np.sum}, axis=0)

Traceback (most recent call last):

  File "<ipython-input-81-265d3e797682>", line 1, in <module>
    df.apply({'A':np.mean, 'B':np.sum}, axis=0)

  File "C:\Users\Patrick\Anaconda\lib\site-packages\pandas\core\frame.py", line 3471, in apply
    return self._apply_standard(f, axis, reduce=reduce)

  File "C:\Users\Patrick\Anaconda\lib\site-packages\pandas\core\frame.py", line 3560, in _apply_standard
    results[i] = func(v)

TypeError: ("'dict' object is not callable", u'occurred at index A')

But clearly this doesn't work. It seems like passing a dict would be an intuitive way of doing this, but is there another way (again without disassembling and reassembling the DataFrame)?


Solution

  • I think you can use the agg method with a dictionary that maps each column name to the function to apply to it. For example:

    import pandas as pd
    
    df = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5]})
    
    df
       A  B
    0  0  3
    1  1  4
    2  2  5
    
    df.agg({'A': 'mean', 'B': 'sum'})
    
    A     1.0
    B    12.0
    dtype: float64
    

    In addition, the example from the question also seems to work now (as of pandas 1.5.3):

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame(np.random.random((10,2)), columns=['A','B'])
    df.apply({'A':np.mean, 'B':np.sum}, axis=0)
    
    A    0.495771
    B    5.939556
    dtype: float64
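
For a reproducible check of the dictionary approach, here is a minimal sketch. It assumes a seeded NumPy generator (the seed 0 is arbitrary) and uses the string aliases 'mean' and 'sum' rather than np.mean, np.sum or the builtin sum, since newer pandas releases may, as far as I can tell, warn when callables like those are passed. The names result and expected are only for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator so the numbers are reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 2)), columns=["A", "B"])

# Map each column name to the aggregation that should be applied to it.
result = df.agg({"A": "mean", "B": "sum"})

# The same values computed one column at a time, for comparison.
expected = pd.Series({"A": df["A"].mean(), "B": df["B"].sum()})

print(result)
print(np.allclose(result, expected))
```

The result is a Series indexed by column name, so the DataFrame never has to be taken apart and reassembled by hand.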