Search code examples
pandaspandas-groupbysklearn-pandaspandasql

Pandas - How to make a groupment in which a new column is the result of (sum of a column)/(number of itens grouped)?


I need to make some kind of groupment in which a new column (result) is the sum of the values column divided by number of items found? Could anyone help me, please?

For example:

Table A

+-------+------+
| item  | value|
+-------+------+
| x     |  100 |
| y     |  200 |
| y     |  400 | 
+-------+------+

Correct Result:

Table B

+-------+-----------+
| item  | result    |
+-------+-----------+
| x     | 100/1     |
| y     |(200+400)/2|
+-------+-----------+

Code:

d = {'item': ['x', 'y', 'y'], 'value': [100,200,400]}
df = pd.DataFrame(data=d)
df

Solution

  • You can use DataFrameGroupBy.agg:

    s = df.groupby('item')['value'].agg(lambda x: x.sum()/len(x)) 
    print (s)
    item
    x    100
    y    300
    Name: value, dtype: int64
    

    Or divide GroupBy.sum with GroupBy.size:

    g = df.groupby('item')['value']
    s = g.sum() / g.size()
    print (s)
    item
    x    100.0
    y    300.0
    Name: value, dtype: float64
    

    But sum/size is mean, so solution should be simplify by GroupBy.mean:

    s = df.groupby('item')['value'].mean()
    print (s)
    item
    x    100
    y    300
    Name: value, dtype: int64