I need to make some kind of groupment in which a new column (result) is the sum of the values column divided by number of items found? Could anyone help me, please?
For example:
Table A
+-------+------+
| item | value|
+-------+------+
| x | 100 |
| y | 200 |
| y | 400 |
+-------+------+
Correct Result:
Table B
+-------+-----------+
| item | result |
+-------+-----------+
| x | 100/1 |
| y |(200+400)/2|
+-------+-----------+
Code:
d = {'item': ['x', 'y', 'y'], 'value': [100,200,400]}
df = pd.DataFrame(data=d)
df
You can use DataFrameGroupBy.agg
:
s = df.groupby('item')['value'].agg(lambda x: x.sum()/len(x))
print (s)
item
x 100
y 300
Name: value, dtype: int64
Or divide GroupBy.sum
with GroupBy.size
:
g = df.groupby('item')['value']
s = g.sum() / g.size()
print (s)
item
x 100.0
y 300.0
Name: value, dtype: float64
But sum/size
is mean
, so solution should be simplify by GroupBy.mean
:
s = df.groupby('item')['value'].mean()
print (s)
item
x 100
y 300
Name: value, dtype: int64