I have a dataframe like this (a1,a2 and a3 are string)
A | B | C |
---|---|---|
a1 | 0.1 | 500 |
a2 | 0.2 | 300 |
a3 | 0.11 | 200 |
a2 | 0.13 | 700 |
a1 | 0.3 | 300 |
I trying to build a new dataframe as a sumprod of B and C based on A values. The output should look something like:
A_values | sumprodif |
---|---|
a1 | 140 |
a2 | 151 |
a3 | 22 |
My initial solution was to use an if formula, create a list and edit the new dataframe from there. However, my if formula does not seem to be working:
if df['A']=="a1":
sumprodif_a1 = (df['B']*df['C']).sum()
return sumproduif_a1
Which returns the following error
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I would love to use this as a chance to learn a better way to do this kind of operation.
Thank you!
You can use groupby.sum
on the multiplied values:
out = df['B'].mul(df['C']).groupby(df['A']).sum().reset_index(name='sumprodif')
# or
# out = (df.assign(sumprodif=df['B'].mul(df['C']))
# .groupby('A', as_index=False)['sumprodif'].sum()
# )
Output:
A sumprodif
0 a1 140.0
1 a2 151.0
2 a3 22.0