I would like to create a new column called total_amount that holds the running (cumulative) sum of amount within each company group. I would like the final data set to look like this:
company | amount | total_amount
company 1 | 10000 | 10000
company 1 | 20000 | 30000
company 1 | 30000 | 60000
company 2 | 10000 | 10000
company 2 | 20000 | 30000
company 3 | 10000 | 10000
company 4 | 10000 | 10000
company 4 | 20000 | 30000
company 5 | 10000 | 10000
company 5 | 20000 | 30000
company 5 | 30000 | 60000
company 5 | 40000 | 100000
I ran this code:
df['total_amount'] = df.groupby('company').cumsum()
It worked at first, but when I moved that line to a different position to make my code more readable, it started giving me KeyError: 'total_amount' as well as a ValueError. What am I doing wrong?
The error indicates that cumsum returns more than one column. df.groupby('company').cumsum() calls cumsum on a DataFrameGroupBy object, so it returns a DataFrame with the cumulative sum of every non-grouping column. If that DataFrame has only one column, assigning it to df['total_amount'] still works. If it has two or more columns, the assignment fails with the ValueError you saw. I suspect your first run returned a one-column DataFrame (just amount), so it worked. However, that first run also added the total_amount column to df. On every later run, cumsum returns one column per non-grouping column (amount and total_amount), so the assignment fails.
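A minimal sketch reproducing the failure described above, assuming a small hypothetical three-row frame in place of the full data. The exact ValueError message differs between pandas versions, so the sketch only records that one is raised.

```python
import pandas as pd

# Hypothetical sample data (a subset of the question's table)
df = pd.DataFrame({
    "company": ["company 1", "company 1", "company 2"],
    "amount": [10000, 20000, 10000],
})

# First run: 'amount' is the only non-grouping column, so cumsum returns 1 column
first = df.groupby("company").cumsum()
print(first.shape)
df["total_amount"] = first  # works: a 1-column DataFrame can be assigned

# Second run: 'total_amount' now exists too, so cumsum returns 2 columns
second = df.groupby("company").cumsum()
print(list(second.columns))

# Assigning a 2-column DataFrame to a single column raises ValueError
assignment_failed = False
try:
    df["total_amount"] = second
except ValueError as exc:
    assignment_failed = True
    print("ValueError:", exc)
```

The first print shows `(3, 1)` and the second shows `['amount', 'total_amount']`, which is exactly the "worked once, then broke" pattern from the question.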
To fix it, select the amount column before calling cumsum, so the result is always a single Series:
df['total_amount'] = df.groupby('company')['amount'].cumsum()
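A quick check of the fix against the rows from the question's table (only the company and amount columns are rebuilt here). It also runs the assignment twice to show that re-running the line no longer fails.

```python
import pandas as pd

# company and amount columns from the question's table
df = pd.DataFrame({
    "company": ["company 1"] * 3 + ["company 2"] * 2 + ["company 3"]
               + ["company 4"] * 2 + ["company 5"] * 4,
    "amount": [10000, 20000, 30000,
               10000, 20000,
               10000,
               10000, 20000,
               10000, 20000, 30000, 40000],
})

# Selecting ['amount'] gives a SeriesGroupBy, so cumsum returns a Series
running = df.groupby("company")["amount"].cumsum()
df["total_amount"] = running

# Re-running is safe: the result is still one Series,
# however many other columns df has by now
df["total_amount"] = df.groupby("company")["amount"].cumsum()
print(df)
```

Because only one column is ever cumulated, the position of this line in the script and the number of existing columns in df no longer matter.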