Why is df.cumsum() giving ValueError: Wrong number of items passed, placement implies 1

I would like to create a new column called total_amount that holds the running (cumulative) sum of amount within each company. I would like the final data set to look like the one below.

company | amount | total_amount
company 1 | 10000 | 10000
company 1 | 20000 | 30000
company 1 | 30000 | 60000
company 2 | 10000 | 10000
company 2 | 20000 | 30000
company 3 | 10000 | 10000
company 4 | 10000 | 10000
company 4 | 20000 | 30000
company 5 | 10000 | 10000
company 5 | 20000 | 30000
company 5 | 30000 | 60000
company 5 | 40000 | 100000

I ran this code

 df['total_amount'] = df.groupby('company').cumsum()

and it worked briefly, but when I moved the line elsewhere to make my code more readable, it started raising KeyError: "total_amount" and the ValueError listed above. What am I doing wrong?

Solution

The error means that cumsum returned more than one column. df.groupby('company').cumsum() calls cumsum on a DataFrameGroupBy object, so it returns a DataFrame, not a Series. If that DataFrame has exactly one column, the assignment to df['total_amount'] still works. If it has two or more columns, the assignment fails with the error above. I suspect your first run returned a one-column DataFrame, so it worked. However, that run also added total_amount to df, so on later runs cumsum returned more than one column (amount plus the new total_amount) and the assignment failed.
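To see this for yourself, here is a minimal sketch that reproduces the sequence. The three-row frame is made-up sample data, not your real data set, and the exact wording of the ValueError depends on your pandas version, so the code only checks the exception type.

```python
import pandas as pd

# Made-up sample data with the same two columns as in the question.
df = pd.DataFrame({
    "company": ["company 1", "company 1", "company 2"],
    "amount": [10000, 20000, 10000],
})

# First run: only 'amount' is left after grouping, so the result has one column.
first = df.groupby("company").cumsum()
print(list(first.columns))   # ['amount']
df["total_amount"] = first   # a one-column DataFrame fits into a single column

# Second run: 'total_amount' now exists, so cumsum returns two columns.
second = df.groupby("company").cumsum()
print(list(second.columns))  # ['amount', 'total_amount']

failed = False
try:
    df["total_amount"] = second  # two columns cannot go into one
except ValueError as err:
    failed = True
    print(type(err).__name__, err)
```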

To fix the error, select the amount column before calling cumsum so that it always returns a single Series:

df['total_amount'] = df.groupby('company')['amount'].cumsum()
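As a quick check, the sketch below applies that line twice to a small made-up frame (the first five rows of your expected output) to show that it can be re-run safely, because the result no longer depends on which other columns df has.

```python
import pandas as pd

# Made-up sample data: the 'company 1' and 'company 2' rows from the question.
df = pd.DataFrame({
    "company": ["company 1"] * 3 + ["company 2"] * 2,
    "amount": [10000, 20000, 30000, 10000, 20000],
})

# Run the assignment twice to show that repeating it is harmless.
for _ in range(2):
    df["total_amount"] = df.groupby("company")["amount"].cumsum()

print(df["total_amount"].tolist())  # [10000, 30000, 60000, 10000, 30000]
```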