I have spent a few hours now trying to do a "cumulative group by sum" on a pandas dataframe. I have looked at all the stackoverflow answers and surprisingly none of them can solve my (very elementary) problem:
I have a dataframe:
df1
Out[8]:
Name Date Amount
0 Jack 2016-01-31 10
1 Jack 2016-02-29 5
2 Jack 2016-02-29 8
3 Jill 2016-01-31 10
4 Jill 2016-02-29 5
I am trying to
So the desired output is:
df1
Out[10]:
Name Date Cumsum
0 Jack 2016-01-31 10
1 Jack 2016-02-29 23
2 Jill 2016-01-31 10
3 Jill 2016-02-29 15
EDIT: I am simplifying the question. With the current answers I still can't get the correct "running" cumsum. Look closely, I want to see the cumulative sum "10, 23, 10, 15". In words, I want to see, at every consecutive date, the total cumulative sum for a person. NB: If there are two entries on one date for the same person, I want to sum those and then add them to the running cumsum and only then print the sum.
You need assign output to new column and then remove Amount
column by drop
:
df1['Cumsum'] = df1.groupby(by=['Name','Date'])['Amount'].cumsum()
df1 = df1.drop('Amount', axis=1)
print (df1)
Name Date Cumsum
0 Jack 2016-01-31 10
1 Jack 2016-02-29 5
2 Jack 2016-02-29 13
3 Jill 2016-01-31 10
4 Jill 2016-02-29 5
Another solution with assign
:
df1 = df1.assign(Cumsum=df1.groupby(by=['Name','Date'])['Amount'].cumsum())
.drop('Amount', axis=1)
print (df1)
Name Date Cumsum
0 Jack 2016-01-31 10
1 Jack 2016-02-29 5
2 Jack 2016-02-29 13
3 Jill 2016-01-31 10
4 Jill 2016-02-29 5
EDIT by comment:
First groupby
columns Name
and Date
and aggregate sum
, then groupby
by level
Name
and aggregate cumsum
.
df = df1.groupby(by=['Name','Date'])['Amount'].sum()
.groupby(level='Name').cumsum().reset_index(name='Cumsum')
print (df)
Name Date Cumsum
0 Jack 2016-01-31 10
1 Jack 2016-02-29 23
2 Jill 2016-01-31 10
3 Jill 2016-02-29 15