Im trying to apply cumilative sum for column Amount
only when date is consecutive in Date
column :
Current input :
df = pd.DataFrame({'Country': {0: 'USA',1: 'Canada', 2: 'China',3: 'Egypt',4: 'Poland',5: 'UK',6: 'Jordan'},
'Date': {0: '2021-01-01',1: '2021-01-02',2: '2021-01-03',3: '2021-01-04',4: '2021-01-06',5: '2021-01-07',6: '2021-01-08'},
'Amount': {0: 10, 1: 15, 2: 10, 3: 20, 4: 25, 5: 30, 6: 10}})
Country Date Amount
0 USA 2021-01-01 10
1 Canada 2021-01-02 15
2 China 2021-01-03 10
3 Egypt 2021-01-04 20
4 Poland 2021-01-06 25
5 UK 2021-01-07 30
6 Jordan 2021-01-08 10
Expected output :
In row number 4 the cumilative sum is reset due to missing 2021-01-05 in Date
column.
Country Date Amount Cumilative
0 USA 2021-01-01 10 10
1 Canada 2021-01-02 15 25
2 China 2021-01-03 10 35
3 Egypt 2021-01-04 20 55
4 Poland 2021-01-06 25 25<
5 UK 2021-01-07 30 55
6 Jordan 2021-01-08 10 65
What i have tried, which is incorect :
Im not sure how to include a check in my script to check wheather Date
column is consecutive to reset cumilative sum in Amount
column.
df['Date'] = pd.to_datetime(df['Date'])
df['Cumilative'] = df['Amount'].cumsum()
Country Date Amount Cumilative
0 USA 2021-01-01 10 10
1 Canada 2021-01-02 15 25
2 China 2021-01-03 10 35
3 Egypt 2021-01-04 20 55
4 Poland 2021-01-06 25 80
5 UK 2021-01-07 30 110
6 Jordan 2021-01-08 10 120
Wwould appreciate if someone could help me here.
Try using with groupby
, diff
and cumsum
:
df['Cumilative'] = df.groupby(df['Date'].diff().dt.days.ne(1).cumsum())['Amount'].cumsum()
And now:
print(df)
Outputs:
Country Date Amount Cumilative
0 USA 2021-01-01 10 10
1 Canada 2021-01-02 15 25
2 China 2021-01-03 10 35
3 Egypt 2021-01-04 20 55
4 Poland 2021-01-06 25 25
5 UK 2021-01-07 30 55
6 Jordan 2021-01-08 10 65
This groups the dates into groups of consecutive dates and applies cumsum
to the Amount
column in all of the groups.