Search code examples
pythonpandasdataframepandas-groupbycumsum

Cumilative sum for only consecutive date


Im trying to apply cumilative sum for column Amount only when date is consecutive in Date column :

Current input :

df = pd.DataFrame({'Country': {0: 'USA',1: 'Canada', 2: 'China',3: 'Egypt',4: 'Poland',5: 'UK',6: 'Jordan'},
                   'Date': {0: '2021-01-01',1: '2021-01-02',2: '2021-01-03',3: '2021-01-04',4: '2021-01-06',5: '2021-01-07',6: '2021-01-08'},
                   'Amount': {0: 10, 1: 15, 2: 10, 3: 20, 4: 25, 5: 30, 6: 10}})

    Country  Date        Amount
0   USA      2021-01-01  10
1   Canada   2021-01-02  15
2   China    2021-01-03  10
3   Egypt    2021-01-04  20
4   Poland   2021-01-06  25
5   UK       2021-01-07  30
6   Jordan   2021-01-08  10

Expected output :

In row number 4 the cumilative sum is reset due to missing 2021-01-05 in Date column.

    Country  Date        Amount Cumilative
0   USA      2021-01-01  10     10
1   Canada   2021-01-02  15     25
2   China    2021-01-03  10     35
3   Egypt    2021-01-04  20     55
4   Poland   2021-01-06  25     25<
5   UK       2021-01-07  30     55
6   Jordan   2021-01-08  10     65

What i have tried, which is incorect :

Im not sure how to include a check in my script to check wheather Date column is consecutive to reset cumilative sum in Amount column.

df['Date'] = pd.to_datetime(df['Date'])
df['Cumilative'] = df['Amount'].cumsum()


    Country Date        Amount  Cumilative
0   USA     2021-01-01  10      10
1   Canada  2021-01-02  15      25
2   China   2021-01-03  10      35
3   Egypt   2021-01-04  20      55
4   Poland  2021-01-06  25      80
5   UK      2021-01-07  30      110
6   Jordan  2021-01-08  10      120

Wwould appreciate if someone could help me here.


Solution

  • Try using with groupby, diff and cumsum:

    df['Cumilative'] = df.groupby(df['Date'].diff().dt.days.ne(1).cumsum())['Amount'].cumsum()
    

    And now:

    print(df)
    

    Outputs:

      Country       Date  Amount  Cumilative
    0     USA 2021-01-01      10          10
    1  Canada 2021-01-02      15          25
    2   China 2021-01-03      10          35
    3   Egypt 2021-01-04      20          55
    4  Poland 2021-01-06      25          25
    5      UK 2021-01-07      30          55
    6  Jordan 2021-01-08      10          65
    

    This groups the dates into groups of consecutive dates and applies cumsum to the Amount column in all of the groups.