Search code examples
pandasdataframeloopsdateincrement

Create an incremental count from a cumulative count by date segmented by another series in a Pandas data frame


I have cumulative data (the series 'cumulative_count') in a date frame ('df1') that is segmented by the series 'state' and I want to create a new series in the data frame that shows incremental count by 'state'.

So:

df1 = pd.DataFrame({'date': ['2020-01-03','2020-01-03','2020-01-03','2020-01-04','2020-01-04','2020-01-04','2020-01-05','2020-01-05','2020-01-05'],'state': ['NJ','NY','CT','NJ','NY','CT','NJ','NY','CT'], 'cumulative_count': [1,3,5,3,6,7,19,15,20]})

...is transformed to have the new series added ('incremental count') where the incremental count is calculated by date but also segmented by state with the result generated being...

df2 = pd.DataFrame({'date': ['2020-01-03','2020-01-03','2020-01-03','2020-01-04','2020-01-04','2020-01-04','2020-01-05','2020-01-05','2020-01-05'],'state': ['NJ','NY','CT','NJ','NY','CT','NJ','NY','CT'], 'cumulative_count': [1,3,5,3,6,7,19,15,20],'incremental_count': [1,3,5,2,3,2,16,9,13]})

Any recommendations on how to do this would be greatly appreciated. Thanks!


Solution

  • Since your DataFrame is already sorted by 'date', you are looking to take the diff within each state group. Then fillna to get the correct value for the first date within each state.

    df1['incremental_count'] = (df1.groupby('state')['cumulative_count'].diff()
                                   .fillna(df1['cumulative_count'], downcast='infer'))
    
             date state  cumulative_count  incremental_count
    0  2020-01-03    NJ                 1                  1
    1  2020-01-03    NY                 3                  3
    2  2020-01-03    CT                 5                  5
    3  2020-01-04    NJ                 3                  2
    4  2020-01-04    NY                 6                  3
    5  2020-01-04    CT                 7                  2
    6  2020-01-05    NJ                19                 16
    7  2020-01-05    NY                15                  9
    8  2020-01-05    CT                20                 13