I have cumulative data (the series 'cumulative_count') in a date frame ('df1') that is segmented by the series 'state' and I want to create a new series in the data frame that shows incremental count by 'state'.
So:
df1 = pd.DataFrame({'date': ['2020-01-03','2020-01-03','2020-01-03','2020-01-04','2020-01-04','2020-01-04','2020-01-05','2020-01-05','2020-01-05'],'state': ['NJ','NY','CT','NJ','NY','CT','NJ','NY','CT'], 'cumulative_count': [1,3,5,3,6,7,19,15,20]})
...is transformed to have the new series added ('incremental count') where the incremental count is calculated by date but also segmented by state with the result generated being...
df2 = pd.DataFrame({'date': ['2020-01-03','2020-01-03','2020-01-03','2020-01-04','2020-01-04','2020-01-04','2020-01-05','2020-01-05','2020-01-05'],'state': ['NJ','NY','CT','NJ','NY','CT','NJ','NY','CT'], 'cumulative_count': [1,3,5,3,6,7,19,15,20],'incremental_count': [1,3,5,2,3,2,16,9,13]})
Any recommendations on how to do this would be greatly appreciated. Thanks!
Since your DataFrame is already sorted by 'date'
, you are looking to take the diff
within each state group. Then fillna
to get the correct value for the first date within each state.
df1['incremental_count'] = (df1.groupby('state')['cumulative_count'].diff()
.fillna(df1['cumulative_count'], downcast='infer'))
date state cumulative_count incremental_count
0 2020-01-03 NJ 1 1
1 2020-01-03 NY 3 3
2 2020-01-03 CT 5 5
3 2020-01-04 NJ 3 2
4 2020-01-04 NY 6 3
5 2020-01-04 CT 7 2
6 2020-01-05 NJ 19 16
7 2020-01-05 NY 15 9
8 2020-01-05 CT 20 13