Search code examples
pythonpandasdataframegroup-bycumulative-sum

How can I convert cumulative values to incremental values to a pandas dataframe with null values?


I have a pandas dataframe which has cumulative values like this:

import pandas as pd
import numpy as np

data = {'Date':['2017-01-27','2017-01-28','2017-01-29','2017-01-30','2017-01-31','2017-02-01','2017-02-02','2017-02-03','2017-02-04','2017-02-05'],
'Kentucky':[1,4,5,7,np.NaN,np.NaN,10,np.NaN,12,13],
'New York':[np.NaN,2,np.NaN,7,np.NaN,np.NaN,np.NaN,np.NaN,9,np.NaN],
'California':[np.NaN,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN,1,np.NaN,2,np.NaN],
'Maine':[np.NaN,2,3,7,np.NaN,np.NaN,10,np.NaN,12,13]}

df = pd.DataFrame(data)

enter image description here

How can I convert the cumulative values to incremental values if the dataframe has null values? I have tried incremental values conversion solution but it does not produce the desired result.


Solution

  • First convert to a data frame:

    data = pd.DataFrame(data)
    

    Next use fillna's forward fill to account for the NaN issue. Below shows the code for Maine, but could be applied to the other states:

    data['Maine_Inc_Values'] = data['Maine'].fillna(method='ffill').diff()
    

    Then use .loc to correct for the first incremental value (which returns a NaN and not 2):

    data.loc[data['Maine_Inc_Values'].isnull(), 'Maine_Inc_Values'] = data['Maine']
    

    Results in:

    enter image description here