
Replace NaN with zero or linear interpolation


I have a dataset containing many NaNs as well as numeric values, in the following form:

                            PV_Power
    2017-01-01 00:00:00      NaN
    2017-01-01 01:00:00      NaN
    2017-01-01 02:00:00      NaN
    2017-01-01 03:00:00      NaN
    2017-01-01 04:00:00      NaN
    ...                      ...
    2017-12-31 20:00:00      NaN
    2017-12-31 21:00:00      NaN
    2017-12-31 22:00:00      NaN
    2017-12-31 23:00:00      NaN
    2018-01-01 00:00:00      NaN

I need to replace each NaN with 0 if it lies between other NaN values, or with an interpolated value if it lies between numeric values. Any idea how I can achieve that?


Solution

  • Use DataFrame.interpolate with limit_area='inside' so that only NaNs surrounded by numeric values are interpolated, and then replace the remaining missing values with 0 using fillna:

    print(df)
                         PV_Power
    date                         
    2017-01-01 00:00:00       NaN
    2017-01-01 01:00:00       4.0
    2017-01-01 02:00:00       NaN
    2017-01-01 03:00:00       NaN
    2017-01-01 04:00:00       5.0
    2017-01-01 05:00:00       NaN
    2017-01-01 06:00:00       NaN
    
    
    df = df.interpolate(limit_area='inside').fillna(0)
    print(df)
                         PV_Power
    date                         
    2017-01-01 00:00:00  0.000000
    2017-01-01 01:00:00  4.000000
    2017-01-01 02:00:00  4.333333
    2017-01-01 03:00:00  4.666667
    2017-01-01 04:00:00  5.000000
    2017-01-01 05:00:00  0.000000
    2017-01-01 06:00:00  0.000000
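A self-contained sketch of the example above follows. The sample frame is rebuilt by hand: the values are copied from the printed input, while the hourly spacing and the `date` index name are assumptions. The sketch also shows why `limit_area='inside'` matters. With the default `limit_direction='forward'`, a plain `interpolate()` leaves the leading NaN alone but fills the trailing NaNs with the last valid value (5.0) instead of leaving them for `fillna(0)`. As an optional variant, `method='time'` weights the interpolation by the actual gaps in a DatetimeIndex. On evenly spaced hourly data it should match the default linear method.

```python
import numpy as np
import pandas as pd

# Hourly sample reproducing the answer's input: a leading NaN, a gap
# between two readings, and trailing NaNs (index name "date" assumed).
idx = pd.date_range("2017-01-01 00:00", periods=7,
                    freq=pd.Timedelta(hours=1), name="date")
df = pd.DataFrame(
    {"PV_Power": [np.nan, 4.0, np.nan, np.nan, 5.0, np.nan, np.nan]},
    index=idx,
)

# Interpolate only the NaNs enclosed by valid values, then zero the rest.
filled = df.interpolate(limit_area="inside").fillna(0)

# For comparison: without limit_area, the trailing NaNs are filled with the
# last valid value rather than being left for fillna(0).
naive = df.interpolate()

# Optional variant: weight by the real time distance between index entries.
by_time = df.interpolate(method="time", limit_area="inside").fillna(0)

print(filled)
```

If the real data has irregular timestamps (for example, missing hours), `method='time'` and the default `method='linear'` can differ, because the default treats rows as equally spaced regardless of the index.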