Tags: python, pandas, dataframe, interpolation

How to interpolate the first and the last values using pandas.DataFrame.interpolate?


I know that this question has been asked before, but the suggested solutions that I found do not work for me. Maybe I am trying to do something that is simply not possible, but let me explain.

I have time-series data that contains some values of 0. I would like to interpolate over the zeros in data using pandas.DataFrame.interpolate.

The code:

import pandas as pd
import numpy as np

data = [0, -1.31527, -2.25448, -0.965348, -1.11168, -0.0506046, -0.605522,
        2.01337, 0, 0, 2.41931, 0.821425, 0.402411, 0]

df = pd.DataFrame(data=data) # Data to pandas dataframe
df.replace(to_replace=0, value=np.nan, inplace=True) # Replace 0 by nan
ip = df.interpolate(method="nearest", order=3, limit=None,
                    limit_direction=None)
print(ip)

The result of print(ip):

           0
0        NaN
1  -1.315270
2  -2.254480
3  -0.965348
4  -1.111680
5  -0.050605
6  -0.605522
7   2.013370
8   2.013370
9   2.419310
10  2.419310
11  0.821425
12  0.402411
13       NaN

The problem: pandas does not interpolate the first and last values of data; they remain NaN (the replaced zeros). I tried all the options of pandas.DataFrame.interpolate, including the forward and backward directions, but none of them fills the first and last entries. Is this simply impossible with pandas, or am I doing something wrong?


Solution

  • What you want is extrapolation, so you need to decide how to do it.

    You can ffill/bfill:

    ip = (df.interpolate(method="nearest", order=3, limit=None,
                         limit_direction='both')
            .ffill().bfill()
         )
    

    Output:

               0
    0  -1.315270
    1  -1.315270
    2  -2.254480
    3  -0.965348
    4  -1.111680
    5  -0.050605
    6  -0.605522
    7   2.013370
    8   2.013370
    9   2.419310
    10  2.419310
    11  0.821425
    12  0.402411
    13  0.402411
    


    Or use a spline:

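    # Nearest-neighbour fill for interior gaps; the edge NaNs that remain
    # are then taken from a cubic spline, which extrapolates past the data.
    ip = (df.interpolate(method="nearest", order=3, limit=None,
                         limit_direction=None)
            .fillna(df.interpolate(method="spline", order=3, limit=None,
                                   limit_direction='both'))
         )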
    

    Output:

               0
    0  -0.585237
    1  -1.315270
    2  -2.254480
    3  -0.965348
    4  -1.111680
    5  -0.050605
    6  -0.605522
    7   2.013370
    8   2.013370
    9   2.419310
    10  2.419310
    11  0.821425
    12  0.402411
    13 -1.951716
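
The ffill/bfill idea can also be sketched without the SciPy dependency that method="nearest" requires. This is only an illustration under stated assumptions: it swaps in linear interpolation for the interior gaps (not the "nearest" method used in the answer) and uses a shortened sample rather than the full data from the question.

```python
import numpy as np
import pandas as pd

# Shortened, illustrative sample; zeros mark missing values as in the question.
data = [0, -1.31527, -2.25448, 0, 0, 2.41931, 0.402411, 0]
s = pd.Series(data, dtype=float).replace(0, np.nan)

# Fill interior gaps only: limit_area="inside" leaves leading and
# trailing NaN untouched, so the edge handling stays an explicit choice.
inner = s.interpolate(method="linear", limit_area="inside")

# Edges: copy the nearest valid value outward
# (forward fill for the tail, backward fill for the head).
filled = inner.ffill().bfill()

print(filled)
```

Keeping the interior fill and the edge fill as separate steps makes it easy to replace the edge rule later, for example with a spline.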
    
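
Another possible extrapolation choice, not part of the answer above, is to extend the straight line through the two nearest valid points at each end. The sketch below uses only NumPy; `extrapolate_edges_linear` is a hypothetical helper name, and it assumes a sorted numeric index and at least two valid values.

```python
import numpy as np
import pandas as pd


def extrapolate_edges_linear(s: pd.Series) -> pd.Series:
    """Hypothetical helper: linear fill inside, straight-line extrapolation at the edges.

    Assumes a sorted numeric index and at least two non-NaN values.
    """
    idx = s.index.to_numpy(dtype=float)
    vals = s.to_numpy(dtype=float)
    mask = ~np.isnan(vals)
    x, y = idx[mask], vals[mask]

    # Interior gaps are interpolated linearly; np.interp clamps the edges,
    # so they are overwritten below.
    out = np.interp(idx, x, y)

    left = idx < x[0]
    right = idx > x[-1]
    # Extend the line through the first two / last two valid points.
    out[left] = y[0] + (y[1] - y[0]) / (x[1] - x[0]) * (idx[left] - x[0])
    out[right] = y[-1] + (y[-1] - y[-2]) / (x[-1] - x[-2]) * (idx[right] - x[-1])
    return pd.Series(out, index=s.index)


s = pd.Series([np.nan, 1.0, 3.0, np.nan, 7.0, np.nan])
result = extrapolate_edges_linear(s)
print(result.tolist())  # [-1.0, 1.0, 3.0, 5.0, 7.0, 9.0]
```

Linear extrapolation is less prone to the large swings a cubic spline can produce at the ends, but it is still a guess about data that was never observed.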
