pandas, interpolation, extrapolation

Linearly extrapolate a pandas DataFrame using the built-in interpolate method


Consider the following dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame([np.nan, np.nan, 1, 5, np.nan, 6, 6.1, np.nan, np.nan])

I would like to use the pandas.DataFrame.interpolate method to linearly extrapolate the dataframe entries at the starting and ending rows, similar to what I get if I do the following:

from scipy import interpolate

# Fit on the non-NaN rows only, then evaluate outside the known range.
df_num = df.dropna()
xi = df_num.index.values
yi = df_num.values[:, 0]
f = interpolate.interp1d(xi, yi, kind='linear', fill_value='extrapolate')
x = [0, 1, 7, 8]
print(f(x))
# Output: [-7.  -3.   6.2  6.3]

It seems that the 'linear' option in pandas' interpolate calls numpy's np.interp, which does not extrapolate linearly. Is there a way to get linear extrapolation from the built-in interpolate method?
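To illustrate the behavior described above, here is a minimal sketch of what np.interp does outside the known range. It assumes, as the question suspects, that this is the routine behind the 'linear' option; the array names are only for illustration.

```python
import numpy as np

# Known points: the non-NaN rows of the example dataframe (index, value).
xi = np.array([2.0, 3.0, 5.0, 6.0])
yi = np.array([1.0, 5.0, 6.0, 6.1])

# Positions to evaluate, including some outside the known range [2, 6].
x = np.array([0.0, 1.0, 4.0, 7.0, 8.0])

# Outside the range, np.interp returns the first/last known value
# instead of continuing the line through the edge points.
clamped = np.interp(x, xi, yi)
print(clamped)
```

The edge positions come back as 1.0 and 6.1 (the nearest known values) rather than the -7, -3, 6.2, 6.3 obtained from interp1d with fill_value='extrapolate'.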


Solution

  • You can use SciPy's interpolation methods directly from pandas. According to the pandas.DataFrame.interpolate documentation, the method option accepts the techniques of scipy.interpolate.interp1d, as noted in the attached link.

    A solution for your example could look like:

    df_interpolated = df.interpolate(method="slinear", fill_value="extrapolate", limit_direction="both")
    print(df_interpolated)
    
    # Out: 
    #      0
    # 0 -7.0
    # 1 -3.0
    # 2  1.0
    # 3  5.0
    # 4  5.5
    # 5  6.0
    # 6  6.1
    # 7  6.2
    # 8  6.3
    

    You can then easily select any values you are interested in, e.g. df_interpolated.loc[x], where df_interpolated is the output of the previous code block and x is the list of indexes defined in your question.

    Explanation:

    • method="slinear" - one of the methods listed in the pandas documentation above that are passed to scipy interp1d (see e.g. this link)
    • fill_value="extrapolate" - any option accepted by scipy can be passed here ("extrapolate" is exactly what you want)
    • limit_direction="both" - extrapolates in both directions (otherwise the default would be "forward" in this case, and you would see np.nan for the first two values)
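As a sanity check on the three options above, the following sketch compares the pandas result with the scipy interp1d approach from the question. It assumes scipy is installed (pandas needs it for method="slinear"); the names known, expected and edge_values are introduced here only for illustration.

```python
import numpy as np
import pandas as pd
from scipy import interpolate

df = pd.DataFrame([np.nan, np.nan, 1, 5, np.nan, 6, 6.1, np.nan, np.nan])

# pandas route: hand the work to scipy's interp1d and allow extrapolation.
df_interpolated = df.interpolate(
    method="slinear", fill_value="extrapolate", limit_direction="both"
)

# scipy route from the question, evaluated on every row index.
known = df.dropna()
f = interpolate.interp1d(
    known.index.values, known[0].values, kind="linear", fill_value="extrapolate"
)
expected = f(df.index.values)

# Pick out the extrapolated rows at both ends of the column.
x = [0, 1, 7, 8]
edge_values = df_interpolated.loc[x, 0]
print(edge_values.to_numpy())
```

If both routes agree, the pandas call can replace the manual dropna/interp1d steps, and .loc[x] returns only the extrapolated rows.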