Consider the following dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame([np.nan, np.nan, 1, 5, np.nan, 6, 6.1, np.nan, np.nan])
I would like to use the pandas.DataFrame.interpolate method to linearly extrapolate the dataframe entries at the starting and ending rows, similar to what I get if I do the following:
from scipy import interpolate
df_num = df.dropna()
xi = df_num.index.values
yi = df_num.values[:,0]
f = interpolate.interp1d(xi, yi, kind='linear', fill_value='extrapolate')
x = [0, 1, 7, 8]
print(f(x))
# Output: [-7. -3. 6.2 6.3]
It seems that the 'linear' option in pandas interpolate calls NumPy's interp function (np.interp), which doesn't do linear extrapolation. Is there a way to call the built-in interpolate method to achieve this?
You can use SciPy's interpolation methods directly in pandas. As noted in the pandas.DataFrame.interpolate documentation, the method option accepts the techniques provided by scipy.interpolate.interp1d.
A solution for your example could look like:
df_interpolated = df.interpolate(method="slinear", fill_value="extrapolate", limit_direction="both")
print(df_interpolated)
# Out:
# 0
# 0 -7.0
# 1 -3.0
# 2 1.0
# 3 5.0
# 4 5.5
# 5 6.0
# 6 6.1
# 7 6.2
# 8 6.3
You can then easily select any values you are interested in, e.g. df_interpolated.loc[x] (where df_interpolated is the output of the previous code block), using the indexes defined by the x variable in your question.
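To check that this reproduces the scipy result from the question, here is a minimal end-to-end sketch. It assumes pandas, NumPy, and SciPy are installed; the names selected and reference are only illustrative and do not appear in the question.

```python
import numpy as np
import pandas as pd
from scipy import interpolate

df = pd.DataFrame([np.nan, np.nan, 1, 5, np.nan, 6, 6.1, np.nan, np.nan])

# pandas route: method="slinear" is handed over to scipy's interp1d
df_interpolated = df.interpolate(
    method="slinear", fill_value="extrapolate", limit_direction="both"
)

# scipy route from the question, fitted on the non-missing rows only
df_num = df.dropna()
f = interpolate.interp1d(
    df_num.index.values,
    df_num.values[:, 0],
    kind="linear",
    fill_value="extrapolate",
)

x = [0, 1, 7, 8]
selected = df_interpolated.loc[x, 0].to_numpy()  # 0 is the label of the only column
reference = f(x)

print(selected)
print(np.allclose(selected, reference))
```

If both routes agree, the last line should report that the two arrays match up to floating-point tolerance.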
Explanation:
- method="slinear": one of the methods listed in the pandas documentation above that is passed to scipy's interp1d (see e.g. this link).
- fill_value="extrapolate": passes through any option allowed by scipy (here "extrapolate", which is exactly what you want).
- limit_direction="both": extrapolates in both directions (otherwise the default would be "forward", and you would see np.nan for the first two values).
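The effect of limit_direction can be seen by running the same call with and without it. This is a small sketch under the assumption that the default direction for "slinear" is "forward", as described above; the names forward_only and both are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([np.nan, np.nan, 1, 5, np.nan, 6, 6.1, np.nan, np.nan])

# Default direction: leading NaNs (before the first valid value) are left alone
forward_only = df.interpolate(method="slinear", fill_value="extrapolate")

# Both directions: leading NaNs are extrapolated as well
both = df.interpolate(
    method="slinear", fill_value="extrapolate", limit_direction="both"
)

# Show which rows are still missing in each result
print(forward_only[0].isna().tolist())
print(both[0].isna().tolist())
```

Comparing the two printed masks shows which rows each call leaves unfilled.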