I have a data-frame with two columns: X
and Y
. Some of the values in Y
are missing (np.nan
).
I would like to fill the NaNs
using a linear interpolation. In more details, I want to order the data frame by X
and any missing values of Y
should be a "linear mixture" of the two neighbouring values of Y
(one corresponding to smaller X
and another one to larger X
).
If the value of X
corresponding to a missing Y
is closer to one of the two X
with available Y
, then the filled value of Y
should be close to the corresponding Y
. How to do it efficiently and elegantly in pandas?
Please note, that pandas.Series.interpolate
does not do what I need, as far as I understand.
Setting up a dataframe:
x = [0,1,3,4,7,9,11,122,123,128]
y = [2,8,12,np.NaN, 22, 31, 34, np.NaN, 43, 48]
df = pd.DataFrame({"x":x, "y":y})
print(df)
x y
0 0 2.0
1 1 8.0
2 3 12.0
3 4 NaN
4 7 22.0
5 9 31.0
6 11 34.0
7 122 NaN
8 123 43.0
9 128 48.0
Set column 'x' to the index:
df = df.set_index('x')
Then set the method in the interplote to 'index'.
df.y = df.y.interpolate(method='index')
This results in:
df
y
x
0 2.000000
1 8.000000
3 12.000000
4 14.500000
7 22.000000
9 31.000000
11 34.000000
122 42.919643
123 43.000000
128 48.000000