Search code examples
pythonpandasinterpolation

How to do a linear interpolation in pandas taking values of X into account?


I have a data-frame with two columns: X and Y. Some of the values in Y are missing (np.nan).

I would like to fill the NaNs using a linear interpolation. In more details, I want to order the data frame by X and any missing values of Y should be a "linear mixture" of the two neighbouring values of Y (one corresponding to smaller X and another one to larger X).

If the value of X corresponding to a missing Y is closer to one of the two X with available Y, then the filled value of Y should be close to the corresponding Y. How to do it efficiently and elegantly in pandas?

Please note, that pandas.Series.interpolate does not do what I need, as far as I understand.


Solution

  • Setting up a dataframe:

    x = [0,1,3,4,7,9,11,122,123,128]
    y = [2,8,12,np.NaN, 22, 31, 34, np.NaN, 43, 48]
    
    df = pd.DataFrame({"x":x, "y":y})
    print(df)
    
         x     y
    0    0   2.0
    1    1   8.0
    2    3  12.0
    3    4   NaN
    4    7  22.0
    5    9  31.0
    6   11  34.0
    7  122   NaN
    8  123  43.0
    9  128  48.0
    

    Set column 'x' to the index:

    df = df.set_index('x')
    

    Then set the method in the interplote to 'index'.

    df.y = df.y.interpolate(method='index')
    

    This results in:

    df
    
            y
    x   
    0      2.000000
    1      8.000000
    3     12.000000
    4     14.500000
    7     22.000000
    9     31.000000
    11    34.000000
    122   42.919643
    123   43.000000
    128   48.000000