Search code examples
pythonpandasdataframefind

Find the nearest value in dataframe column and calculate the difference for each row


Even though I was googling a lot, I couldn't find the solution for my problem.

I have dataframe

    filter10    REF
0   NaN         0.00
1   NaN         0.75
2   NaN         1.50
3   NaN         2.25
4   NaN         3.00
5   NaN         3.75
6   NaN         4.50
...
15  2.804688    11.25
16  3.021875    12.00
17  3.578125    12.75
18  3.779688    13.50
...
27  NaN         20.25
28  NaN         21.00
29  NaN         21.75
30  NaN         22.50
31  6.746875    NaN
32  NaN         NaN
...

I would like now to add the column df['DIFF'] where function goes through whole column filter10 and when it is the number it finds closest number in REF column.

And afterwards calculate the difference between them and put it the same row as number in filter10 is.

I would like this output:

    filter10    REF     DIFF
0   NaN         0.00    NaN
1   NaN         0.75    NaN
2   NaN         1.50    NaN
3   NaN         2.25    NaN
4   NaN         3.00    NaN
5   NaN         3.75    NaN
6   NaN         4.50    NaN
...
15  2.804688    11.25   -0.195312 # 2.804688 - 3 (find closest value in REF) = -0.195312
16  3.021875    12.00   0.021875
17  3.578125    12.75   -0.171875
18  3.779688    13.50   0.029688
...
27  NaN         20.25   NaN
28  NaN         21.00   NaN
29  NaN         21.75   NaN
30  NaN         22.50   NaN
31  6.746875    NaN     -0.003125
32  NaN         NaN     NaN
...

Solution

  • Use pandas.merge_asof to find the nearest value:

    df['DIFF'] = (pd.merge_asof(df['filter10'].dropna().sort_values().reset_index(),
                                df[['REF']].dropna().sort_values('REF'),
                                left_on='filter10', right_on='REF', direction='nearest')
                    .set_index('index')['REF'].rsub(df['filter10'])
                  )
    

    Output:

        filter10    REF      DIFF
    0        NaN   0.00       NaN
    1        NaN   0.75       NaN
    2        NaN   1.50       NaN
    3        NaN   2.25       NaN
    4        NaN   3.00       NaN
    5        NaN   3.75       NaN
    6        NaN   4.50       NaN
    15  2.804688  11.25 -0.195312
    16  3.021875  12.00  0.021875
    17  3.578125  12.75 -0.171875
    18  3.779688  13.50  0.029688
    27       NaN  20.25       NaN
    28       NaN  21.00       NaN
    29       NaN  21.75       NaN
    30       NaN  22.50       NaN
    31  6.746875    NaN  2.246875 # likely different due to missing data
    32       NaN    NaN       NaN