
Python: match values within a tolerance


I'm trying to match the values in a column of one dataframe to the values in a column of another dataframe, within a given tolerance. I have two dataframes:

                               Dp  y_escape_ave(m)
0    [Series 1 at injection 12 1]        -0.015850
1    [Series 2 at injection 03 1]        -0.037345
2    [Series 1 at injection 06 1]        -0.037497
3    [Series 4 at injection 18 1]        -0.012622
4    [Series 5 at injection 21 1]              NaN
5    [Series 6 at injection 24 1]        -0.008801
6    [Series 7 at injection 27 1]        -0.008711

        v(m/s)      y(m)
0     0.000001 -0.007100
1     0.000001 -0.007131
2     0.000001 -0.007161
3     0.000001 -0.007192
4    60.012138 -0.007223
..         ...       ...
917  26.700808 -0.037577
918  26.764549 -0.037608
919  26.833567 -0.037639
920  26.889654 -0.037669
921  26.371773 -0.037700

I'm trying to match each y_escape_ave(m) value in the first dataframe approximately (within a tolerance, y_tol) to the values in the y(m) column of the second dataframe, and then add the corresponding v(m/s) value next to each y_escape_ave(m) value. My idea was to do something similar to Excel's INDEX(MATCH(...; ...; -1)) method, but I cannot get it to work.
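For reference, the Excel idea above can be sketched in NumPy. MATCH(lookup; keys; -1) returns the smallest key that is greater than or equal to the lookup value (Excel requires the keys in descending order for this); with the keys sorted ascending instead, np.searchsorted(..., side="left") finds that same key. The helper name index_match_ge and the tolerance check are assumptions added for illustration, and -0.037620 is a made-up lookup value (-0.037497 comes from the first dataframe).

```python
import numpy as np


def index_match_ge(lookup, keys, values, tol):
    """Rough NumPy analogue of Excel's INDEX(values; MATCH(lookup; keys; -1)).

    MATCH(...; -1) returns the smallest key >= the lookup value. Here the
    keys are sorted ascending, and np.searchsorted(..., side="left") gives
    the position of that same smallest key >= lookup. Matches farther than
    `tol` from the lookup, or lookups with no key >= them, give NaN.
    """
    order = np.argsort(keys)
    keys_sorted = np.asarray(keys, dtype=float)[order]
    values_sorted = np.asarray(values, dtype=float)[order]
    lookup = np.asarray(lookup, dtype=float)

    pos = np.searchsorted(keys_sorted, lookup, side="left")
    out = np.full(lookup.shape, np.nan)
    found = pos < len(keys_sorted)        # some key >= lookup exists
    safe = np.where(found, pos, 0)        # in-bounds index for the gather
    close = found & (np.abs(keys_sorted[safe] - lookup) <= tol)
    out[close] = values_sorted[safe[close]]
    return out


# The ten y(m) / v(m/s) rows of the second dataframe shown in the question
y_keys = [-0.007100, -0.007131, -0.007161, -0.007192, -0.007223,
          -0.037577, -0.037608, -0.037639, -0.037669, -0.037700]
v_vals = [0.000001, 0.000001, 0.000001, 0.000001, 60.012138,
          26.700808, 26.764549, 26.833567, 26.889654, 26.371773]

print(index_match_ge([-0.037620, -0.037497], y_keys, v_vals, tol=1e-4))
```

Note the one-sided behaviour: for -0.037497 the "greater than or equal" rule lands on -0.007223 and returns NaN, even though -0.037577 is only 8e-5 below it. In pandas, merge_asof with direction='forward' has the same one-sided semantics, while direction='nearest' looks both ways.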

My code so far is:

import statistics

import pandas as pd

# df_results, df_vel_filt and y_tol are defined earlier
vel_escape = []

for i in range(len(df_results)):
    y_escape = df_results["y_escape_ave(m)"].iloc[i]
    # Missing escape height: record NaN so vel_escape stays aligned with df_results
    if pd.isna(y_escape):
        vel_escape.append(float("nan"))
        continue
    # Collect every velocity whose y(m) lies within y_tol of this escape height
    matches = []
    for ii in range(len(df_vel_filt)):
        if abs(y_escape - df_vel_filt["y(m)"].iloc[ii]) < y_tol:
            matches.append(df_vel_filt["v(m/s)"].iloc[ii])
    # Average the matches (NaN if none fell inside the tolerance)
    vel_escape.append(statistics.mean(matches) if matches else float("nan"))

Is there perhaps an easier way?
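For comparison, the nested loop above can be vectorized with NumPy broadcasting while keeping its behaviour of averaging every v(m/s) whose y(m) falls within y_tol. This is a sketch, not the only option: the function name mean_velocity_within_tol is hypothetical, the sample frames use only the rows printed in the question, and y_tol=2.5e-4 is an arbitrary value picked so those few rows produce matches.

```python
import numpy as np
import pandas as pd


def mean_velocity_within_tol(df_results, df_vel, y_tol):
    """For each y_escape_ave(m), average every v(m/s) whose y(m) lies
    within y_tol of it (NaN if nothing matches or the height is NaN).

    Broadcasting builds an (n_results x n_vel) boolean mask, so memory
    grows with the product of the two lengths; small for ~7 x ~900 rows.
    """
    y_esc = df_results["y_escape_ave(m)"].to_numpy(dtype=float)[:, None]
    y_vel = df_vel["y(m)"].to_numpy(dtype=float)[None, :]
    v = df_vel["v(m/s)"].to_numpy(dtype=float)

    # mask[i, j] is True when velocity row j is within the band of result row i;
    # comparisons involving NaN are False, so NaN heights match nothing
    mask = np.abs(y_esc - y_vel) < y_tol
    counts = mask.sum(axis=1)
    sums = np.where(mask, v, 0.0).sum(axis=1)

    means = np.full(len(counts), np.nan)
    has_match = counts > 0
    means[has_match] = sums[has_match] / counts[has_match]
    return pd.Series(means, index=df_results.index, name="v(m/s)")


# Sample data: the y_escape_ave(m) column of the first dataframe and the ten
# rows of the second dataframe shown in the question
df_results = pd.DataFrame({"y_escape_ave(m)": [-0.015850, -0.037345, -0.037497,
                                               -0.012622, np.nan, -0.008801, -0.008711]})
df_vel = pd.DataFrame({
    "v(m/s)": [0.000001, 0.000001, 0.000001, 0.000001, 60.012138,
               26.700808, 26.764549, 26.833567, 26.889654, 26.371773],
    "y(m)": [-0.007100, -0.007131, -0.007161, -0.007192, -0.007223,
             -0.037577, -0.037608, -0.037639, -0.037669, -0.037700],
})

df_results["v(m/s)"] = mean_velocity_within_tol(df_results, df_vel, y_tol=2.5e-4)
print(df_results)
```

Unlike merge_asof, which picks a single row per key, this keeps the loop's averaging: with the sample rows, -0.037497 averages all five velocities near -0.0376, while -0.037345 picks up only the one at -0.037577.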


Solution

  • You can try pandas.merge_asof

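    y_tol = 1e-4  # example tolerance in metres; None would disable the check
    
    # merge_asof rejects NaN keys and returns a fresh 0..n-1 index in sorted order,
    # so drop NaN rows and carry the original row labels through the merge
    left = df1.dropna(subset=['y_escape_ave(m)']).reset_index().sort_values('y_escape_ave(m)')
    merged = pd.merge_asof(left, df2.sort_values('y(m)'),
                           left_on='y_escape_ave(m)', right_on='y(m)',
                           direction='nearest', tolerance=y_tol)
    # Align on the original labels; unmatched and NaN rows get NaN
    df1['v(m/s)'] = merged.set_index('index')['v(m/s)']
    
    print(df1)
    
    With direction='nearest', each y_escape_ave(m) takes the v(m/s) of the closest y(m), provided it lies within y_tol. Assigning the merge result straight to df1 (df1['v(m/s)'] = pd.merge_asof(...)['v(m/s)']) puts the values on the wrong rows, because the merged frame is in sorted order with a new index; restoring the original labels avoids that.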
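A self-contained sketch of the merge_asof approach, run on only the rows shown in the question, might look like this. The helper name attach_nearest_velocity, the helper column "_row", the tolerance of 1e-4 m and direction='nearest' are assumptions chosen for illustration. Because merge_asof needs NaN-free, sorted keys and hands back its rows in sorted order with a new index, the sketch carries the original row labels through the merge and realigns the result to df1.

```python
import numpy as np
import pandas as pd


def attach_nearest_velocity(df1, df2, y_tol):
    """Return a Series aligned to df1.index holding the v(m/s) of the y(m)
    nearest to each y_escape_ave(m), or NaN when none lies within y_tol."""
    # merge_asof needs both sides sorted on their keys and no NaN keys; the
    # original row labels travel along in the helper column "_row"
    left = (df1[["y_escape_ave(m)"]].dropna()
            .rename_axis("_row").reset_index()
            .sort_values("y_escape_ave(m)"))
    right = df2[["y(m)", "v(m/s)"]].sort_values("y(m)")
    merged = pd.merge_asof(left, right, left_on="y_escape_ave(m)", right_on="y(m)",
                           direction="nearest", tolerance=y_tol)
    # The merged frame is in sorted order with a new index, so realign it
    return merged.set_index("_row")["v(m/s)"].reindex(df1.index)


# Sample data: the rows shown in the question (df2 lists only 10 of its 922 rows)
df1 = pd.DataFrame({"y_escape_ave(m)": [-0.015850, -0.037345, -0.037497,
                                        -0.012622, np.nan, -0.008801, -0.008711]})
df2 = pd.DataFrame({
    "v(m/s)": [0.000001, 0.000001, 0.000001, 0.000001, 60.012138,
               26.700808, 26.764549, 26.833567, 26.889654, 26.371773],
    "y(m)": [-0.007100, -0.007131, -0.007161, -0.007192, -0.007223,
             -0.037577, -0.037608, -0.037639, -0.037669, -0.037700],
})

df1["v(m/s)"] = attach_nearest_velocity(df1, df2, y_tol=1e-4)
print(df1)
```

On these ten rows only -0.037497 finds a partner (-0.037577, 8e-5 away), so every other row, including the NaN one, gets NaN; the rows elided from df2 may supply closer matches for the other heights.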