
Find closest element of a pandas dataframe column in another column's list


I have the following dataframe:

    import pandas as pd

    A = [3, 38, 124]
    B = [[0, 0, 1, 7, 34, 76, 4, 15, 28, 8, 7, 8, 200, 108, 7],
         [0, 0, 1, 7, 34],
         [4, 109, 71, 257, 3, 3, 7, 1, 0, 0, 7, 8, 100, 148, 54, 3, 134, 90, 23, 43, 17]]

    df = pd.DataFrame({'A': A, 'B': B})
    df

Each element of column B is a list. I want to create a new column that holds, for each row, the element of that row's list in B that is closest to the row's value in A.

Desired output:

    A = [3, 38, 124]
    B = [[0, 0, 1, 7, 34, 76, 4, 15, 28, 8, 7, 8, 200, 108, 7],
         [0, 0, 1, 7, 34],
         [4, 109, 71, 257, 3, 3, 7, 1, 0, 0, 7, 8, 100, 148, 54, 3, 134, 90, 23, 43, 17]]
    Desired_output = [4, 34, 134]
    df_out = pd.DataFrame({'A': A,
                           'B': B,
                           'Desired_output': Desired_output})
    df_out = df_out[['A', 'B', 'Desired_output']]
    df_out
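As a sanity check on the expected values, the same "closest element" rule can be written with the built-in `min` and a `key` function applied to the plain lists. This is a minimal sketch that only assumes the `A` and `B` lists above; it does not use pandas.

```python
# Plain lists from the question
A = [3, 38, 124]
B = [[0, 0, 1, 7, 34, 76, 4, 15, 28, 8, 7, 8, 200, 108, 7],
     [0, 0, 1, 7, 34],
     [4, 109, 71, 257, 3, 3, 7, 1, 0, 0, 7, 8, 100, 148, 54, 3, 134, 90, 23, 43, 17]]

# For each (a, b) pair, pick the element of b with the smallest absolute
# difference to a. min() returns the first element that minimises the key,
# so ties are broken by the order of the list.
nearest = [min(b, key=lambda x: abs(x - a)) for a, b in zip(A, B)]
print(nearest)  # [4, 34, 134]
```

For example, in the first row 4 is at distance 1 from 3, and no other candidate is closer, which is why the first expected value is 4.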

Solution

  • To complete the previous answer: if your data is already in a DataFrame, you can use DataFrame.apply with axis=1 to compute the nearest element row by row, as follows:

    import pandas as pd
    import numpy as np
    
    A = [3, 38, 124]
    B = [[0, 0, 1, 7, 34, 76, 4, 15, 28, 8, 7, 8, 200, 108, 7],
         [0, 0, 1, 7, 34],
         [4, 109, 71, 257, 3, 3, 7, 1, 0, 0, 7, 8, 100, 148, 54, 3, 134, 90, 23, 43, 17]]
    df = pd.DataFrame({'A': A, 'B': B})
    
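    def find_nearest(row):
        # Distance from each candidate in this row's list to the row's A value
        distances = [abs(candidate - row["A"]) for candidate in row["B"]]
        # np.argmin gives the position of the smallest distance (the first one on ties)
        return row["B"][np.argmin(distances)]
    
    # axis=1 passes each row to find_nearest as a Series
    df["desired_output"] = df.apply(find_nearest, axis=1)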
    
    print(df)
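Row-wise `apply` tends to be slow on large frames. One alternative, sketched here under the assumptions that every list in `B` is non-empty and that your pandas version has `DataFrame.explode` (added in 0.25), is to flatten the lists into one row per candidate and then pick the minimum distance per original row with `groupby(...).idxmin()`. The helper column name `dist` is just an illustrative choice.

```python
import pandas as pd

A = [3, 38, 124]
B = [[0, 0, 1, 7, 34, 76, 4, 15, 28, 8, 7, 8, 200, 108, 7],
     [0, 0, 1, 7, 34],
     [4, 109, 71, 257, 3, 3, 7, 1, 0, 0, 7, 8, 100, 148, 54, 3, 134, 90, 23, 43, 17]]
df = pd.DataFrame({'A': A, 'B': B})

# One row per list element; reset_index() keeps the original row label
# in a column named 'index' (the default name for an unnamed index).
exploded = df.explode('B').reset_index()
# explode leaves the exploded column as object dtype, so cast it back to integers
# (this assumes no list is empty, otherwise explode would produce NaN here).
exploded['B'] = exploded['B'].astype('int64')
exploded['dist'] = (exploded['B'] - exploded['A']).abs()

# For each original row, the label of the candidate with the smallest distance.
# idxmin returns the first minimum, so ties follow list order, like np.argmin.
best = exploded.groupby('index')['dist'].idxmin()

# Map the winning candidates back onto the original row labels.
df['desired_output'] = pd.Series(exploded.loc[best.to_numpy(), 'B'].to_numpy(),
                                 index=best.index)
print(df)
```

The final assignment aligns on the original index, so it does not rely on the rows being in any particular order.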