Search code examples
pythonpandasdataframegroup-bymapping

Map two dataframes, based on their group/id, with closer values


I have two datafames as such:

#Load the required libraries
import pandas as pd
import matplotlib.pyplot as plt

#Create dataset_1
data_set_1 = {'id': [1,
                     2,
                     3, 
                     4,
                     5,
                     ],
              'Available_Salary': [10,
                                   20,
                                   30,
                                   40,
                                   50,
                                   ],
              }

#Convert to dataframe_1
df_1 = pd.DataFrame(data_set_1)
print("\n df_1 = \n",df_1)



#Create dataset_2
data_set_2 = {'id': [1, 
                     2, 
                     3, 
                     4,
                     5,
                     6,
                     7,
                     8,
                     9,
                     10,
                     ],
              'Expected_Salary': [9,
                                  49,
                                  18,
                                  19,
                                  29,
                                  41,
                                  4,
                                  57,
                                  42,
                                  3,
                                  ],
        }

#Convert to dataframe_2
df_2 = pd.DataFrame(data_set_2)
print("\n df_2 = \n",df_2)

Here, visually I can say, 'Expected_Salary' 9 (with id=1), 'Expected_Salary' 4 (with id=7) and 'Expected_Salary' 3 (with id=10) is closer to 'Available_Salary' 10 (with id=1).

Likewise, 'Expected_Salary' of 49 (with id=2) and 'Expected_Salary' 57 (with id=8) is closer to 'Available_Salary' 50 (with id=5), and so on.

This can be shown in below image file for better representation:

enter image description here

Now, I need to generate a new columns 'Salary_from_df_1' and 'id_from_df_1' in df_2 that will map with the id's of df_1 that signifies the closer salary.

For example, since the 'Expected_Salary' 9 (with id=1), 'Expected_Salary' 4 (with id=7) and 'Expected_Salary' 3 (with id=10) is closer to 'Available_Salary' 10 (with id=1), so they will have 'Salary_from_df_1' as 10 and 'id_from_df_1' as 1. This looks as such:

enter image description here

The same logic follows for other id's of df_2 to map with df_1.

Can somebody please let me know how to achieve this task in Python?


Solution

  • Calculate absolute difference of each salary in df_2 from every available salary in df_1, then use argmin to get the index of minimum absolute diff, then use this index to yank the id values from df_1 to df_2

    i = np.abs(np.subtract.outer(df_2['Expected_Salary'].values, df_1['Available Salary'].values)).argmin(axis=1)
    df_2['id_df1'] = df_1['id'].values[i]
    

       id  Expected_Salary  id_df1
    0   1                9       1
    1   2               49       5
    2   3               18       2
    3   4               19       2
    4   5               29       3
    5   6               41       4
    6   7                4       1
    7   8               57       5
    8   9               42       4
    9  10                3       1