I have two datafames as such:
#Load the required libraries
import pandas as pd
import matplotlib.pyplot as plt
#Create dataset_1
data_set_1 = {'id': [1,
2,
3,
4,
5,
],
'Available_Salary': [10,
20,
30,
40,
50,
],
}
#Convert to dataframe_1
df_1 = pd.DataFrame(data_set_1)
print("\n df_1 = \n",df_1)
#Create dataset_2
data_set_2 = {'id': [1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
],
'Expected_Salary': [9,
49,
18,
19,
29,
41,
4,
57,
42,
3,
],
}
#Convert to dataframe_2
df_2 = pd.DataFrame(data_set_2)
print("\n df_2 = \n",df_2)
Here, visually I can say, 'Expected_Salary' 9 (with id=1), 'Expected_Salary' 4 (with id=7) and 'Expected_Salary' 3 (with id=10) is closer to 'Available_Salary' 10 (with id=1).
Likewise, 'Expected_Salary' of 49 (with id=2) and 'Expected_Salary' 57 (with id=8) is closer to 'Available_Salary' 50 (with id=5), and so on.
This can be shown in below image file for better representation:
Now, I need to generate a new columns 'Salary_from_df_1' and 'id_from_df_1' in df_2 that will map with the id's of df_1 that signifies the closer salary.
For example, since the 'Expected_Salary' 9 (with id=1), 'Expected_Salary' 4 (with id=7) and 'Expected_Salary' 3 (with id=10) is closer to 'Available_Salary' 10 (with id=1), so they will have 'Salary_from_df_1' as 10 and 'id_from_df_1' as 1. This looks as such:
The same logic follows for other id's of df_2 to map with df_1.
Can somebody please let me know how to achieve this task in Python?
Calculate absolute difference of each salary in df_2
from every available salary in df_1
, then use argmin
to get the index of minimum absolute diff, then use this index to yank the id values from df_1
to df_2
i = np.abs(np.subtract.outer(df_2['Expected_Salary'].values, df_1['Available Salary'].values)).argmin(axis=1)
df_2['id_df1'] = df_1['id'].values[i]
id Expected_Salary id_df1
0 1 9 1
1 2 49 5
2 3 18 2
3 4 19 2
4 5 29 3
5 6 41 4
6 7 4 1
7 8 57 5
8 9 42 4
9 10 3 1