I have two datafames as such:
#Load the required libraries
import pandas as pd
import matplotlib.pyplot as plt
#Create dataset_1
data_set_1 = {'id': [1,
2,
3,
4,
5,
],
'Available_Salary': [10,
20,
30,
40,
50,
],
}
#Convert to dataframe_1
df_1 = pd.DataFrame(data_set_1)
print("\n df_1 = \n",df_1)
#Create dataset_2
data_set_2 = {'id': [1,
2,
3,
4,
],
'Expected_Salary': [16,
42,
28,
32,
],
}
#Convert to dataframe_2
df_2 = pd.DataFrame(data_set_2)
print("\n df_2 = \n",df_2)
Here, visually I can say, 'Expected_Salary' 16 (with id=1), is closer to 'Available_Salary' in the order as: 20 (with id=2), 10 (with id=1), 30 (with id=3), 40 (with id=4), 50 (with id=5)
Likewise, 'Expected_Salary' 42(with id=2), is closer to 'Available_Salary' in the order as: 40 (with id=4), 50 (with id=5), 30 (with id=3), 20 (with id=2), 10 (with id=1) and so on.
The same logic follows for other id's of df_2 to map with df_1.
I wish to see the result in a dataframe as such:
Can somebody please let me know how to achieve this task in Python?
Perform a cross-merge
, compute the difference between the expected and available salary and use this to sort_values
:
out = (df_2
.merge(df_1.rename(columns={'id': 'id_df1'}), how='cross')
.eval('diff=abs(Available_Salary-Expected_Salary)')
.sort_values(by=['id', 'diff'], ignore_index=True)
.drop(columns='diff')
)
Output:
id Expected_Salary id_df1 Available_Salary
0 1 16 2 20
1 1 16 1 10
2 1 16 3 30
3 1 16 4 40
4 1 16 5 50
5 2 42 4 40
6 2 42 5 50
7 2 42 3 30
8 2 42 2 20
9 2 42 1 10
10 3 28 3 30
11 3 28 2 20
12 3 28 4 40
13 3 28 1 10
14 3 28 5 50
15 4 32 3 30
16 4 32 4 40
17 4 32 2 20
18 4 32 5 50
19 4 32 1 10