python-3.x pandas dataframe group-by mapping

Map two dataframes, based on their group/id, in increasing order of the closer values in Python

I have two datafames as such:

#Load the required libraries
import pandas as pd
import matplotlib.pyplot as plt

#Create dataset_1
data_set_1 = {'id': [1,
                     2,
                     3, 
                     4,
                     5,
                     ],
              'Available_Salary': [10,
                                   20,
                                   30,
                                   40,
                                   50,
                                   ],
              }

#Convert to dataframe_1
df_1 = pd.DataFrame(data_set_1)
print("\n df_1 = \n",df_1)



#Create dataset_2
data_set_2 = {'id': [1, 
                     2, 
                     3, 
                     4,
                     ],
              'Expected_Salary': [16,
                                  42,
                                  28,
                                  32,
                                  ],
        }

#Convert to dataframe_2
df_2 = pd.DataFrame(data_set_2)
print("\n df_2 = \n",df_2)

Here, visually I can say, 'Expected_Salary' 16 (with id=1), is closer to 'Available_Salary' in the order as: 20 (with id=2), 10 (with id=1), 30 (with id=3), 40 (with id=4), 50 (with id=5)

Likewise, 'Expected_Salary' 42(with id=2), is closer to 'Available_Salary' in the order as: 40 (with id=4), 50 (with id=5), 30 (with id=3), 20 (with id=2), 10 (with id=1) and so on.

The same logic follows for other id's of df_2 to map with df_1.

I wish to see the result in a dataframe as such:

Can somebody please let me know how to achieve this task in Python?

Solution

Perform a cross-merge, compute the difference between the expected and available salary and use this to sort_values:

out = (df_2
   .merge(df_1.rename(columns={'id': 'id_df1'}), how='cross')
   .eval('diff=abs(Available_Salary-Expected_Salary)')
   .sort_values(by=['id', 'diff'], ignore_index=True)
   .drop(columns='diff')
)

Output:

    id  Expected_Salary  id_df1  Available_Salary
0    1               16       2                20
1    1               16       1                10
2    1               16       3                30
3    1               16       4                40
4    1               16       5                50
5    2               42       4                40
6    2               42       5                50
7    2               42       3                30
8    2               42       2                20
9    2               42       1                10
10   3               28       3                30
11   3               28       2                20
12   3               28       4                40
13   3               28       1                10
14   3               28       5                50
15   4               32       3                30
16   4               32       4                40
17   4               32       2                20
18   4               32       5                50
19   4               32       1                10