I am trying to outer join (on df1) two pandas dataframe. Below are the sample data frames:
df1:
Index Team 1 Team 2 Team1_Score Team2_Score
0 A B 25 56
1 B C 30 55
2 D E 35 75
df2:
Index Team 1 Team 2 Team1_Avg Team2_Avg
0 A B 5 15
1 G F 10 25
2 C B 15 35
dfcombined
Index Team 1 Team 2 Team1_Score Team2_Score Team2_Avg Team1_Avg
0 A B 25 56 5 15
1 B C 30 55 35 15
2 D E 35 75
I was trying to use the pandasql module however I am not sure how to handle the case of joining index 1 in df1 and index 2 at df2 as the order of teams is reversed. Through pandasql module, I am not sure how to switch the Team Avg values in the combined data frame if the order of team is reverse.
I would appreciate any help on this.
Setup -
df1
Team 1 Team 2 Team1_Score Team2_Score
Index
0 A B 25 56
1 B C 30 55
2 D E 35 75
df2
Team 1 Team 2 Team1_Avg Team2_Avg
Index
0 A B 5 15
1 F G 25 10
2 B C 35 15
First, we'll need to sort the Team *
columns, and accordingly sort the Team*_Score
columns in the same way. We'll use argsort
to do so.
i = np.arange(len(df1))[:, None]
j = np.argsort(df1[['Team 1', 'Team 2']], axis=1).values
df1[['Team 1', 'Team 2']] = df1[['Team 1', 'Team 2']].values[i, j]
df1[['Team1_Score', 'Team2_Score']] = df1[['Team1_Score', 'Team2_Score']].values[i, j]
Now, repeat the same process for df2
, with Team *
and Team*_Avg
.
j = np.argsort(df2[['Team 1', 'Team 2']], axis=1).values
df2[['Team 1', 'Team 2']] = df2[['Team 1', 'Team 2']].values[i, j]
df2[['Team1_Avg', 'Team2_Avg']] = df2[['Team1_Avg', 'Team2_Avg']].values[i, j]
Now, perform a left outer merge
-
df1.merge(df2, on=['Team 1', 'Team 2'], how='left')
Team 1 Team 2 Team1_Score Team2_Score Team1_Avg Team2_Avg
0 A B 25 56 5 15
1 B C 30 55 35 15
2 D E 35 75