Search code examples
pythonpandasdataframemerge

Merge two dataframes iterating in within columns


I have two dataframes, one of players, with theirs clubs ids and rounds, and another with matches, withs scores and rounds.

Player| club_id | round  
a     |     16  |   1
b     |     13  |   1
c     |     12  |   1
a     |     16  |   2
...

-------

home_club_id| away_club_id |home_club_score| away_club_score| round  
16          |     13       |   1           |2               |1
15          |     1        |   4           |0               |1
12          |     2        |   1           |1               |1
12          |     16       |   2           |2               |2
...

I want to merge the two dataframes to see if a player played at home, and the score of the match.
The final data frame could be something like this:

Player|club_id|round|home|score|opponent_score
a     |16     |1    | yes|1    | 2
b     |13     |1    | no |2    | 1
a     |16     |2    | no |2    | 2
...

I tried to change home_club_id to club_id and merge with on =[round, club_id] but I did not found a way to merge home and away at the same time


Solution

  • To get your desired final frame, you could instead rearrange your data.

    To begin with, let's assume your frames are called player_frame and round_frame:

    from io import StringIO
    
    import pandas as pd
    
    player_data = StringIO('''Player club_id  round  
    a          16     1
    b          13     1
    c          12     1
    a          16     2''')
    player_frame = pd.read_csv(player_data, sep='\s+')
    
    round_data = StringIO('''home_club_id away_club_id home_club_score away_club_score round  
    16               13          1           2               1
    15               1           4           0               1
    12               2           1           1               1
    12               16          2           2               2''')
    round_frame = pd.read_csv(round_data, sep='\s+')
    

    We can then pull out columns to separately refer to the home and away data, renaming so that they match, and flagging whether the row is a home match or not.

    home_values = round_frame[['home_club_id', 'home_club_score', 'away_club_score', 'round']]\
                             .rename({'home_club_id': 'club_id', 
                                      'home_club_score': 'score', 
                                      'away_club_score': 'opponent_score'},
                                     axis=1)\
                             .assign(home='yes')
    
    away_values = round_frame[['away_club_id', 'away_club_score', 'home_club_score', 'round']]\
                             .rename({'away_club_id': 'club_id', 
                                      'home_club_score': 'opponent_score', 
                                      'away_club_score': 'score'},
                                     axis=1)\
                             .assign(home='no')
    

    Then we can concat the two and merge into player_frame:

    final_values = pd.concat([home_values, away_values], ignore_index=True).merge(player_frame)
    

    Which gives us:

       club_id  score  opponent_score  round home Player
    0       16      1               2      1  yes      a
    1       12      1               1      1  yes      c
    2       13      2               1      1   no      b
    3       16      2               2      2   no      a