Search code examples
pythonpandaslistdictionarypython-zip

Create a dictionary from Pandas column conditioned on values from another column


I have a tennis dataset that looks like the following:

tourney_id = ['French Open 2018','French Open 2018','Wimbledon 2018','Wimbledon 2018','Australian Open 2019','Australian Open 2019','US Open 2019','US Open 2019']
player_name = ['Novak Djokovic','Roger Federer','Andy Murray','Rafael Nadal','John Isner','Novak Djokovic','Andy Murray','Roger Federer']
match_num = [103, 103, 217, 217, 104, 104, 243, 243]

df = pd.DataFrame(list(zip(tourney_id, player_name, match_num)),
            columns =['TournamentID','Name','MatchID'])

I want to create a dictionary where the keys are the players and the items are also the players (opponents). So it would look like the following based on my dataset:

{'Novak Djokovic': ['Roger Federer','John Isner'],
 'Roger Federer': ['Novak Djokovic','Andy Murray'],
 'Andy Murray': ['Rafael Nadal','Roger Federer'],
 'Rafael Nadal': ['Andy Murray'],
 'John Isner': ['Novak Djokovic']}

I want to identify players who have played each other when they have the same values for both TournamentID and MatchID.

The last thing I tried was: df.set_index(['TournamentID','MatchID'])['Name'].to_dict() but that's not quite what I'm looking for.

What can I try next?


Solution

  • Here's how you can do it:

    import pandas as pd
    
    tourney_id = ['French Open 2018','French Open 2018','Wimbledon 2018','Wimbledon 2018','Australian Open 2019','Australian Open 2019','US Open 2019','US Open 2019']
    player_name = ['Novak Djokovic','Roger Federer','Andy Murray','Rafael Nadal','John Isner','Novak Djokovic','Andy Murray','Roger Federer']
    match_num = [103, 103, 217, 217, 104, 104, 243, 243]
    
    df = pd.DataFrame(list(zip(tourney_id, player_name, match_num)),
                    columns=['TournamentID', 'Name', 'MatchID'])
    
    # Create a DataFrame with pairs of players in each match
    df_pairs = pd.merge(df, df, how='inner', on=['TournamentID', 'MatchID'])
    df_pairs = df_pairs[df_pairs['Name_x'] != df_pairs['Name_y']]  # Remove rows where a player is paired with themselves
    
    # Group by each player and aggregate the opponents into a list
    opponents_dict = df_pairs.groupby('Name_x')['Name_y'].agg(list).to_dict()
    
    print(opponents_dict)
    

    Output:

    {
      "Andy Murray": [
        "Rafael Nadal",
        "Roger Federer"
      ],
      "John Isner": [
        "Novak Djokovic"
      ],
      "Novak Djokovic": [
        "Roger Federer",
        "John Isner"
      ],
      "Rafael Nadal": [
        "Andy Murray"
      ],
      "Roger Federer": [
        "Novak Djokovic",
        "Andy Murray"
      ]
    }