python json pandas nested-lists

Having trouble extracting values from a nested list and converting to dataframe


I'm trying to create a dataframe with 'team', 'games', 'wins', 'losses' and 'ties'.

Here's a snippet of the data:

[{'away_games': {'games': 4, 'losses': 2, 'ties': 0, 'wins': 2},
  'conference': 'Mountain West',
  'conference_games': {'games': 8, 'losses': 3, 'ties': 0, 'wins': 5},
  'division': 'Mountain',
  'expected_wins': 9.9,
  'home_games': {'games': 7, 'losses': 1, 'ties': 0, 'wins': 6},
  'team': 'Air Force',
  'total': {'games': 13, 'losses': 3, 'ties': 0, 'wins': 10},
  'year': 2022},
 {'away_games': {'games': 8, 'losses': 6, 'ties': 0, 'wins': 1},
  'conference': 'Mid-American',
  'conference_games': {'games': 9, 'losses': 7, 'ties': 0, 'wins': 1},
  'division': 'East',
  'expected_wins': 1.5,
  'home_games': {'games': 5, 'losses': 4, 'ties': 0, 'wins': 1},
  'team': 'Akron',
  'total': {'games': 13, 'losses': 10, 'ties': 0, 'wins': 2},
  'year': 2022},

Here's the code I tried:

# Create an empty DataFrame
df = pd.DataFrame(columns=['team', 'games', 'wins', 'losses', 'ties'])

# Loop through each record in the data
for record in data:
    try:
        # Extract the desired values
        team = record['team']
        games = record['total'].get['games']
        wins = record['total'].get['wins']
        losses = record['total'].get['losses']
        ties = record['total'].get['ties']
        
        # Create a new row with the extracted values
        new_row = {'team': team, 'games': games, 'wins': wins, 'losses': losses, 'ties': ties}
        
        # Append the new row to the DataFrame
        df = df.append(new_row, ignore_index=True)
    
    except KeyError as e:
        print(f"Skipping record due to missing key: {e}")

# Print the resulting DataFrame
print(df)

I'm getting an error that the 'TeamRecord' object is not subscriptable.

I'm sure there's a better/easier way to do this. Any advice would be much appreciated.


Solution

  • That's how it's supposed to look:

    import pandas as pd
    
    data=[{'away_games': {'games': 4, 'losses': 2, 'ties': 0, 'wins': 2},
      'conference': 'Mountain West',
      'conference_games': {'games': 8, 'losses': 3, 'ties': 0, 'wins': 5},
      'division': 'Mountain',
      'expected_wins': 9.9,
      'home_games': {'games': 7, 'losses': 1, 'ties': 0, 'wins': 6},
      'team': 'Air Force',
      'total': {'games': 13, 'losses': 3, 'ties': 0, 'wins': 10},
      'year': 2022},
     {'away_games': {'games': 8, 'losses': 6, 'ties': 0, 'wins': 1},
      'conference': 'Mid-American',
      'conference_games': {'games': 9, 'losses': 7, 'ties': 0, 'wins': 1},
      'division': 'East',
      'expected_wins': 1.5,
      'home_games': {'games': 5, 'losses': 4, 'ties': 0, 'wins': 1},
      'team': 'Akron',
      'total': {'games': 13, 'losses': 10, 'ties': 0, 'wins': 2},
      'year': 2022}]
    
    rows = []
    # Loop through each record in the data
    for record in data:
        try:
            # Extract the desired values
            team = record['team']
            games = record['total']['games']
            wins = record['total']['wins']
            losses = record['total']['losses']
            ties = record['total']['ties']
    
            # Create a new row with the extracted values
            new_row = {'team': team, 'games': games, 'wins': wins, 'losses': losses, 'ties': ties}
            rows.append(new_row)
    
        except KeyError as e:
            print(f"Skipping record due to missing key: {e}")
    
    # Build the DataFrame from the collected rows, then print it
    df = pd.DataFrame(rows, columns=['team', 'games', 'wins', 'losses', 'ties'])
    print(df)
    
            team  games  wins  losses  ties
    0  Air Force     13    10       3     0
    1      Akron     13     2      10     0
    

    It also looks like your data is borked: wins, losses, and ties must add up to the total number of games played, and that's not the case for Akron, whose 'total' lists 13 games but only 2 wins + 10 losses + 0 ties = 12.
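
The consistency rule above (wins + losses + ties should equal games) can be checked in pandas. This is only a minimal sketch: it rebuilds the two-row result frame by hand from the values in the question, and the names `df` and `mismatch` are illustrative.

```python
import pandas as pd

# The two rows from the 'total' entries in the question's data.
df = pd.DataFrame({
    'team': ['Air Force', 'Akron'],
    'games': [13, 13],
    'wins': [10, 2],
    'losses': [3, 10],
    'ties': [0, 0],
})

# Keep only the rows where the three outcomes do not add up to games played.
outcome_sum = df['wins'] + df['losses'] + df['ties']
mismatch = df[outcome_sum != df['games']]
print(mismatch)
```

Any team that shows up in `mismatch` has a record worth double-checking against the source.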

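    Note that get is a method, so it is called with parentheses, e.g. record['total'].get('games'), not with square brackets; plain indexing, as in the code above, works too. Regarding append, see also "Create a Pandas Dataframe by appending one row at a time": DataFrame.append was deprecated and then removed in pandas 2.0.0. Appending to a DataFrame in a loop is in most cases bad practice anyway, because each call copies the whole frame; collect the rows in a list and build the DataFrame once.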
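
If the records really are plain dicts, as in the snippet, the explicit loop can also be skipped. The sketch below is one possible alternative, not the only way: it uses pandas' `json_normalize` to flatten the nested 'total' dict into dotted column names, then keeps and renames the wanted columns. The records are abbreviated copies of the question's data, and the names `flat` and `wanted` are illustrative.

```python
import pandas as pd

# Abbreviated records in the same shape as the question's data.
data = [
    {'team': 'Air Force', 'year': 2022,
     'total': {'games': 13, 'losses': 3, 'ties': 0, 'wins': 10}},
    {'team': 'Akron', 'year': 2022,
     'total': {'games': 13, 'losses': 10, 'ties': 0, 'wins': 2}},
]

# Nested dicts become dotted columns such as 'total.games'.
flat = pd.json_normalize(data)

# Select the columns of interest and drop the 'total.' prefix.
wanted = ['team', 'total.games', 'total.wins', 'total.losses', 'total.ties']
df = flat[wanted].rename(columns=lambda name: name.split('.')[-1])
print(df)
```

This assumes every record has a 'total' entry with all four keys; selecting a missing column raises a KeyError for the whole frame rather than skipping a single record, so the loop version is the safer choice for patchy data.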