How to parse a nested dictionary in pandas/python - baseball API


Hi, I have the API response below, which is part of a very long JSON string. I'm trying to parse the "away batting totals" so that I can pull the info into a data frame. I'm using the function statsapi.boxscore_data(662647) to pull the full JSON response.

'awayBattingTotals': {'namefield': 'Totals', 'ab': '33', 'r': '2', 'h': '7', 'hr': '1', 'rbi': '2', 'bb': '0', 'k': '8', 'lob': '13', ...

This comes from the full JSON response, which is shown further down.

The full JSON string is very long, so only a snippet is included. I'm trying to use the code below to pull the info, but I've been unsuccessful:

statsapi.boxscore_data(662647)

summary = statsapi.boxscore(662647)

result = summary[awayBattingfields]["Totals"]

print(result)

Below is a snippet from the response:

  {'namefield': '9 Lopez, N  SS',
   'ab': '3',
   'r': '0',
   'h': '1',
   'doubles': '0',
   'triples': '0',
   'hr': '0',
   'rbi': '0',
   'sb': '0',
   'bb': '0',
   'k': '0',
   'lob': '2',
   'avg': '.248',
   'ops': '.599',
   'personId': 670032,
   'battingOrder': '900',
   'substitution': False,
   'note': '',
   'name': 'Lopez, N',
   'position': 'SS',
   'obp': '.305',
   'slg': '.294'}],
 'awayBattingTotals': {'namefield': 'Totals',
  'ab': '33',
  'r': '2',
  'h': '7',
  'hr': '1',
  'rbi': '2',
  'bb': '0',
  'k': '8',
  'lob': '13',
  'avg': '',
  'ops': '',
  'obp': '',
  'slg': '',
  'name': 'Totals',
  'position': '',
  'note': '',
  'substitution': False,
  'battingOrder': '',
  'personId': 0},
 'homeBattingTotals': {'namefield': 'Totals',
  'ab': '34',
  'r': '4',
  'h': '9',
  'hr': '2',
  'rbi': '4',
  'bb': '1',
  'k': '7',
  'lob': '13',
  'avg': '',
  'ops': '',
  'obp': '',
  'slg': '',
  'name': 'Totals',
  'position': '',
  'note': '',
  'substitution': False,
  'battingOrder': '',
  'personId': 0},
 'awayBattingNotes': {0: 'a-Struck out for Zavala in the 8th.'},
 'homeBattingNotes': {},
 'awayPitchers': [{'namefield': 'White Sox Pitchers',
   'ip': 'IP',
   'h': 'H',
   'r': 'R',
   'er': 'ER',
   'bb': 'BB',
   'k': 'K',
   'hr': 'HR',
   'era': 'ERA',
   'p': 'P',
   's': 'S',
   'name': 'White Sox Pitchers',
   'personId': 0,
   'note': ''},
  {'namefield': 'Lynn  (L, 2-5)',
   'ip': '6.0',
   'h': '7',
   'r': '4',
   'er': '4',
   'bb': '1',
   'k': '5',
   'hr': '2',
   'p': '90',
   's': '59',
   'era': '5.88',
   'name': 'Lynn',
   'personId': 458681,
   'note': '(L, 2-5)'},
  {'namefield': 'Kelly, J',
   'ip': '1.0',
   'h': '0',
   'r': '0',
   'er': '0',
   'bb': '0',
   'k': '1',
   'hr': '0',
   'p': '10',
   's': '7',
   'era': '5.18',
   'name': 'Kelly, J',
   'personId': 523260,
   'note': ''},
  {'namefield': 'Foster',
   'ip': '1.0',
   'h': '2',
   'r': '0',
   'er': '0',
   'bb': '0',
   'k': '1',
   'hr': '0',
   'p': '13',
   's': '10',
   'era': '4.40',
   'name': 'Foster',
   'personId': 641582,
   'note': ''}],
 'homePitchers': [{'namefield': 'Royals Pitchers',
   'ip': 'IP',
   'h': 'H',
   'r': 'R',
   'er': 'ER',
   'bb': 'BB',
   'k': 'K',
   'hr': 'HR',
   'era': 'ERA',
   'p': 'P',
   's': 'S',
   'name': 'White Sox Pitchers',
   'personId': 0,
   'note': ''},
  {'namefield': 'Singer  (W, 5-4)',
   'ip': '7.1',
   'h': '5',
   'r': '1',
   'er': '1',
   'bb': '0',
   'k': '6',
   'hr': '1',
   'p': '99',
   's': '71',
   'era': '3.49',
   'name': 'Singer',
   'personId': 663903,
   'note': '(W, 5-4)'},
  {'namefield': 'Barlow, S  (H, 5)',
   'ip': '0.2',
   'h': '0',
   'r': '0',
   'er': '0',
   'bb': '0',
   'k': '1',
   'hr': '0',
   'p': '11',
   's': '8',
   'era': '2.19',
   'name': 'Barlow, S',
   'personId': 605130,
   'note': '(H, 5)'},
  {'namefield': 'Coleman  (H, 10)',
   'ip': '0.1',
   'h': '2',
   'r': '1',
   'er': '1',
   'bb': '0',
   'k': '0',
   'hr': '0',
   'p': '13',
   's': '8',
   'era': '2.98',
   'name': 'Coleman',
   'personId': 669395,
   'note': '(H, 10)'},
  {'namefield': 'Cuas  (S, 1)',
   'ip': '0.2',
   'h': '0',
   'r': '0',
   'er': '0',
   'bb': '0',
   'k': '1',
   'hr': '0',
   'p': '9',
   's': '6',
   'era': '3.09',
   'name': 'Cuas',
   'personId': 621016,
   'note': '(S, 1)'}],
 'awayPitchingTotals': {'namefield': 'Totals',
  'ip': '8.0',
  'h': '9',
  'r': '4',
  'er': '4',
  'bb': '1',
  'k': '7',
  'hr': '2',
  'p': '',
  's': '',
  'era': '',
  'name': 'Totals',
  'personId': 0,
  'note': ''},
 'homePitchingTotals': {'namefield': 'Totals',
  'ip': '9.0',
  'h': '7',
  'r': '2',
  'er': '2',
  'bb': '0',
  'k': '8',
  'hr': '1',
  'p': '',
  's': '',
  'era': '',
  'name': 'Totals',
  'personId': 0,
  'note': ''},
 'gameBoxInfo': [{'label': 'HBP',
   'value': 'Harrison, J (by Singer); Garcia, Le (by Coleman).'},
  {'label': 'Pitches-strikes',
   'value': 'Lynn 90-59; Kelly, J 10-7; Foster 13-10; Singer 99-71; Barlow, S 11-8; Coleman 13-8; Cuas 9-6.'},
  {'label': 'Groundouts-flyouts',
   'value': 'Lynn 4-5; Kelly, J 1-1; Foster 0-2; Singer 6-5; Barlow, S 0-0; Coleman 0-0; Cuas 1-0.'},
  {'label': 'Batters faced',
   'value': 'Lynn 27; Kelly, J 3; Foster 5; Singer 28; Barlow, S 2; Coleman 4; Cuas 2.'},
  {'label': 'Inherited runners-scored', 'value': 'Barlow, S 2-0; Cuas 2-0.'},
  {'label': 'Umpires',
   'value': 'HP: Jerry Meals. 1B: Clint Vondrak. 2B: Malachi Moore. 3B: Vic Carapazza. '},
  {'label': 'Weather', 'value': '84 degrees, Partly Cloudy.'},
  {'label': 'Wind', 'value': '4 mph, L To R.'},
  {'label': 'First pitch', 'value': '3:10 PM.'},
  {'label': 'T', 'value': '2:36.'},
  {'label': 'Venue', 'value': 'Kauffman Stadium.'},
  {'label': 'August 9, 2022'}]}


Solution

  • What are you trying to get as your output? It isn't clear what you are trying to do here. If you want the away batting totals:

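    import pandas as pd
    import statsapi

    # boxscore_data() returns the box score as a nested dict
    # (boxscore() returns formatted text instead, which can't be indexed by key)
    summary = statsapi.boxscore_data(662647)
    result = summary["awayBattingTotals"]

    print(result)
    # Wrap the dict in a list so pandas builds a one-row DataFrame
    df = pd.DataFrame([result])
    print(df)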
    

    Output:

     namefield  ab  r  h hr  ... position note substitution battingOrder personId
    0    Totals  33  2  7  1  ...                      False                     0
    
    [1 rows x 19 columns]
    

    Or the batters?

    result_batters = summary["awayBatters"]
    
    print(result_batters)
    df = pd.DataFrame(result_batters)
    print(df)
    

    Output:

                namefield  ab  r  h  ... position   obp   slg battingOrder
    0   White Sox Batters  AB  R  H  ...            OBP   SLG             
    1       1 Pollock  LF   4  0  1  ...       LF  .286  .353          100
    2        2 Robert  CF   4  0  1  ...       CF  .334  .453          200
    3       3 Jiménez  DH   4  0  1  ...       DH  .318  .455          300
    4      4 Abreu, J  1B   4  1  1  ...       1B  .378  .468          400
    5        5 Vaughn  RF   4  0  1  ...       RF  .348  .464          500
    6       6 Moncada  3B   3  0  0  ...       3B  .258  .311          600
    7    7 Garcia, Le  SS   3  0  0  ...       SS  .240  .279          700
    8   8 Harrison, J  2B   3  1  1  ...       2B  .312  .385          800
    9         9 Zavala  C   2  0  1  ...        C  .309  .395          900
    10       a-Sheets  PH   1  0  0  ...       PH  .285  .385          901
    11         Grandal  C   1  0  0  ...        C  .288  .242          902
    
    [12 rows x 22 columns]
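
Every value in these dicts is a string, and the batters list starts with a header row ('AB', 'R', 'H', ...), as the outputs above show. If you want to do arithmetic on the stats, one option is to convert the columns and drop the header row. This is only a minimal sketch, assuming pandas is installed. The sample rows are trimmed copies of values shown above rather than live API data, and `STAT_COLS` and `to_numeric_frame` are hypothetical names introduced for illustration:

```python
import pandas as pd

# Trimmed copy of the 'awayBattingTotals' dict from the question (not live API data).
away_batting_totals = {
    "namefield": "Totals", "ab": "33", "r": "2", "h": "7", "hr": "1",
    "rbi": "2", "bb": "0", "k": "8", "lob": "13", "avg": "", "name": "Totals",
}

# Assumed stand-in for summary["awayBatters"]: a header row followed by one
# player row, using only fields visible in the output above.
away_batters = [
    {"namefield": "White Sox Batters", "ab": "AB", "r": "R", "h": "H",
     "position": "", "obp": "OBP", "slg": "SLG", "battingOrder": ""},
    {"namefield": "1 Pollock  LF", "ab": "4", "r": "0", "h": "1",
     "position": "LF", "obp": ".286", "slg": ".353", "battingOrder": "100"},
]

# Hypothetical list of columns to treat as numbers.
STAT_COLS = ["ab", "r", "h", "hr", "rbi", "bb", "k", "lob", "avg", "obp", "slg"]


def to_numeric_frame(rows, stat_cols=STAT_COLS):
    """Build a DataFrame and convert the stat columns that are present.

    Values that are not numbers (header labels such as 'AB', or empty
    strings) become NaN because of errors="coerce".
    """
    df = pd.DataFrame(rows)
    for col in stat_cols:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


# A single dict must be wrapped in a list to get a one-row DataFrame.
totals_df = to_numeric_frame([away_batting_totals])

# For the batters, drop the header row: its 'ab' value ('AB') became NaN.
batters_df = to_numeric_frame(away_batters)
players_df = batters_df.dropna(subset=["ab"]).reset_index(drop=True)

print(totals_df[["ab", "r", "h", "hr"]])
print(players_df[["namefield", "ab", "h", "obp"]])
```

With real data you would pass `summary["awayBattingTotals"]` and `summary["awayBatters"]` in place of the sample values. Dropping rows by a non-numeric `ab` avoids relying on the header always being at index 0, but it is a design choice, not something the library prescribes.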