Search code examples
pythonpython-3.7

Google API Response Data: Count most frequent dictionary values from list of dictionaries


I have a Python script (Python 3.7) that accesses a Google Sheet and gets all of the data from the sheet using the get_all_records() method from the gspread library.

The response data is a list of dictionaries, with each row from the google sheet represented as it's own dictionary, and the key/value pairs as the column header and row values respectively like so:

[{'Away Team': 'Gillingham',
  'Bet': 'Over 2.5 Goals',
  'Home Team': 'AFC Wimbledon',
  'Timestamp': '10/17/2019 10:36:01'},
 {'Away Team': 'Liverpool',
  'Bet': 'Home Win',
  'Home Team': 'Man United',
  'Timestamp': '10/18/2019 22:59:18'},
 {'Away Team': 'Newcastle',
  'Bet': 'BTTS',
  'Home Team': 'Arsenal',
  'Timestamp': '10/18/2019 22:59:31'},
 {'Away Team': 'Man City',
  'Bet': 'BTTS',
  'Home Team': 'Everton',
  'Timestamp': '10/20/2019 20:29:45'},
 {'Away Team': 'Man City',
  'Bet': 'BTTS',
  'Home Team': 'Everton',
  'Timestamp': '10/20/2019 20:29:52'},
 {'Away Team': 'Man City',
  'Bet': 'BTTS',
  'Home Team': 'Everton',
  'Timestamp': '10/20/2019 20:30:00'},
 {'Away Team': 'Man City',
  'Bet': 'BTTS',
  'Home Team': 'Everton',
  'Timestamp': '10/20/2019 20:30:02'},
 {'Away Team': 'Newcastle',
  'Bet': 'BTTS',
  'Home Team': 'Arsenal',
  'Timestamp': '10/18/2019 22:59:31'}]

The values of the 'Bet' key can be 1 of 8 values. For each unique value, I want to count the frequency of the values in the 'Home Team' key across all of the dictionaries.

In the example above, the most frequent 'Home Team' value for key-value pair 'Bet': 'BTTS' is Everton

I tried creating new dictionaries for each unique 'Bet' key value using default dictionary from the collections module but I soon realized I could only create new dictionaries with either the 'Home Team' values as keys with the 'Bet' value as the value but I can't then capture frequency.

The data on the sheet is collected via a Google form so I can be assured of the integrity of the data captured as the form only allows values to be selected from predefined drop-downs or radio buttons.

Some advice or pointers in the right direction on modules/techniques to help me out here would be greatly appreciated.


Solution

  • I'm not familiar with the collections module, but it's really easy to achieve using old-school, plain Python.

    Supposing we have your list of dictionaries stored in a variable called raw_dicts. Allow me to suggest parsing it into a more convenient data structure for our task:

    parsed_dict = dict()
    for dictionary in raw_dicts:
        bet = dictionary['Bet']
        if bet not in parsed_dict.keys():
            parsed_dict[bet] = dict()
        if dictionary['Home Team'] not in parsed_dict[bet].keys():
            parsed_dict[bet][dictionary['Home Team']] = 0
        parsed_dict[bet][dictionary['Home Team']] += 1
    

    What I did here is creating a counter for each bet-team pair. We get this beautiful dictionary:

    {
        "Over 2.5 Goals": {
            "AFC Wimbledon": 1
        }, 
        "Home Win": {
            "Man United": 1
        }, 
        "BTTS": {
            "Everton": 4, 
            "Arsenal": 2
        }
    }
    

    Now that we have such a nice dictionary, all we're left with is a simple maximization problem, whose solution is by the textbook:

    most_frequent_bet = ""
    most_frequent_team = ""
    highest_frequency = 0
    for bet in parsed_dict.keys():
        for team in parsed_dict[bet].keys():
            if parsed_dict[bet][team] > highest_frequency:
                most_frequent_bet = bet
                most_frequent_team = team
                highest_frequency = parsed_dict[bet][team]
    

    This could have been achieved by any number of other methods, some a lot more elegant and short than mine. What I wanted to do here is an easy-to-read code that goes step-by-step.