I have a Python script (Python 3.7) that accesses a Google Sheet and gets all of the data from the sheet using the get_all_records()
method from the gspread
library.
The response data is a list of dictionaries, with each row from the google sheet represented as it's own dictionary, and the key/value pairs as the column header and row values respectively like so:
[{'Away Team': 'Gillingham',
'Bet': 'Over 2.5 Goals',
'Home Team': 'AFC Wimbledon',
'Timestamp': '10/17/2019 10:36:01'},
{'Away Team': 'Liverpool',
'Bet': 'Home Win',
'Home Team': 'Man United',
'Timestamp': '10/18/2019 22:59:18'},
{'Away Team': 'Newcastle',
'Bet': 'BTTS',
'Home Team': 'Arsenal',
'Timestamp': '10/18/2019 22:59:31'},
{'Away Team': 'Man City',
'Bet': 'BTTS',
'Home Team': 'Everton',
'Timestamp': '10/20/2019 20:29:45'},
{'Away Team': 'Man City',
'Bet': 'BTTS',
'Home Team': 'Everton',
'Timestamp': '10/20/2019 20:29:52'},
{'Away Team': 'Man City',
'Bet': 'BTTS',
'Home Team': 'Everton',
'Timestamp': '10/20/2019 20:30:00'},
{'Away Team': 'Man City',
'Bet': 'BTTS',
'Home Team': 'Everton',
'Timestamp': '10/20/2019 20:30:02'},
{'Away Team': 'Newcastle',
'Bet': 'BTTS',
'Home Team': 'Arsenal',
'Timestamp': '10/18/2019 22:59:31'}]
The values of the 'Bet'
key can be 1 of 8 values. For each unique value, I want to count the frequency of the values in the 'Home Team'
key across all of the dictionaries.
In the example above, the most frequent 'Home Team'
value for key-value pair 'Bet': 'BTTS'
is Everton
I tried creating new dictionaries for each unique 'Bet'
key value using default dictionary
from the collections
module but I soon realized I could only create new dictionaries with either the 'Home Team'
values as keys with the 'Bet'
value as the value but I can't then capture frequency.
The data on the sheet is collected via a Google form so I can be assured of the integrity of the data captured as the form only allows values to be selected from predefined drop-downs or radio buttons.
Some advice or pointers in the right direction on modules/techniques to help me out here would be greatly appreciated.
I'm not familiar with the collections module, but it's really easy to achieve using old-school, plain Python.
Supposing we have your list of dictionaries stored in a variable called raw_dicts
. Allow me to suggest parsing it into a more convenient data structure for our task:
parsed_dict = dict()
for dictionary in raw_dicts:
bet = dictionary['Bet']
if bet not in parsed_dict.keys():
parsed_dict[bet] = dict()
if dictionary['Home Team'] not in parsed_dict[bet].keys():
parsed_dict[bet][dictionary['Home Team']] = 0
parsed_dict[bet][dictionary['Home Team']] += 1
What I did here is creating a counter for each bet-team pair. We get this beautiful dictionary:
{
"Over 2.5 Goals": {
"AFC Wimbledon": 1
},
"Home Win": {
"Man United": 1
},
"BTTS": {
"Everton": 4,
"Arsenal": 2
}
}
Now that we have such a nice dictionary, all we're left with is a simple maximization problem, whose solution is by the textbook:
most_frequent_bet = ""
most_frequent_team = ""
highest_frequency = 0
for bet in parsed_dict.keys():
for team in parsed_dict[bet].keys():
if parsed_dict[bet][team] > highest_frequency:
most_frequent_bet = bet
most_frequent_team = team
highest_frequency = parsed_dict[bet][team]
This could have been achieved by any number of other methods, some a lot more elegant and short than mine. What I wanted to do here is an easy-to-read code that goes step-by-step.