I'm trying to use the json_normalize function to convert a json file into a dataframe.
The json is a list of dictionaries that look something like this:
{
"sport_key": "basketball_ncaab",
"sport_nice": "NCAAB",
"teams": [
"Bryant Bulldogs",
"Wagner Seahawks"
],
"commence_time": 1608152400,
"home_team": "Bryant Bulldogs",
"sites": [
{
"site_key": "marathonbet",
"site_nice": "Marathon Bet",
"last_update": 1608156452,
"odds": {
"h2h": [
1.28,
3.54
]
}
},
{
"site_key": "sport888",
"site_nice": "888sport",
"last_update": 1608156452,
"odds": {
"h2h": [
1.13,
5.8
]
}
},
{
"site_key": "unibet",
"site_nice": "Unibet",
"last_update": 1608156434,
"odds": {
"h2h": [
1.13,
5.8
]
}
}
],
"sites_count": 3
}
The problem is that one of the future columns contains a list (which should be the case), but including this column in the meta part of the json_normalize function throws the following error:
ValueError: operands could not be broadcast together with shape (22,) (11,)
The error appears when I try to add "teams" in the list in the following code:
pd.json_normalize(data, 'sites', ['sport_key', 'sport_nice', 'home_team', 'teams'])
Assuming data
is a list of dictionaries, you can still use json_normalize
but you have to assign the teams
column seperately for each corresponding dictionary in data
:
def normalize(d):
return pd.json_normalize(d, 'sites', ['sport_key', 'sport_nice', 'home_team'])\
.assign(teams=[d['teams']]*len(d['sites']))
df = pd.concat([normalize(d) for d in data], ignore_index=True)
Alternatively you can try:
data = [{**d, 'teams': ','.join(d['teams'])} for d in data]
df = pd.json_normalize(data, 'sites', ['sport_key', 'sport_nice', 'home_team', 'teams'])
df['teams'] = df['teams'].str.split(',')
Result:
site_key site_nice last_update odds.h2h sport_key sport_nice home_team teams
0 marathonbet Marathon Bet 1608156452 [1.28, 3.54] basketball_ncaab NCAAB Bryant Bulldogs [Bryant Bulldogs, Wagner Seahawks]
1 sport888 888sport 1608156452 [1.13, 5.8] basketball_ncaab NCAAB Bryant Bulldogs [Bryant Bulldogs, Wagner Seahawks]
2 unibet Unibet 1608156434 [1.13, 5.8] basketball_ncaab NCAAB Bryant Bulldogs [Bryant Bulldogs, Wagner Seahawks]