I would like to get a cross-table by using pandas dataframe with Python like this:
Home | Away | Tot | |
---|---|---|---|
played | 19 | 19 | 38 |
wins | 10 | 8 | 18 |
draws | 7 | 5 | 12 |
loss | 2 | 6 | 8 |
goals_for | 40 | 26 | 66 |
goals_against | 17 | 19 | 36 |
avg_goal_for | 2.1 | 1.4 | 1.7 |
avg_goal_against | 0.9 | 1.0 | 0.9 |
and the json file is :
{
"get": "teams/statistics",
"response": {
"fixtures": {
"played": {
"home": 19,
"away": 19,
"total": 38
},
"wins": {
"home": 10,
"away": 8,
"total": 18
},
"draws": {
"home": 7,
"away": 5,
"total": 12
},
"loses": {
"home": 2,
"away": 6,
"total": 8
}
},
"goals": {
"for": {
"total": {
"home": 40,
"away": 26,
"total": 66
},
"average": {
"home": "2.1",
"away": "1.4",
"total": "1.7"
}
},
"against": {
"total": {
"home": 17,
"away": 19,
"total": 36
},
"average": {
"home": "0.9",
"away": "1.0",
"total": "0.9"
}
}
}
}
}
I don't know how to handle all nested dictionaries in an easy way. Please someone can help me out. Thanks
One way to do this is to process both halves of your response
separately, using json_normalize
to read the data, then splitting the column names to a multi-level index and then using stack
to move to a long format from the wide format. You can then drop the top-level index and concat
the two dataframes:
data = json.loads(jstr)
fixtures = pd.json_normalize(data['response']['fixtures'])
fixtures.columns = fixtures.columns.str.split('.', expand=True)
fixtures = fixtures.stack(level=0)
fixtures.index = fixtures.index.droplevel()
fixtures
# away home total
# draws 5 7 12
# loses 6 2 8
# played 19 19 38
# wins 8 10 18
goals = pd.json_normalize(data['response']['goals'])
goals.columns = goals.columns.str.split(r'\.(?!.*\.)', expand=True)
goals = goals.stack(level=0)
goals.index = goals.index.droplevel()
goals
# away home total
# against.average 1.0 0.9 0.9
# against.total 19 17 36
# for.average 1.4 2.1 1.7
# for.total 26 40 66
out = pd.concat([fixtures, goals])
Output:
away home total
draws 5 7 12
loses 6 2 8
played 19 19 38
wins 8 10 18
against.average 1.0 0.9 0.9
against.total 19 17 36
for.average 1.4 2.1 1.7
for.total 26 40 66
Note, if desired you can rename the index column to match your question using
out.index = ['draws', 'loses', 'played', 'wins', 'goals_against_average', 'goals_against_total', 'goals_for_average', 'goals_for_total']
Output:
away home total
draws 5 7 12
loses 6 2 8
played 19 19 38
wins 8 10 18
goals_against_average 1.0 0.9 0.9
goals_against_total 19 17 36
goals_for_average 1.4 2.1 1.7
goals_for_total 26 40 66
Note this code can be made more generic, by creating a dict of the sub-dictionary names and the appropriate splitting string, and then iterating that dict:
parts = { 'fixtures' : '.', 'goals' : r'\.(?!.*\.)' }
dfs = []
for part in parts:
df = pd.json_normalize(data['response'][part])
df.columns = df.columns.str.split(parts[part], expand=True)
df = df.stack(level=0)
df.index = df.index.droplevel()
dfs.append(df)
out = pd.concat(dfs)