I am reading data from JSON files that represent football games of tournaments: tournament_1.json, tournament_2.json, ... the files are of the form
{
"name": "tournament_7",
"start_date": "2022-05-01",
"end_date": "2022-06-21",
"games": {
"tour_1": [
{
"home": "FC Kiruk",
"away": "BAM united",
"date": "2022-05-05",
"result": "home_win",
"id": "tournament_7_1"
},
{
"home": "River Plays",
"away": "All Casabianca",
"date": "2022-05-06",
"result": "home_win",
"id": "tournament_7_2"
},
{
"home": "River Plays",
"away": "FC Kiruk",
"date": "2022-05-10",
"result": "away_win",
"id": "tournament_7_3"
},
{
"home": "BAM United",
"away": "All Casabianca",
"date": "2022-05-10",
"result": "draw",
"id": "tournament_7_4"
},
{
"home": "FC Kiruk",
"away": "All Casabianca",
"date": "2022-05-17",
"result": "draw",
"id": "tournament_7_5"
},
{
"home": "River Plays",
"away": "BAM United",
"date": "2022-05-18",
"result": "away_win",
"id": "tournament_7_6"
}
],
"tour_2": [
{
"home": "FC Kiruk",
"away": "BAM United",
"date": "2022-05-22",
"result": "draw",
"id": "tournament_7_7"
},
{
"home": "River Plays",
"away": "FC Kiruk",
"date": "2022-05-26",
"result": "home_win",
"id": "tournament_7_8"
},
{
"home": "BAM United",
"away": "River Plays",
"date": "2022-06-04",
"result": "home_win",
"id": "tournament_7_9"
}
],
"tour_3": [
{
"home": "FC Kiruk",
"away": "BAM United",
"date": "2022-06-12",
"result": "draw",
"id": "tournament_7_10"
},
{
"home": "BAM United",
"away": "FC Kiruk",
"date": "2022-06-20",
"result": "home_win",
"id": "tournament_7_11"
}
]
},
"tours": 3,
"type": "round_robin_all",
"winner": "BAM United"
}
What I did so far is read the JSON file using pandas.normalize_json with
import json
import pandas as pd
with open('tournament_7.json') as data_file:
data = json.load(data_file)
df = pd.json_normalize(data, record_path=['games', 'tour_1'], meta=['name','start_date', 'end_date', 'tours', 'type', 'winner'])
which of course allows me to read data about tour_1
I am asking if there is a way to read all of it at once (I need to aggregate all the data frames afterward: read all JSON files ~ 1000s)
You haven't specified exactly what is the output that you are looking for. Here is how I've done it:
import json
import pandas as pd
with open('tournament_7.json') as data_file:
data = json.load(data_file)
df = pd.DataFrame(columns=['name','start_date', 'end_date', 'tours', 'type', 'winner'])
for tour in data["games"]:
df = pd.concat([df,(pd.json_normalize(data, record_path=['games', tour], meta=['name','start_date', 'end_date', 'tours', 'type', 'winner']))], ignore_index = True, axis = 0)
print(df)
So I just loop through the different tours that are present in the games dictionary. Then I concat the resultant DataFrame. You may want to add a column that specifies which tour this row is for but that is up to you.