I am working with an API that returns the following format:
{
    "count": 900,
    "next": "api/?data&page=2",
    "previous": null,
    "results": [{json object 1}, {json object 2}, {...}]
}
The problem is that I want to retrieve the "results" from all pages and save them into one JSON file.
I'm thinking of a while loop that keeps making requests to the API, aggregating each page's "results" into one variable, until the "next" value is null.
Something like
while json1["next"] is not None:
    r = requests.get(apiURL, verify=False, allow_redirects=True, headers=headers, timeout=10)
    raw_data = r.json()["results"]
    final_data.update(raw_data)
I tried this, but since r.json()["results"] is a list, I don't know how to aggregate the pages and turn the result into a JSON file.
When I try final_data.update(raw_data), I get an error saying:
'list' object has no attribute 'update'
And when I try json.loads(raw_data), I get:
TypeError: the JSON object must be str, bytes, or bytearray, not list
A JSON file is just a text file. To save your raw_data, which is a list, to a text file, you need to encode it as a string using json.dumps():
import json

# json.dumps() encodes the list as a JSON-formatted string
with open('output.json', 'w', encoding="utf-8") as f:
    raw_data_as_string = json.dumps(raw_data)
    f.write(raw_data_as_string)
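(Equivalently, json.dump(raw_data, f) encodes and writes to the file in one step.)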
To aggregate the results from different pages, make final_data a list, created before you iterate over the pages, and then call final_data.extend(raw_data) inside the loop, where raw_data contains the results from a single page. After that, encode final_data with json.dumps() as shown earlier.
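Putting it together, a minimal sketch of the whole loop. The URL and headers below are placeholders for your own values, and it assumes the "next" value can be passed straight to requests.get; if your API returns a relative path like api/?data&page=2, join it onto the base URL first:

import json
import requests

headers = {"Accept": "application/json"}  # placeholder: reuse your real headers
url = "https://example.com/api/?data"     # placeholder: your API's first page

final_data = []  # created once, before iterating the pages
while url is not None:
    r = requests.get(url, verify=False, allow_redirects=True, headers=headers, timeout=10)
    page = r.json()
    final_data.extend(page["results"])  # extend joins lists; update is for dicts
    url = page["next"]                  # JSON null becomes Python None, ending the loop

with open("output.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(final_data))

As a sanity check, len(final_data) should end up equal to the "count" field (900 in your example).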