I am trying to access a JSON object that is stored as a gzipped (.gz) file on a website. I would like to do this directly with urllib if possible.
This is what I have tried:
import gzip
import json
import urllib.request
# get the .gz file
test = urllib.request.Request('http://files.tmdb.org/p/exports/movie_ids_01_27_2021.json.gz')
# decompress and read
with gzip.open(test, 'rt', encoding='UTF-8') as zipfile:
    my_object = json.loads(zipfile)
But this fails with:
TypeError: filename must be a str or bytes object, or a file
Is it possible to read the JSON directly like this (i.e. without downloading the file locally)?
Thank you.
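The TypeError comes from passing a urllib.request.Request to gzip.open(): a Request only describes the request and is not a file. urllib.request.urlopen() returns a file-like response, and gzip.open() also accepts an existing file object, so the two can be chained without saving anything to disk. A minimal sketch, assuming the resource is newline-delimited JSON (one object per line); the helper name iter_gzipped_json_lines is hypothetical:

```python
import gzip
import json
import urllib.request
from typing import Any, Iterator


def iter_gzipped_json_lines(url: str, timeout: float = 30.0) -> Iterator[Any]:
    """Stream a .json.gz resource and yield one parsed object per line.

    Hypothetical helper. Assumes newline-delimited JSON (one object per
    line). Nothing is written to disk: the response returned by urlopen()
    is file-like, and gzip.open() decompresses it while it is read.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # gzip.open() accepts an existing file object, not just a filename
        with gzip.open(resp, "rt", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines such as a trailing newline
                    yield json.loads(line)
```

Usage: for movie in iter_gzipped_json_lines('http://files.tmdb.org/p/exports/movie_ids_01_27_2021.json.gz'): print(movie). If the resource is instead a single JSON document, replace the loop with return json.load(f) (json.load reads from a file object; json.loads expects a string or bytes).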
Use the requests library (install it with pip install requests if you don't have it).
Then use the following code:
import requests
r = requests.get('http://files.tmdb.org/p/exports/movie_ids_01_27_2021.json.gz')
print(r.content)
r.content will be the binary content of the gzip file. For this file that is 11,352,985 bytes (about 10.8 MB), all held in memory, because the whole response body has to be stored somewhere.
Then you can use gzip.decompress(r.content) (after import gzip) to decompress the gzip data. The decompressed data will take considerably more memory than the compressed bytes.
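One caveat: json.loads(gzip.decompress(r.content)) only works if the file holds a single JSON document. As far as I know the TMDB daily ID exports are newline-delimited JSON (one object per line), in which case json.loads on the whole text raises JSONDecodeError ("Extra data"); treat that format as an assumption and check it against the file. A minimal sketch that parses the decompressed bytes line by line (parse_gzipped_json_lines is a hypothetical helper name):

```python
import gzip
import json
from typing import Any, List


def parse_gzipped_json_lines(gz_bytes: bytes) -> List[Any]:
    """Decompress gzip bytes (such as r.content) and parse each line as JSON.

    Hypothetical helper; assumes newline-delimited JSON. For a single JSON
    document, use json.loads(gzip.decompress(gz_bytes)) instead.
    """
    text = gzip.decompress(gz_bytes).decode("utf-8")
    # splitlines() drops the line endings; blank lines are skipped
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Usage: movies = parse_gzipped_json_lines(r.content). The compressed bytes, the decompressed text and the resulting list all sit in memory at the same time, which is the memory cost mentioned above; decompressing and parsing line by line while the response is read avoids most of it.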